# Deploying **Rook-Ceph** on Kubernetes (SUSE) with 2 zones + **arbiter**
> Guide based on the **current** state of the cluster (A/B + *arbiter*), with no prior "no-arbiter" phase. Local disks (BlueStore), distribution by **zone**, 3 MONs (one per zone) and 2 MGRs (one on site A and one on site B). RBD pool with **size=4** (2+2 per zone) and **min_size=2**.
---
## 1) Topology and requirements
* Nodes and zones:
  * **site-a**: `srvfkvm01`, `srvfkvm02`
  * **site-b**: `srvfkvm03`, `srvfkvm04`
  * **arbiter**: `srvfkvm05` *(no OSDs)*
* Each data node has **6 disks** dedicated to Ceph (use persistent paths under `/dev/disk/by-id/...`).
* Internet access from the nodes. `kubectl` with admin permissions.
* Versions used: **Rook v1.18.x**, **Ceph v18 (Reef)**.
> **Resilience goal**: tolerate the complete loss of one site (A **or** B). The arbiter hosts a MON (and optionally a MGR), **not** OSDs.
---
## 2) Label the nodes by **zone**
```bash
# SITE A
kubectl label node srvfkvm01 topology.kubernetes.io/zone=site-a --overwrite
kubectl label node srvfkvm02 topology.kubernetes.io/zone=site-a --overwrite
# SITE B
kubectl label node srvfkvm03 topology.kubernetes.io/zone=site-b --overwrite
kubectl label node srvfkvm04 topology.kubernetes.io/zone=site-b --overwrite
# ARBITER
kubectl label node srvfkvm05 topology.kubernetes.io/zone=arbiter --overwrite
```
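A quick sanity check that every node carries the expected zone label:
```bash
kubectl get nodes -L topology.kubernetes.io/zone
```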
---
## 3) Prepare the disks (SUSE)
Install the required utilities (on **each data node**):
```bash
sudo zypper -n install gdisk util-linux
```
Wipe the disks safely (adjust the IDs for each host):
```bash
# Generic example; use the real *by-id* paths of each node
for d in \
  /dev/disk/by-id/wwn-...a \
  /dev/disk/by-id/wwn-...b \
  /dev/disk/by-id/wwn-...c \
  /dev/disk/by-id/wwn-...d \
  /dev/disk/by-id/wwn-...e \
  /dev/disk/by-id/wwn-...f; do
  echo ">>> $d"
  sudo wipefs -a "$d" || true
  # Zero the first 100 MiB
  sudo dd if=/dev/zero of="$d" bs=1M count=100 oflag=direct,dsync || true
  # Zero the last 100 MiB
  real=$(readlink -f "$d"); dev=$(basename "$real")
  sz=$(cat /sys/class/block/$dev/size); tail=$((100*1024*1024/512)); seek=$((sz - tail)); ((seek<0)) && seek=0
  sudo dd if=/dev/zero of="$real" bs=512 seek="$seek" count="$tail" oflag=direct,dsync || true
  sudo partprobe "$real" || true; sudo udevadm settle || true
done
```
> **Tip**: record the exact *by-id* paths of each node; they are the ones referenced later in the `CephCluster` manifest.
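A quick way to collect those paths on each node (a sketch; it assumes the Ceph disks expose `wwn-*` identifiers, as in the manifest of section 5):
```bash
# Map each stable wwn- path to the kernel device it currently resolves to
for p in /dev/disk/by-id/wwn-*; do echo "$p -> $(readlink -f "$p")"; done
```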
---
## 4) Install Rook (CRDs + operator)
```bash
kubectl create namespace rook-ceph || true
# CRDs + common + operator (Rook v1.18.x)
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/crds.yaml \
-f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/common.yaml \
-f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/operator.yaml
kubectl -n rook-ceph get pods | grep operator
```
> **Toolbox** (useful for diagnostics):
>
> ```bash
> kubectl -n rook-ceph apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/toolbox.yaml
> ```
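Optionally, wait until the operator (and the toolbox, if you deployed it) report a successful rollout before creating the cluster; a small sketch:
```bash
kubectl -n rook-ceph rollout status deploy/rook-ceph-operator --timeout=5m
kubectl -n rook-ceph rollout status deploy/rook-ceph-tools --timeout=5m
```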
---
## 5) **CephCluster** manifest (A/B + arbiter, OSDs only on A/B)
File `cluster/ceph-cluster.yaml`, **adapted to the current environment**:
```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18
  dataDirHostPath: /var/lib/rook
  dashboard:
    enabled: true
  mgr:
    count: 2
  mon:
    count: 3
    allowMultiplePerNode: false
  placement:
    # MGRs spread across site-a and site-b
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b"]
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["rook-ceph-mgr"]
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mgr
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
    # One MON per zone (site-a, site-b, arbiter)
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b","arbiter"]
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mon
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
  security:
    cephx:
      csi: {}
      daemon: {}
      rbdMirrorPeer: {}
  storage:
    useAllDevices: false
    nodes:
      - name: srvfkvm01
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5bb177a1716, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5dc196bd3a7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5f81b10f7ef, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d6151cca8afd, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d62f1e5e9699, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d64f204b2405, config: {deviceClass: ssd}}
      - name: srvfkvm02
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127eef88828273, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127f879197de32, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128081a076ba0c, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128114a93e33b9, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94300301281a7b1fc151a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128235ba79d801, config: {deviceClass: ssd}}
      - name: srvfkvm03
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128aef3bb4e0ae, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b0e3d8bc1dc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b2b3f446dd7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b4440c2d027, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b5e42510c2a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b7d442e592c, config: {deviceClass: ssd}}
      - name: srvfkvm04
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c003012887ebfca6752, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c0030128896e360075f, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288ac038600d4, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288c62acb6efc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288e456c6d441, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288f976534b4f, config: {deviceClass: ssd}}
```
Apply and verify:
```bash
kubectl apply -f cluster/ceph-cluster.yaml
kubectl -n rook-ceph get pods
```
> **Note**: the MONs should land one on `site-a`, one on `site-b` and one on `arbiter`; the MGRs on `site-a` and `site-b`. OSDs only on A/B.
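A quick way to confirm that placement (a sketch using the standard Rook `app` labels):
```bash
# Where MON/MGR/OSD pods actually landed
kubectl -n rook-ceph get pods -o wide -l 'app in (rook-ceph-mon,rook-ceph-mgr,rook-ceph-osd)'
# Overall health from the toolbox
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
```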
---
## 6) Enable the **Orchestrator** (Rook backend)
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
```
---
## 7) **RBD** pool 2×2 per **zone** + StorageClass
`pools/ceph-blockpool-rbd.yaml`:
```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-2x2-sites
  namespace: rook-ceph
spec:
  deviceClass: ssd
  failureDomain: zone
  replicated:
    size: 4            # 2 per site (A/B)
    minSize: 2
    replicasPerFailureDomain: 2
    subFailureDomain: host
    requireSafeReplicaSize: true
  parameters:
    pg_autoscale_mode: "on"
```
`storageclasses/rbd.yaml`:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-2x2-sites
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions: ["discard"]
```
Apply and check:
```bash
kubectl apply -f pools/ceph-blockpool-rbd.yaml
kubectl apply -f storageclasses/rbd.yaml
# Quick checks
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites min_size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule dump rbd-2x2-sites -f json-pretty
```
> The generated CRUSH rule picks a **zone** first and then a **host** (2 replicas per zone). With OSDs only on A/B, the arbiter holds **no** data.
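To inspect the resulting CRUSH hierarchy and confirm that all OSDs hang under `site-a` and `site-b` only, a quick check from the toolbox:
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush tree --show-shadow
```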
---
## 8) Dashboard via **Ingress** (optional)
`ingress/dashboard.yaml` (HTTP backend on port 7000):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-dashboard
  namespace: rook-ceph
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: ceph.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
```
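The Ingress assumes the dashboard is served as plain HTTP on port 7000; if `dashboard.ssl` were enabled in the `CephCluster`, the service would expose 8443/HTTPS instead. A quick way to confirm the actual port:
```bash
kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard
```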
Admin password:
```bash
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d; echo
```
Create the `admin.c3s` user (the default `admin` user's password tends to get reset):
```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc \
'echo -n "Pozuelo12345" | ceph dashboard ac-user-create admin.c3s administrator -i - && ceph dashboard ac-user-list'
```
---
## 9) StorageClass test (PVC + Pod)
`tests/pvc.yaml`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd
```
`tests/pod.yaml`:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbd-tester
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh","-c","sleep 36000"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-rbd
```
```bash
kubectl apply -f tests/pvc.yaml
kubectl apply -f tests/pod.yaml
kubectl exec -it rbd-tester -- sh -c 'df -h /data && dd if=/dev/zero of=/data/test.bin bs=1M count=100 && ls -lh /data'
```
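To confirm the volume was provisioned in the expected pool and to clean up afterwards (a sketch):
```bash
# The PVC should be Bound and backed by an RBD image in rbd-2x2-sites
kubectl get pvc test-rbd
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd ls -p rbd-2x2-sites
# Remove the test resources
kubectl delete -f tests/pod.yaml -f tests/pvc.yaml
```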
---
## 10) Save the exact manifests from the cluster
```bash
# "Clean" CephCluster without ephemeral fields
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml --show-managed-fields=false \
| yq 'del(.metadata.creationTimestamp,.metadata.generation,.metadata.resourceVersion,.metadata.uid,.status)' \
> ceph-cluster-export.yaml
# Pool and StorageClass
kubectl -n rook-ceph get cephblockpool rbd-2x2-sites -o yaml > ceph-blockpool-export.yaml
kubectl get sc ceph-rbd -o yaml > storageclass-rbd-export.yaml
```
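Optionally, a server-side dry run to check that the exported manifests would still apply cleanly (nothing is modified):
```bash
kubectl apply --dry-run=server -f ceph-cluster-export.yaml -f ceph-blockpool-export.yaml -f storageclass-rbd-export.yaml
```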
---
## 11) Quick troubleshooting
* **A MON is not rescheduled** after deleting one: the operator needs the **quorum** to remain safe. Check `rook-ceph-mon-endpoints`, `deployment/rook-ceph-mon-*` and the `op-mon` entries in the operator logs.
* **OSDs detected as HDD** behind an HBA: you can force `deviceClass: ssd` per disk (as in the `CephCluster`) or, once deployed, adjust it with `ceph osd crush set-device-class ssd osd.N` (see the sketch after this list).
* **Dashboard shows "Orchestrator is not available"**:
```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
```
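A minimal sketch of the device-class fix mentioned above (run via the toolbox; `osd.3` is a hypothetical example, replace it with the misclassified OSD):
```bash
OSD=osd.3   # hypothetical example id
# The existing class has to be removed before a new one can be set
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rm-device-class "$OSD"
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush set-device-class ssd "$OSD"
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
```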
---
### Wrap-up
You now have a Rook-Ceph deployment aligned with the current reality: 2 data zones + arbiter, 3 MONs (one per zone), 2 MGRs (A/B), OSDs only on A/B, and an RBD pool with **2+2** replicas per zone. Ready for production and future growth!