# Deploying **Rook‑Ceph** on Kubernetes (SUSE) with 2 zones + **arbiter**

> Guide based on the **current** state of the cluster (A/B + *arbiter*), with no prior "no‑arbiter" phase. Local disks (BlueStore), distribution by **zone**, 3 MONs (one per zone) and 2 MGRs (one in site A, one in site B). RBD pool with **size=4** (2+2 per zone) and **min_size=2**.

---

## 1) Topology and requirements

* Nodes and zones:
  * **site-a**: `srvfkvm01`, `srvfkvm02`
  * **site-b**: `srvfkvm03`, `srvfkvm04`
  * **arbiter**: `srvfkvm05` *(no OSDs)*
* Each data node has **6 disks** dedicated to Ceph (use persistent paths `/dev/disk/by-id/...`).
* Internet access from the nodes. `kubectl` with admin permissions.
* Versions used: **Rook v1.18.x**, **Ceph v18 (Reef)**.

> **Resiliency goal**: tolerate the complete loss of one site (A **or** B). The arbiter hosts a MON (and optionally a MGR), **not** OSDs.

---

## 2) Label nodes by **zone**

```bash
# SITE A
kubectl label node srvfkvm01 topology.kubernetes.io/zone=site-a --overwrite
kubectl label node srvfkvm02 topology.kubernetes.io/zone=site-a --overwrite

# SITE B
kubectl label node srvfkvm03 topology.kubernetes.io/zone=site-b --overwrite
kubectl label node srvfkvm04 topology.kubernetes.io/zone=site-b --overwrite

# ARBITER
kubectl label node srvfkvm05 topology.kubernetes.io/zone=arbiter --overwrite
```

---

## 3) Prepare the disks (SUSE)

Install utilities (on **every data node**):

```bash
sudo zypper -n install gdisk util-linux
```

Wipe the disks safely (adjust the IDs for each host):

```bash
# Generic example; use each node's real *by-id* paths
for d in \
  /dev/disk/by-id/wwn-...a \
  /dev/disk/by-id/wwn-...b \
  /dev/disk/by-id/wwn-...c \
  /dev/disk/by-id/wwn-...d \
  /dev/disk/by-id/wwn-...e \
  /dev/disk/by-id/wwn-...f; do
  echo ">>> $d"
  sudo wipefs -a "$d" || true
  # First 100 MiB
  sudo dd if=/dev/zero of="$d" bs=1M count=100 oflag=direct,dsync || true
  # Last 100 MiB
  real=$(readlink -f "$d"); dev=$(basename "$real")
  sz=$(cat /sys/class/block/$dev/size); tail=$((100*1024*1024/512)); seek=$((sz - tail)); ((seek<0)) && seek=0
  sudo dd if=/dev/zero of="$real" bs=512 seek="$seek" count="$tail" oflag=direct,dsync || true
  sudo partprobe "$real" || true; sudo udevadm settle || true
done
```

> **Tip**: record the exact *by‑id* paths of each node; they are the ones used in the `CephCluster` manifest.
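To collect those paths, here is a minimal sketch you can run on each data node. It assumes the Ceph disks expose `wwn-*` aliases; adjust the glob if yours use `ata-*` or `scsi-*` names instead.

```bash
# Print each whole-disk by-id alias alongside the kernel device it points to
for link in /dev/disk/by-id/wwn-*; do
  case "$link" in *-part*) continue ;; esac   # skip partition aliases
  printf '%-72s -> %s\n' "$link" "$(readlink -f "$link")"
done

# Cross-check sizes/models to be sure you picked the 6 disks intended for Ceph
lsblk -dno NAME,SIZE,MODEL,ROTA
```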
---

## 4) Install Rook (CRDs + operator)

```bash
kubectl create namespace rook-ceph || true

# CRDs + common + operator (Rook v1.18.x)
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/crds.yaml \
  -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/common.yaml \
  -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/operator.yaml

kubectl -n rook-ceph get pods | grep operator
```

> **Toolbox** (handy for diagnostics):
>
> ```bash
> kubectl -n rook-ceph apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/toolbox.yaml
> ```
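Before moving on to the `CephCluster` in the next step, it helps to confirm that the CRDs are registered and the operator is fully available. A minimal sketch (resource names as shipped by the upstream manifests above):

```bash
# Wait until the operator Deployment reports all replicas available
kubectl -n rook-ceph rollout status deploy/rook-ceph-operator --timeout=5m

# Sanity-check that the Rook CRDs exist
kubectl get crd | grep -E 'cephclusters|cephblockpools'

# Optionally follow the operator logs while the cluster comes up
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=50 -f
```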
---

## 5) **CephCluster** manifest (A/B + arbiter, OSDs only in A/B)

File `cluster/ceph-cluster.yaml`, **adapted to your current environment**:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18
  dataDirHostPath: /var/lib/rook
  dashboard:
    enabled: true
  mgr:
    count: 2
  mon:
    count: 3
    allowMultiplePerNode: false
  placement:
    # MGRs spread across site-a and site-b
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a", "site-b"]
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["rook-ceph-mgr"]
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mgr
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
    # MONs, one per zone (site-a, site-b, arbiter)
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a", "site-b", "arbiter"]
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mon
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
  security:
    cephx:
      csi: {}
      daemon: {}
      rbdMirrorPeer: {}
  storage:
    useAllDevices: false
    nodes:
      - name: srvfkvm01
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5bb177a1716, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5dc196bd3a7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5f81b10f7ef, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d6151cca8afd, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d62f1e5e9699, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d64f204b2405, config: {deviceClass: ssd}}
      - name: srvfkvm02
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127eef88828273, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127f879197de32, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128081a076ba0c, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128114a93e33b9, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94300301281a7b1fc151a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128235ba79d801, config: {deviceClass: ssd}}
      - name: srvfkvm03
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128aef3bb4e0ae, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b0e3d8bc1dc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b2b3f446dd7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b4440c2d027, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b5e42510c2a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b7d442e592c, config: {deviceClass: ssd}}
      - name: srvfkvm04
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c003012887ebfca6752, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c0030128896e360075f, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288ac038600d4, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288c62acb6efc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288e456c6d441, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288f976534b4f, config: {deviceClass: ssd}}
```

Apply and verify:

```bash
kubectl apply -f cluster/ceph-cluster.yaml
kubectl -n rook-ceph get pods
```

> **Note**: the MONs should end up one in `site-a`, one in `site-b`, and one in `arbiter`; the MGRs in `site-a` and `site-b`. OSDs only in A/B.

---

## 6) Enable the **Orchestrator** (Rook backend)

```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
```

---

## 7) **RBD** pool 2×2 per **zone** + StorageClass

`pools/ceph-blockpool-rbd.yaml`:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-2x2-sites
  namespace: rook-ceph
spec:
  deviceClass: ssd
  failureDomain: zone
  replicated:
    size: 4              # 2 per site (A/B)
    minSize: 2
    replicasPerFailureDomain: 2
    subFailureDomain: host
    requireSafeReplicaSize: true
  parameters:
    pg_autoscale_mode: "on"
```

`storageclasses/rbd.yaml`:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-2x2-sites
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions: ["discard"]
```

Apply and check:

```bash
kubectl apply -f pools/ceph-blockpool-rbd.yaml
kubectl apply -f storageclasses/rbd.yaml

# Quick checks
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites min_size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule dump rbd-2x2-sites -f json-pretty
```

> The generated CRUSH rule selects by **zone** and then by **host** (2 replicas per zone). With OSDs only in A/B, the arbiter holds **no** data.
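To double-check that each object really lands 2+2 across the zones, here is a quick sketch using the toolbox (pool name as above; exact output columns vary slightly between Ceph releases):

```bash
# The CRUSH tree should show the zone buckets (site-a, site-b) with their hosts and OSDs
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree

# Each PG of the pool should have 4 OSDs in its acting set: 2 in site-a and 2 in site-b
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg ls-by-pool rbd-2x2-sites | head -20

# Map an OSD id back to its host (and therefore its zone) if in doubt
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd find 0
```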
---

## 8) Dashboard via **Ingress** (optional)

`ingress/dashboard.yaml` (HTTP backend on port 7000; this assumes the dashboard has SSL disabled in the `CephCluster` `dashboard` section, otherwise the service listens on HTTPS 8443):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-dashboard
  namespace: rook-ceph
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: ceph.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
```

Admin password:

```bash
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d; echo
```

Create the `admin.c3s` user (the default admin's password tends to get reset):

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc \
  'echo -n "Pozuelo12345" | ceph dashboard ac-user-create admin.c3s administrator -i - && ceph dashboard ac-user-list'
```

---

## 9) StorageClass test (PVC + Pod)

`tests/pvc.yaml`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd
```

`tests/pod.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbd-tester
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 36000"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-rbd
```

```bash
kubectl apply -f tests/pvc.yaml
kubectl apply -f tests/pod.yaml
kubectl exec -it rbd-tester -- sh -c 'df -h /data && dd if=/dev/zero of=/data/test.bin bs=1M count=100 && ls -lh /data'
```

---

## 10) Save the exact manifests from the cluster

```bash
# "Clean" CephCluster without ephemeral fields
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml --show-managed-fields=false \
  | yq 'del(.metadata.creationTimestamp,.metadata.generation,.metadata.resourceVersion,.metadata.uid,.status)' \
  > ceph-cluster-export.yaml

# Pool and StorageClass
kubectl -n rook-ceph get cephblockpool rbd-2x2-sites -o yaml > ceph-blockpool-export.yaml
kubectl get sc ceph-rbd -o yaml > storageclass-rbd-export.yaml
```

---

## 11) Quick troubleshooting

* **A MON is not rescheduled** after deleting one: the operator waits until **quorum** is safe. Check `rook-ceph-mon-endpoints`, `deployment/rook-ceph-mon-*`, and the `op-mon` entries in the operator logs.
* **OSDs detected as HDD** behind an HBA: force `deviceClass: ssd` per disk (as in the `CephCluster`) or, once deployed, adjust it with `ceph osd crush set-device-class ssd osd.N` (run `ceph osd crush rm-device-class osd.N` first if a class is already assigned).
* **Dashboard shows "Orchestrator is not available"**:

  ```bash
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
  ```

---

### Wrapping up

You now have a Rook‑Ceph deployment aligned with the current reality: 2 data zones + arbiter, 3 MONs (one per zone), 2 MGRs (A/B), OSDs only in A/B, and an RBD pool with **2+2** replicas per zone. Ready for production and future growth! A quick zone-failure drill sketch follows below.
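---

As a final check of the resiliency goal from section 1, here is a drill sketch. It is only an approximation and uses this guide's node names: `kubectl cordon` merely keeps new pods off the nodes, so for a realistic test power off the `site-b` hosts out-of-band and watch the cluster from the toolbox.

```bash
# Keep new workloads off site-b while its hosts are powered off out-of-band
for n in srvfkvm03 srvfkvm04; do kubectl cordon "$n"; done

# With 2 surviving replicas in site-a and min_size=2, PGs should stay active (HEALTH_WARN, degraded)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree

# Once site-b is back, allow scheduling again and watch recovery complete
for n in srvfkvm03 srvfkvm04; do kubectl uncordon "$n"; done
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
```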