
Deploying Rook-Ceph on Kubernetes (SUSE) with 2 zones + arbiter

Guide based on the cluster's current state (A/B + arbiter), with no earlier "no-arbiter" phase. Local disks (BlueStore), distribution by zone, 3 MONs (one per zone) and 2 MGRs (one in site A, one in site B). RBD pool with size=4 (2+2 per zone) and min_size=2.


1) Topology and requirements

  • Nodes and zones:

    • site-a: srvfkvm01, srvfkvm02
    • site-b: srvfkvm03, srvfkvm04
    • arbiter: srvfkvm05 (no OSDs)
  • Each data node has 6 disks dedicated to Ceph (use persistent /dev/disk/by-id/... paths).

  • Internet access from the nodes. kubectl with admin permissions.

  • Versions used: Rook v1.18.x, Ceph v18 (Reef).

Resilience goal: tolerate the complete loss of one site (A or B). The arbiter hosts a MON (and optionally a MGR), never OSDs.


2) Label the nodes by zone

# SITE A
kubectl label node srvfkvm01 topology.kubernetes.io/zone=site-a --overwrite
kubectl label node srvfkvm02 topology.kubernetes.io/zone=site-a --overwrite

# SITE B
kubectl label node srvfkvm03 topology.kubernetes.io/zone=site-b --overwrite
kubectl label node srvfkvm04 topology.kubernetes.io/zone=site-b --overwrite

# ARBITER
kubectl label node srvfkvm05 topology.kubernetes.io/zone=arbiter --overwrite
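
To confirm the labels landed where expected, list the nodes with the zone label as a column:

kubectl get nodes -L topology.kubernetes.io/zone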

3) Prepare the disks (SUSE)

Install the utilities (on each data node):

sudo zypper -n install gdisk util-linux
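
Before wiping anything, it helps to map each physical disk to its stable by-id path (a quick sketch; the exact columns available depend on your util-linux version):

# Disks with size, model and WWN
lsblk -d -o NAME,SIZE,MODEL,SERIAL,WWN

# by-id symlinks resolved to kernel device names
ls -l /dev/disk/by-id/ | grep wwn-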

Wipe safely (adjust the IDs for each host):

# Generic example; use each node's real *by-id* paths
for d in \
  /dev/disk/by-id/wwn-...a \
  /dev/disk/by-id/wwn-...b \
  /dev/disk/by-id/wwn-...c \
  /dev/disk/by-id/wwn-...d \
  /dev/disk/by-id/wwn-...e \
  /dev/disk/by-id/wwn-...f; do
  echo ">>> $d"
  sudo wipefs -a "$d" || true
  # 100MiB header
  sudo dd if=/dev/zero of="$d" bs=1M count=100 oflag=direct,dsync || true
  # 100MiB tail
  real=$(readlink -f "$d"); dev=$(basename "$real")
  sz=$(cat /sys/class/block/$dev/size); tail=$((100*1024*1024/512)); seek=$((sz - tail)); ((seek<0)) && seek=0
  sudo dd if=/dev/zero of="$real" bs=512 seek="$seek" count="$tail" oflag=direct,dsync || true
  sudo partprobe "$real" || true; sudo udevadm settle || true
done
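
Optionally, verify that nothing is left on the devices (wipefs -n is a dry run that prints any remaining signatures; it should output nothing for a clean disk):

sudo wipefs -n /dev/disk/by-id/wwn-...a
lsblk -f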

Tip: keep a record of each node's exact by-id paths; they are the ones used later in the CephCluster.


4) Install Rook (CRDs + operator)

kubectl create namespace rook-ceph || true

# CRDs + common + operator (Rook v1.18.x)
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/crds.yaml \
             -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/common.yaml \
             -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/operator.yaml

kubectl -n rook-ceph get pods | grep operator

Toolbox (handy for diagnostics):

kubectl -n rook-ceph apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/toolbox.yaml
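
To wait for the operator without polling by hand (the toolbox only becomes fully usable once the CephCluster from the next section is up):

kubectl -n rook-ceph rollout status deploy/rook-ceph-operator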

5) CephCluster manifest (A/B + arbiter, OSDs only in A/B)

File cluster/ceph-cluster.yaml, adapted to the current environment:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18
  dataDirHostPath: /var/lib/rook

  dashboard:
    enabled: true

  mgr:
    count: 2

  mon:
    count: 3
    allowMultiplePerNode: false

  placement:
    # MGRs spread across site-a and site-b
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b"]
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["rook-ceph-mgr"]
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mgr
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule

    # One MON per zone (site-a, site-b, arbiter)
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b","arbiter"]
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mon
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule

  security:
    cephx:
      csi: {}
      daemon: {}
      rbdMirrorPeer: {}

  storage:
    useAllDevices: false
    nodes:
      - name: srvfkvm01
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5bb177a1716, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5dc196bd3a7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5f81b10f7ef, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d6151cca8afd, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d62f1e5e9699, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d64f204b2405, config: {deviceClass: ssd}}
      - name: srvfkvm02
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127eef88828273, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127f879197de32, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128081a076ba0c, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128114a93e33b9, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94300301281a7b1fc151a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128235ba79d801, config: {deviceClass: ssd}}
      - name: srvfkvm03
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128aef3bb4e0ae, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b0e3d8bc1dc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b2b3f446dd7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b4440c2d027, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b5e42510c2a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b7d442e592c, config: {deviceClass: ssd}}
      - name: srvfkvm04
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c003012887ebfca6752, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c0030128896e360075f, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288ac038600d4, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288c62acb6efc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288e456c6d441, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288f976534b4f, config: {deviceClass: ssd}}

Apply and verify:

kubectl apply -f cluster/ceph-cluster.yaml
kubectl -n rook-ceph get pods

Note: the MONs should end up one in site-a, one in site-b and one in arbiter; the MGRs in site-a and site-b; OSDs only in A/B.
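
A quick way to check that placement (the app labels below are the standard ones Rook sets on its pods; ceph osd tree shows the zone/host hierarchy once the OSDs are up):

kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-mgr -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree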


6) Enable the Orchestrator (Rook backend)

# Enable the rook orchestrator module first if it is not already on
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mgr module enable rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status

7) RBD pool 2×2 across zones + StorageClass

pools/ceph-blockpool-rbd.yaml:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-2x2-sites
  namespace: rook-ceph
spec:
  deviceClass: ssd
  failureDomain: zone
  replicated:
    size: 4                   # 2 per site (A/B)
    minSize: 2
    replicasPerFailureDomain: 2
    subFailureDomain: host
    requireSafeReplicaSize: true
  parameters:
    pg_autoscale_mode: "on"

storageclasses/rbd.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-2x2-sites
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions: ["discard"]

Apply and check:

kubectl apply -f pools/ceph-blockpool-rbd.yaml
kubectl apply -f storageclasses/rbd.yaml

# Quick verifications
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites min_size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule dump rbd-2x2-sites -f json-pretty

The generated CRUSH rule picks a zone first and then hosts within it (2 replicas per zone). With OSDs only in A/B, the arbiter holds no data.
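
For reference, the rule Rook generates for failureDomain: zone with subFailureDomain: host typically has this shape (a sketch in decompiled CRUSH syntax, not the literal dump from your cluster):

rule rbd-2x2-sites {
    type replicated
    step take default class ssd
    step choose firstn 0 type zone
    step chooseleaf firstn 2 type host
    step emit
}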


8) Dashboard via Ingress (optional)

ingress/dashboard.yaml (HTTP backend on port 7000; this assumes the dashboard serves plain HTTP, i.e. ssl: false in the CephCluster dashboard section — with SSL enabled the mgr listens on 8443 instead):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-dashboard
  namespace: rook-ceph
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: ceph.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
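
Without an ingress controller, a port-forward to the same service is enough for a quick look (assuming the plain-HTTP dashboard on 7000, as above):

kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000
# then browse to http://localhost:7000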

Admin password:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d; echo

Create an admin.c3s user (the built-in admin's password tends to get reset):

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc \
'echo -n "Pozuelo12345" | ceph dashboard ac-user-create admin.c3s administrator -i - && ceph dashboard ac-user-list'

9) StorageClass test (PVC + Pod)

tests/pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd

tests/pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: rbd-tester
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh","-c","sleep 36000"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-rbd

Apply and test:

kubectl apply -f tests/pvc.yaml
kubectl apply -f tests/pod.yaml
kubectl exec -it rbd-tester -- sh -c 'df -h /data && dd if=/dev/zero of=/data/test.bin bs=1M count=100 && ls -lh /data'
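
Optionally, confirm that the CSI-provisioned image exists in the pool and clean up afterwards:

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd ls -p rbd-2x2-sites
kubectl delete -f tests/pod.yaml -f tests/pvc.yaml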

10) Save the exact manifests from the cluster

# "Clean" CephCluster without ephemeral fields
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml --show-managed-fields=false \
 | yq 'del(.metadata.creationTimestamp,.metadata.generation,.metadata.resourceVersion,.metadata.uid,.status)' \
 > ceph-cluster-export.yaml

# Pool y StorageClass
kubectl -n rook-ceph get cephblockpool rbd-2x2-sites -o yaml > ceph-blockpool-export.yaml
kubectl get sc ceph-rbd -o yaml > storageclass-rbd-export.yaml
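
Optionally, strip the ephemeral fields from these exports too (same yq v4 syntax as above, applied in place):

yq -i 'del(.metadata.creationTimestamp,.metadata.resourceVersion,.metadata.uid,.status)' ceph-blockpool-export.yaml
yq -i 'del(.metadata.creationTimestamp,.metadata.resourceVersion,.metadata.uid)' storageclass-rbd-export.yaml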

11) Quick troubleshooting

  • A MON is not rescheduled after deleting one: the operator needs quorum to remain safe before acting. Check rook-ceph-mon-endpoints, the deployment/rook-ceph-mon-* objects, and the op-mon entries in the operator logs.

  • OSDs detected as HDD behind an HBA: you can force deviceClass: ssd per disk (as in the CephCluster above) or, once deployed, adjust it with ceph osd crush set-device-class ssd osd.N (see the snippet after this list).

  • Dashboard shows “Orchestrator is not available”:

    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
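
Re-classing an already-deployed OSD (referenced from the HBA note above) requires removing its current class first; a sketch for a single osd.N (hypothetical ID, repeat per OSD):

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rm-device-class osd.N
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush set-device-class ssd osd.N
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush class ls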
    

Conclusion

With this you have a Rook-Ceph deployment aligned with the current reality: 2 data zones + an arbiter, 3 MONs (one per zone), 2 MGRs (A/B), OSDs only in A/B, and an RBD pool with 2+2 replicas per zone. Ready for production and future growth!