Deploying Rook-Ceph on Kubernetes (SUSE) with 2 zones + arbiter
Guide based on the current state of the cluster (A/B + arbiter), with no prior "no-arbiter" phase. Local disks (BlueStore), distribution by zone, 3 MONs (one per zone) and 2 MGRs (one on site A, one on site B). RBD pool with size=4 (2+2 per zone) and min_size=2.
1) Topology and requirements
- Nodes and zones:
  - site-a: srvfkvm01, srvfkvm02
  - site-b: srvfkvm03, srvfkvm04
  - arbiter: srvfkvm05 (no OSDs)
- Each data node has 6 disks dedicated to Ceph (use persistent /dev/disk/by-id/... paths).
- Internet access from the nodes. kubectl with admin permissions.
- Versions used: Rook v1.18.x, Ceph v18 (Reef).
Resilience objective: tolerate the complete loss of one site (A or B). The arbiter hosts a MON (and optionally an MGR), but no OSDs.
2) Label the nodes by zone
# SITE A
kubectl label node srvfkvm01 topology.kubernetes.io/zone=site-a --overwrite
kubectl label node srvfkvm02 topology.kubernetes.io/zone=site-a --overwrite
# SITE B
kubectl label node srvfkvm03 topology.kubernetes.io/zone=site-b --overwrite
kubectl label node srvfkvm04 topology.kubernetes.io/zone=site-b --overwrite
# ARBITER
kubectl label node srvfkvm05 topology.kubernetes.io/zone=arbiter --overwrite
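To confirm the labels before moving on, a quick check (not required by Rook) is to list the nodes with their zone column:
kubectl get nodes -L topology.kubernetes.io/zone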
3) Prepare the disks (SUSE)
Install the utilities (on each data node):
sudo zypper -n install gdisk util-linux
Wipe the disks safely (adjust the IDs for each host):
# Generic example; use the real *by-id* paths of each node
for d in \
  /dev/disk/by-id/wwn-...a \
  /dev/disk/by-id/wwn-...b \
  /dev/disk/by-id/wwn-...c \
  /dev/disk/by-id/wwn-...d \
  /dev/disk/by-id/wwn-...e \
  /dev/disk/by-id/wwn-...f; do
  echo ">>> $d"
  sudo wipefs -a "$d" || true
  # First 100 MiB
  sudo dd if=/dev/zero of="$d" bs=1M count=100 oflag=direct,dsync || true
  # Last 100 MiB
  real=$(readlink -f "$d"); dev=$(basename "$real")
  sz=$(cat /sys/class/block/$dev/size); tail=$((100*1024*1024/512)); seek=$((sz - tail)); ((seek<0)) && seek=0
  sudo dd if=/dev/zero of="$real" bs=512 seek="$seek" count="$tail" oflag=direct,dsync || true
  sudo partprobe "$real" || true; sudo udevadm settle || true
done
Tip: keep the exact by-id paths of each node; they are the ones that will be used in the CephCluster.
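A quick way to map each by-id path to its kernel device before filling in the manifest (the wwn-* pattern matches the disks used here; adjust it if your disks expose different ids):
for d in /dev/disk/by-id/wwn-*; do
  echo "$d -> $(readlink -f "$d")"
done
lsblk -o NAME,SIZE,MODEL,SERIAL,ROTA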
4) Install Rook (CRDs + operator)
kubectl create namespace rook-ceph || true
# CRDs + common + operator (Rook v1.18.x)
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/crds.yaml \
-f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/common.yaml \
-f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/operator.yaml
kubectl -n rook-ceph get pods | grep operator
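Optionally, wait until the operator reports Available before creating the cluster (a convenience check, not part of the upstream steps):
kubectl -n rook-ceph wait deploy/rook-ceph-operator --for=condition=Available --timeout=5m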
Toolbox (useful for diagnostics):
kubectl -n rook-ceph apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/toolbox.yaml
5) CephCluster manifest (A/B + arbiter, OSDs only on A/B)
File cluster/ceph-cluster.yaml, adapted to your current environment:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18
  dataDirHostPath: /var/lib/rook
  dashboard:
    enabled: true
  mgr:
    count: 2
  mon:
    count: 3
    allowMultiplePerNode: false
  placement:
    # MGRs spread between site-a and site-b
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b"]
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["rook-ceph-mgr"]
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mgr
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
    # One MON per zone (site-a, site-b, arbiter)
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b","arbiter"]
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mon
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
  security:
    cephx:
      csi: {}
      daemon: {}
      rbdMirrorPeer: {}
  storage:
    useAllDevices: false
    nodes:
      - name: srvfkvm01
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5bb177a1716, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5dc196bd3a7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5f81b10f7ef, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d6151cca8afd, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d62f1e5e9699, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d64f204b2405, config: {deviceClass: ssd}}
      - name: srvfkvm02
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127eef88828273, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127f879197de32, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128081a076ba0c, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128114a93e33b9, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94300301281a7b1fc151a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128235ba79d801, config: {deviceClass: ssd}}
      - name: srvfkvm03
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128aef3bb4e0ae, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b0e3d8bc1dc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b2b3f446dd7, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b4440c2d027, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b5e42510c2a, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b7d442e592c, config: {deviceClass: ssd}}
      - name: srvfkvm04
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c003012887ebfca6752, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c0030128896e360075f, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288ac038600d4, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288c62acb6efc, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288e456c6d441, config: {deviceClass: ssd}}
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288f976534b4f, config: {deviceClass: ssd}}
Apply and verify:
kubectl apply -f cluster/ceph-cluster.yaml
kubectl -n rook-ceph get pods
Note: the MONs should end up one on site-a, one on site-b and one on arbiter; the MGRs on site-a and site-b. OSDs only on A/B.
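One way to check the resulting placement, using the labels Rook applies to its pods (app=rook-ceph-mon / app=rook-ceph-mgr / app=rook-ceph-osd):
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-mgr -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide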
6) Enable the Orchestrator (Rook backend)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
7) RBD pool 2×2 across zones + StorageClass
pools/ceph-blockpool-rbd.yaml:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-2x2-sites
  namespace: rook-ceph
spec:
  deviceClass: ssd
  failureDomain: zone
  replicated:
    size: 4              # 2 per site (A/B)
    minSize: 2
    replicasPerFailureDomain: 2
    subFailureDomain: host
    requireSafeReplicaSize: true
  parameters:
    pg_autoscale_mode: "on"
storageclasses/rbd.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-2x2-sites
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions: ["discard"]
Apply and check:
kubectl apply -f pools/ceph-blockpool-rbd.yaml
kubectl apply -f storageclasses/rbd.yaml
# Quick checks
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites min_size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule dump rbd-2x2-sites -f json-pretty
The generated CRUSH rule picks a zone first and then a host (2 replicas per zone). With OSDs only on A/B, the arbiter stores no data.
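To spot-check that every PG really gets 2+2 replicas across the two data zones (the output format varies slightly between Ceph releases):
# Each PG of the pool should show 4 OSDs in UP/ACTING, two per site
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- bash -lc 'ceph pg ls-by-pool rbd-2x2-sites | head'
# Map the OSD ids back to hosts and zones
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree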
8) Dashboard via Ingress (optional)
ingress/dashboard.yaml (backend HTTP:7000):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-dashboard
  namespace: rook-ceph
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: ceph.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
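Apply it and, assuming ceph.example.local resolves to your ingress controller (adjust the host to your DNS), check that the dashboard answers:
kubectl apply -f ingress/dashboard.yaml
curl -I http://ceph.example.local/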
Admin password:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d; echo
Create an admin.c3s user (the default admin's password tends to get reset):
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc \
'echo -n "Pozuelo12345" | ceph dashboard ac-user-create admin.c3s administrator -i - && ceph dashboard ac-user-list'
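If you prefer not to leave the password in shell history, it can be rotated later from a local file; a minimal sketch, assuming the admin.c3s user already exists (new-password.txt is a hypothetical file name):
# new-password.txt: local file containing only the new password
kubectl -n rook-ceph exec -i deploy/rook-ceph-tools -- \
  ceph dashboard ac-user-set-password admin.c3s -i - < new-password.txt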
9) StorageClass test (PVC + Pod)
tests/pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd
tests/pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: rbd-tester
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh","-c","sleep 36000"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-rbd
kubectl apply -f tests/pvc.yaml
kubectl apply -f tests/pod.yaml
kubectl exec -it rbd-tester -- sh -c 'df -h /data && dd if=/dev/zero of=/data/test.bin bs=1M count=100 && ls -lh /data'
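When the test is done, clean up and confirm that the backing RBD image disappears from the pool (the image name is generated by the CSI driver):
kubectl delete -f tests/pod.yaml -f tests/pvc.yaml
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd ls -p rbd-2x2-sites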
10) Save the exact manifests from the cluster
# "Clean" CephCluster without ephemeral fields
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml --show-managed-fields=false \
| yq 'del(.metadata.creationTimestamp,.metadata.generation,.metadata.resourceVersion,.metadata.uid,.status)' \
> ceph-cluster-export.yaml
# Pool and StorageClass
kubectl -n rook-ceph get cephblockpool rbd-2x2-sites -o yaml > ceph-blockpool-export.yaml
kubectl get sc ceph-rbd -o yaml > storageclass-rbd-export.yaml
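If the manifests live in a git repo, kubectl diff gives a quick drift check between the files and the live objects:
kubectl diff -f cluster/ceph-cluster.yaml
kubectl diff -f pools/ceph-blockpool-rbd.yaml -f storageclasses/rbd.yaml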
11) Quick troubleshooting
- A MON is not rescheduled after deleting one: the operator needs the quorum to stay safe. Check rook-ceph-mon-endpoints, deployment/rook-ceph-mon-* and op-mon in the operator logs.
- OSDs detected as HDD through the HBA: you can force deviceClass: ssd per disk (as in the CephCluster) or, once deployed, adjust it with ceph osd crush set-device-class ssd osd.N (see the sketch after this list).
- Dashboard shows "Orchestrator is not available":
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
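A sketch of the device-class fix mentioned above, using osd.3 purely as an example id:
# Inspect the current classes
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
# Drop the wrong class, then set ssd (repeat per affected OSD)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rm-device-class osd.3
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush set-device-class ssd osd.3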
Wrap-up
This gives you a Rook-Ceph deployment aligned with the current reality: 2 data zones + arbiter, 3 MONs (one per zone), 2 MGRs (A/B), OSDs only on A/B, and an RBD pool with 2+2 replicas per zone. Ready for production and future growth!