# Deploying **Rook‑Ceph** on Kubernetes (SUSE) with 2 zones + an **arbiter**

> Guide based on the **current** state of the cluster (A/B + *arbiter*), with no earlier "no arbiter" phase. Local disks (BlueStore), distribution by **zone**, 3 MONs (one per zone) and 2 MGRs (one on site A and one on site B). RBD pool with **size=4** (2+2 per zone) and **min_size=2**.

---

## 1) Topology and requirements

* Nodes and zones:

  * **site-a**: `srvfkvm01`, `srvfkvm02`
  * **site-b**: `srvfkvm03`, `srvfkvm04`
  * **arbiter**: `srvfkvm05` *(no OSDs)*
* Each data node has **6 disks** dedicated to Ceph (use persistent paths under `/dev/disk/by-id/...`).
* Internet access from the nodes. `kubectl` with admin permissions.
* Versions used: **Rook v1.18.x**, **Ceph v18 (Reef)**.

> **Resilience goal**: tolerate the complete loss of one site (A **or** B). The arbiter hosts a MON (and optionally a MGR), **not** OSDs.

---

## 2) Label the nodes by **zone**

```bash
# SITE A
kubectl label node srvfkvm01 topology.kubernetes.io/zone=site-a --overwrite
kubectl label node srvfkvm02 topology.kubernetes.io/zone=site-a --overwrite

# SITE B
kubectl label node srvfkvm03 topology.kubernetes.io/zone=site-b --overwrite
kubectl label node srvfkvm04 topology.kubernetes.io/zone=site-b --overwrite

# ARBITER
kubectl label node srvfkvm05 topology.kubernetes.io/zone=arbiter --overwrite
```
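
To double-check the labeling before moving on, the zone label can be printed as a column next to each node:

```bash
# Show each node together with its topology.kubernetes.io/zone label
kubectl get nodes -L topology.kubernetes.io/zone
```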

---

## 3) Prepare the disks (SUSE)

Install the utilities (on **each data node**):

```bash
sudo zypper -n install gdisk util-linux
```

Wipe the disks safely (adjust the IDs for each host):

```bash
# Generic example; use the real *by-id* paths of each node
for d in \
  /dev/disk/by-id/wwn-...a \
  /dev/disk/by-id/wwn-...b \
  /dev/disk/by-id/wwn-...c \
  /dev/disk/by-id/wwn-...d \
  /dev/disk/by-id/wwn-...e \
  /dev/disk/by-id/wwn-...f; do
  echo ">>> $d"
  sudo wipefs -a "$d" || true
  # Zero the first 100 MiB
  sudo dd if=/dev/zero of="$d" bs=1M count=100 oflag=direct,dsync || true
  # Zero the last 100 MiB
  real=$(readlink -f "$d"); dev=$(basename "$real")
  sz=$(cat /sys/class/block/$dev/size); tail=$((100*1024*1024/512)); seek=$((sz - tail)); ((seek<0)) && seek=0
  sudo dd if=/dev/zero of="$real" bs=512 seek="$seek" count="$tail" oflag=direct,dsync || true
  sudo partprobe "$real" || true; sudo udevadm settle || true
done
```

> **Tip**: keep a record of the exact *by-id* paths of each node; they are the ones used in the `CephCluster`.
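
A quick sanity check afterwards: confirm that no signatures remain and capture the *by-id* paths you will paste into the `CephCluster` (substitute your real path for the `wwn-...a` placeholder):

```bash
# List stable by-id paths (partition entries excluded)
ls -l /dev/disk/by-id/ | grep -i wwn | grep -v part

# wipefs with no options prints nothing when the disk is clean
sudo wipefs /dev/disk/by-id/wwn-...a
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
```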

---

## 4) Install Rook (CRDs + operator)

```bash
kubectl create namespace rook-ceph || true

# CRDs + common + operator (Rook v1.18.x)
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/crds.yaml \
  -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/common.yaml \
  -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/operator.yaml

kubectl -n rook-ceph get pods | grep operator
```

> **Toolbox** (handy for diagnostics):
>
> ```bash
> kubectl -n rook-ceph apply -f https://raw.githubusercontent.com/rook/rook/v1.18.0/deploy/examples/toolbox.yaml
> ```
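
Optionally, wait for the operator to be ready before creating the cluster (the deployment name `rook-ceph-operator` is the Rook default; the timeout is arbitrary):

```bash
# Block until the operator deployment reports Available
kubectl -n rook-ceph wait deploy/rook-ceph-operator \
  --for=condition=Available --timeout=5m
```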

---

## 5) **CephCluster** manifest (A/B + arbiter, OSDs only on A/B)

The file `cluster/ceph-cluster.yaml`, **adapted to the current environment**:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18
  dataDirHostPath: /var/lib/rook

  dashboard:
    enabled: true

  mgr:
    count: 2

  mon:
    count: 3
    allowMultiplePerNode: false

  placement:
    # MGRs spread across site-a and site-b
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b"]
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["rook-ceph-mgr"]
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mgr
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule

    # One MON per zone (site-a, site-b, arbiter)
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a","site-b","arbiter"]
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: rook-ceph-mon
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule

  security:
    cephx:
      csi: {}
      daemon: {}
      rbdMirrorPeer: {}

  storage:
    useAllDevices: false
    nodes:
      - name: srvfkvm01
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5bb177a1716, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5dc196bd3a7, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5f81b10f7ef, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d6151cca8afd, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d62f1e5e9699, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d64f204b2405, config: { deviceClass: ssd } }
      - name: srvfkvm02
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127eef88828273, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127f879197de32, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128081a076ba0c, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128114a93e33b9, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94300301281a7b1fc151a, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128235ba79d801, config: { deviceClass: ssd } }
      - name: srvfkvm03
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128aef3bb4e0ae, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b0e3d8bc1dc, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b2b3f446dd7, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b4440c2d027, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b5e42510c2a, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b7d442e592c, config: { deviceClass: ssd } }
      - name: srvfkvm04
        devices:
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c003012887ebfca6752, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c0030128896e360075f, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288ac038600d4, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288c62acb6efc, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288e456c6d441, config: { deviceClass: ssd } }
          - { fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288f976534b4f, config: { deviceClass: ssd } }
```

Apply and verify:

```bash
kubectl apply -f cluster/ceph-cluster.yaml
kubectl -n rook-ceph get pods
```

> **Note**: the MONs should end up one in `site-a`, one in `site-b`, and one in `arbiter`; the MGRs in `site-a` and `site-b`. OSDs only on A/B.
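
A quick way to confirm that placement matches the note above (the `app=` labels are the ones Rook sets on its daemon pods; the `ceph` calls require the toolbox):

```bash
# Where did MONs, MGRs and OSDs land?
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-mgr -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide

# Overall health and the CRUSH tree (zones should appear as buckets)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
```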

---

## 6) Enable the **Orchestrator** (Rook backend)

```bash
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
```

---

## 7) **RBD** pool, 2×2 across **zones** + StorageClass

`pools/ceph-blockpool-rbd.yaml`:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-2x2-sites
  namespace: rook-ceph
spec:
  deviceClass: ssd
  failureDomain: zone
  replicated:
    size: 4               # 2 per site (A/B)
    minSize: 2
    replicasPerFailureDomain: 2
    subFailureDomain: host
    requireSafeReplicaSize: true
  parameters:
    pg_autoscale_mode: "on"
```

`storageclasses/rbd.yaml`:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-2x2-sites
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions: ["discard"]
```

Apply and check:

```bash
kubectl apply -f pools/ceph-blockpool-rbd.yaml
kubectl apply -f storageclasses/rbd.yaml

# Quick checks
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites min_size
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule dump rbd-2x2-sites -f json-pretty
```

> The generated CRUSH rule chooses a **zone** first and then a **host** (2 replicas per zone). With OSDs only on A/B, the arbiter holds **no** data.
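
To see the 2+2 layout in action, you can inspect per-zone usage and sample a few placement groups of the pool (read-only checks via the toolbox; each PG should list 2 OSDs per site in its acting set):

```bash
# Usage summary grouped by the CRUSH tree (zones and hosts)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd df tree

# Sample a few PGs of the pool and look at their acting sets
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- bash -lc \
  'ceph pg ls-by-pool rbd-2x2-sites | head -n 10'
```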

---

## 8) Dashboard via **Ingress** (optional)

`ingress/dashboard.yaml` (backend HTTP:7000):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-dashboard
  namespace: rook-ceph
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: ceph.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
```
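
If no ingress controller is available (or DNS for `ceph.example.local` is not set up yet), a temporary port-forward to the dashboard Service also works for quick access:

```bash
# Dashboard reachable at http://localhost:7000 while this runs
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000
```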

Admin password:

```bash
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d; echo
```

Create an `admin.c3s` user (the default admin account's password tends to get reset):

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc \
  'echo -n "Pozuelo12345" | ceph dashboard ac-user-create admin.c3s administrator -i - && ceph dashboard ac-user-list'
```

---

## 9) StorageClass test (PVC + Pod)

`tests/pvc.yaml`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd
```

`tests/pod.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbd-tester
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh","-c","sleep 36000"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-rbd
```

```bash
kubectl apply -f tests/pvc.yaml
kubectl apply -f tests/pod.yaml
kubectl exec -it rbd-tester -- sh -c 'df -h /data && dd if=/dev/zero of=/data/test.bin bs=1M count=100 && ls -lh /data'
```
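
As an extra check, confirm the PVC bound and that an RBD image (typically named `csi-vol-…` by the CSI driver) was created in the pool:

```bash
kubectl get pvc test-rbd
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd ls -p rbd-2x2-sites
```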

---

## 10) Save the exact manifests from the cluster

```bash
# "Clean" CephCluster without ephemeral fields
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml --show-managed-fields=false \
  | yq 'del(.metadata.creationTimestamp,.metadata.generation,.metadata.resourceVersion,.metadata.uid,.status)' \
  > ceph-cluster-export.yaml

# Pool and StorageClass
kubectl -n rook-ceph get cephblockpool rbd-2x2-sites -o yaml > ceph-blockpool-export.yaml
kubectl get sc ceph-rbd -o yaml > storageclass-rbd-export.yaml
```

---

## 11) Quick troubleshooting

* **A MON is not rescheduled** after deleting one: the operator needs the **quorum** to remain safe. Check `rook-ceph-mon-endpoints`, `deployment/rook-ceph-mon-*` and the `op-mon` entries in the operator logs.
* **OSDs detected as HDD** behind an HBA: you can force `deviceClass: ssd` per disk (as in the `CephCluster`) or, once deployed, adjust it with `ceph osd crush set-device-class ssd osd.N` (see the sketch after this list).
* **Dashboard shows "Orchestrator is not available"**:

  ```bash
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch set backend rook
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph orch status
  ```
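
For the second bullet, a hedged example of re-classing already-deployed OSDs (the OSD IDs below are placeholders; use the ones that `ceph osd tree` reports as `hdd`):

```bash
# Inside the toolbox: move misclassified OSDs to the ssd device class
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
  for id in 0 1 2; do                       # example IDs, replace with your own
    ceph osd crush rm-device-class osd.$id  # clear the existing class first
    ceph osd crush set-device-class ssd osd.$id
  done
  ceph osd tree | head -n 20
'
```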

---

### Wrap-up

With this you have a Rook‑Ceph deployment aligned with the current reality: 2 data zones + an arbiter, 3 MONs (one per zone), 2 MGRs (A/B), OSDs only on A/B, and an RBD pool with **2+2** replicas per zone. Ready for production and for future growth!