# Deploying Rook-Ceph on a **Kubernetes** cluster (SUSE) with local disks (BlueStore)

> Updated guide for a **Kubernetes** cluster (not K3s) on SUSE, starting with 4 nodes and a **future expansion to stretch mode** with a fifth **arbiter** node. Local disks (RAID/HBA), dedicated storage network **VLAN 30 – 192.168.3.0/24**, and the dashboard exposed **via Ingress NGINX** with TLS.

---
## 1) Prerequisites

* 4 operational Kubernetes nodes: `srvfkvm01`, `srvfkvm02`, `srvfkvm03`, `srvfkvm04` (control-plane or mixed roles)
* Each node with **6 dedicated disks** (~894 GB each) for Ceph
* Internet access from the nodes
* Dedicated storage network **VLAN 30 – 192.168.3.0/24** (Ceph public/cluster)
* `kubectl` configured with admin permissions

> **Version note**: examples tested with Rook 1.17.x and Ceph v19.x (Squid) or v18.x (Reef). The manifests use a stable image.

---
## 2) Preparing the disks on SUSE (data disks only)

Install the required utilities on **every node**:

```bash
sudo zypper -n install gdisk util-linux
```

Safe wipe of **only** `sdb…sdg` (adjust if your layout differs):

```bash
set -euo pipefail
DISKS=(sdb sdc sdd sde sdf sdg)

for d in "${DISKS[@]}"; do
  echo ">>> /dev/$d"
  sudo sgdisk --zap-all /dev/$d || true   # clear GPT/MBR
  sudo wipefs -a /dev/$d || true          # remove FS/LVM signatures
  # TRIM the whole device if supported; otherwise zero the first MiBs
  sudo blkdiscard -f /dev/$d || \
    sudo dd if=/dev/zero of=/dev/$d bs=1M count=10 oflag=direct,dsync
done
```
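Before moving on, it is worth confirming that the disks really show up empty. A minimal check, assuming the same `sdb…sdg` naming:

```bash
# All six disks should appear with no partitions and an empty FSTYPE column
lsblk -o NAME,SIZE,TYPE,FSTYPE /dev/sd{b..g}

# wipefs in no-act mode should print nothing if all signatures are gone
for d in sdb sdc sdd sde sdf sdg; do sudo wipefs -n /dev/$d; done
```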
Get the **persistent** *by-id* paths for each disk (on every node):

```bash
for d in sdb sdc sdd sde sdf sdg; do
  echo "=== $HOSTNAME -> $d ==="
  ls -l /dev/disk/by-id/ | awk -v d="$d" '$NF ~ ("/" d "$") {print "/dev/disk/by-id/"$9}'
done
```

> **Always use** `/dev/disk/by-id/...` in the manifests (`fullpath:` field) to avoid problems with device letters changing.
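To save typing when filling in the `storage.nodes` section of the CephCluster manifest (section 5), a small sketch like the following prints ready-to-paste `fullpath:` entries; it assumes the WWN-based symlinks seen above and simply reuses the same lookup (adjust the indentation to match your manifest):

```bash
# Print one "- fullpath: ..." line per data disk
for d in sdb sdc sdd sde sdf sdg; do
  ls -l /dev/disk/by-id/ \
    | awk -v d="$d" '$9 ~ /^wwn-/ && $NF ~ ("/" d "$") {print "- fullpath: /dev/disk/by-id/"$9}'
done
```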
---
## 3) Labelling nodes by **site**

We distribute the nodes across logical zones (A/B) from the start. The arbiter will be added later.

```bash
# SITE A
kubectl label node srvfkvm01 topology.kubernetes.io/zone=site-a --overwrite
kubectl label node srvfkvm02 topology.kubernetes.io/zone=site-a --overwrite

# SITE B
kubectl label node srvfkvm03 topology.kubernetes.io/zone=site-b --overwrite
kubectl label node srvfkvm04 topology.kubernetes.io/zone=site-b --overwrite
```

> When the **arbiter** node exists, it will be labelled `topology.kubernetes.io/zone=arbiter`.
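A quick way to confirm the labels landed where expected (plain `kubectl`, nothing cluster-specific assumed):

```bash
# Show the zone label next to each node
kubectl get nodes -L topology.kubernetes.io/zone
```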
---
## 4) Installing Rook (CRDs, common resources and operator)

```bash
kubectl create namespace rook-ceph || true

# Clone the official repo (optional, but handy for the toolbox/examples)
git clone https://github.com/rook/rook.git
cd rook/deploy/examples

kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
```

Check the operator:

```bash
kubectl -n rook-ceph get pods | grep operator
```
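If you prefer to block until the operator is actually up instead of polling by hand, a sketch along these lines works (it assumes the standard `app=rook-ceph-operator` pod label used by the upstream manifests):

```bash
kubectl -n rook-ceph wait --for=condition=Ready pod \
  -l app=rook-ceph-operator --timeout=300s
```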
---
## 5) CephCluster – 4 nodes, *by-id* disks, storage network (VLAN 30)

File `cluster/ceph-cluster.yaml`:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.3  # stable (you can use v18.2.x if you prefer)
  dataDirHostPath: /var/lib/rook

  # Network: host networking, restricted to the storage VLAN
  network:
    provider: host
    addressRanges:
      public:
        - "192.168.3.0/24"
      cluster:
        - "192.168.3.0/24"

  mon:
    count: 3
    allowMultiplePerNode: false

  dashboard:
    enabled: true

  # We do not want OSDs on the future arbiter node
  placement:
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values: ["site-a", "site-b"]

  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: srvfkvm01
        devices:
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5bb177a1716
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5dc196bd3a7
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d5f81b10f7ef
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d6151cca8afd
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d62f1e5e9699
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94b003012d64f204b2405
      - name: srvfkvm02
        devices:
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127eef88828273
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030127f879197de32
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128081a076ba0c
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128114a93e33b9
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d94300301281a7b1fc151a
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9430030128235ba79d801
      - name: srvfkvm03
        devices:
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128aef3bb4e0ae
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b0e3d8bc1dc
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b2b3f446dd7
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b4440c2d027
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b5e42510c2a
          - fullpath: /dev/disk/by-id/wwn-0x64cd98f036d9510030128b7d442e592c
      - name: srvfkvm04
        devices:
          - fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c003012887ebfca6752
          - fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c0030128896e360075f
          - fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288ac038600d4
          - fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288c62acb6efc
          - fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288e456c6d441
          - fullpath: /dev/disk/by-id/wwn-0x6ec2a72037894c00301288f976534b4f
```

Apply and verify:

```bash
kubectl apply -f cluster/ceph-cluster.yaml
kubectl -n rook-ceph get pods
```

> Install the **toolbox** for diagnostics: `kubectl -n rook-ceph apply -f rook/deploy/examples/toolbox.yaml`
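Once the toolbox is running, two quick checks confirm that the cluster converged and all OSDs came up (a sketch; with 4 nodes × 6 disks the expected count here is 24):

```bash
# Cluster-level status reported by Rook
kubectl -n rook-ceph get cephcluster rook-ceph

# Ceph's own view: HEALTH_OK, 3 mons and 24 OSDs up/in
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
```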
---
## 6) Initial RBD pool (replica **4** over **hosts**) + StorageClass

> With 2 sites (A/B) and **no** arbiter, do **not** use `failureDomain: zone` with `size: 4`, or the PGs will stay *undersized*. We start with **`host`** and, once **stretch** mode is enabled, switch to `zone`.

`pools/ceph-blockpool-rbd.yaml`:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-2x2-sites
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 4
```

`storageclasses/rbd.yaml`:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: rbd-2x2-sites
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions: ["discard"]
```

Apply:

```bash
kubectl apply -f pools/ceph-blockpool-rbd.yaml
kubectl apply -f storageclasses/rbd.yaml
kubectl get sc
```
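Before handing the StorageClass to users, it is worth checking from the toolbox that the pool was created with the intended size and that its PGs settle into `active+clean` (a quick sanity check, nothing more):

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
  ceph osd pool ls detail | grep rbd-2x2-sites
  ceph pg stat
'
```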
> If you initially created the pool with `failureDomain: zone` and you see `active+undersized`, create a host-based **CRUSH rule** and assign it to the pool:
>
> ```bash
> kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
> set -e
> ceph osd crush rule create-replicated rbd-4x-host default host || true
> ceph osd pool set rbd-2x2-sites crush_rule rbd-4x-host
> ceph osd pool get rbd-2x2-sites crush_rule
> '
> ```

---
## 7) Marking OSDs as **SSD** (if Ceph detects them as HDD because of the HBA)

```bash
# From the toolbox
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
for id in $(ceph osd ls); do ceph osd crush rm-device-class osd.$id || true; done
for id in $(ceph osd ls); do ceph osd crush set-device-class ssd osd.$id; done
ceph osd tree | grep -E "zone|host|osd\."
'
```

> If you later create an **SSD-only** pool, add `spec.deviceClass: ssd` to the `CephBlockPool`.
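For reference, a minimal sketch of what such an SSD-only pool could look like (the name `rbd-ssd` is just an example, not something defined elsewhere in this guide):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-ssd           # example name, adjust to taste
  namespace: rook-ceph
spec:
  failureDomain: host
  deviceClass: ssd        # only place data on OSDs classified as ssd
  replicated:
    size: 4
```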
---
## 8) Dashboard via **Ingress** (NGINX) at `ceph.c2et.net`

> The MGR dashboard listens on **HTTP 7000** by default. We terminate **TLS at the Ingress** (cert-manager) and speak **HTTP** to the backend.

`ingress/dashboard.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ceph-dashboard
  namespace: rook-ceph
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  tls:
    - hosts: ["ceph.c2et.net"]
      secretName: ceph-dashboard-tls
  rules:
    - host: ceph.c2et.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
```

Credentials:

```bash
# Default user: admin

# Generated password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{.data.password}" | base64 -d; echo

# Change the password (example)
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc \
  'echo -n "MiNuevaPass" | ceph dashboard ac-user-set-password admin -i -'
```

> If you prefer **HTTPS 8443** to the backend as well, enable TLS on the Ceph dashboard and switch the Ingress to `backend-protocol: "HTTPS"` with port `8443` (and optionally `proxy-ssl-verify: "off"`).
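As a sketch, the Ingress changes for that HTTPS-to-backend variant would look roughly like this (only the affected annotations and the port are shown; the rest of the manifest stays the same):

```yaml
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    # The dashboard's self-signed certificate will not be trusted by NGINX,
    # so backend verification is typically switched off
    nginx.ingress.kubernetes.io/proxy-ssl-verify: "off"
# ...and in the backend service reference:
#                port:
#                  number: 8443
```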
---
## 9) Quick PVC test

`tests/pvc-test.yaml`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: ceph-rbd
```

`tests/pod-test.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbd-tester
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh","-c","sleep 36000"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-rbd
```

Apply and verify:

```bash
kubectl apply -f tests/pvc-test.yaml
kubectl apply -f tests/pod-test.yaml
kubectl exec -it rbd-tester -- sh -c 'df -h /data && dd if=/dev/zero of=/data/test.bin bs=1M count=100 && ls -lh /data'
```
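When you are done, the test resources can be removed so nothing lingers in the pool (assuming the same file names as above):

```bash
kubectl delete -f tests/pod-test.yaml
kubectl delete -f tests/pvc-test.yaml   # reclaimPolicy Delete also removes the RBD image
```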
---
## 10) **Future expansion**: **stretch** mode with an **arbiter** (2 sites + arbiter)

Goal: survive the complete loss of one site, with replicas distributed **2+2** between `site-a` and `site-b`.

1. **Add the arbiter node** and label it:

```bash
kubectl label node <ARBITER_NODE> topology.kubernetes.io/zone=arbiter --overwrite
```

2. **Update the CephCluster** to stretch mode (5 MONs):

```yaml
# CephCluster patch (spec fragment)
mon:
  count: 5
  allowMultiplePerNode: false
  stretchCluster:
    failureDomainLabel: topology.kubernetes.io/zone
    subFailureDomain: host
    zones:
      - name: arbiter
        arbiter: true
      - name: site-a
      - name: site-b
```

> Keep `placement.osd` restricted to `site-a`/`site-b` so that no OSDs are created on the arbiter.
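In practice the simplest rollout is to edit `cluster/ceph-cluster.yaml` with the fragment above and re-apply it; the operator reconciles the change and deploys the extra MONs (a sketch, reusing the file names from section 5):

```bash
# After editing the mon: section in the manifest
kubectl apply -f cluster/ceph-cluster.yaml

# Watch the new MONs come up (5 expected, one of them on the arbiter)
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide
```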
3. **(Optional but recommended)** Change the `CephBlockPool` so the *failure domain* goes back to **`zone`** with `size: 4` (2 per zone). If you want to be explicit about the rule, create a dedicated CRUSH rule and assign it to the pool:

```bash
# Example: per-zone rule
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
set -e
# Create the "rbd-4x-zone" rule (chooses leaves of type zone)
ceph osd crush rule create-replicated rbd-4x-zone default zone || true
# Assign the rule to the pool and adjust its size
ceph osd pool set rbd-2x2-sites crush_rule rbd-4x-zone
ceph osd pool set rbd-2x2-sites size 4
ceph osd pool get rbd-2x2-sites crush_rule
'
```

> After switching to `zone`, Ceph rebalances PGs to satisfy **2+2** between `site-a` and `site-b`. Do this in a maintenance window if a lot of data is already stored.
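Rebalancing progress can be followed from the toolbox; once recovery finishes, all PGs should be back to `active+clean` (nothing cluster-specific assumed here):

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
  ceph -s             # shows recovery/backfill progress
  ceph health detail  # HEALTH_OK once the data movement is complete
'
```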
---
## 11) Quick troubleshooting

* **PGs `active+undersized` with pool size=4**: happens when the CRUSH rule selects `zone` but only 2 zones exist (no stretch yet). Solution: use `failureDomain: host` or assign a `host` rule (section 6) until stretch mode is enabled; see the diagnostic sketch below.
* **Ingress 503** when opening the dashboard: the `rook-ceph-mgr-dashboard` Service uses **port 7000** (HTTP). Set the Ingress to `backend-protocol: "HTTP"` and port `7000`.
* **TLS certificate not issued**: check the ClusterIssuer, that public DNS points to the Ingress, and that the HTTP-01 solver uses `class: nginx`. Avoid redirects that interfere with `/.well-known/acme-challenge/`.
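For the first case, these toolbox commands (standard Ceph CLI) show which PGs are stuck and which CRUSH rule the pool is actually using:

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc '
  ceph health detail
  ceph pg dump_stuck undersized
  ceph osd pool get rbd-2x2-sites crush_rule
'
```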
---
## 12) Appendix – Useful commands

General status:

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
```

Pools and rules:

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool ls detail
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool get rbd-2x2-sites crush_rule
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd crush rule dump rbd-4x-host
```

Dashboard:

```bash
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{.data.password}" | base64 -d; echo
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash -lc 'echo -n "NuevaPass" | ceph dashboard ac-user-set-password admin -i -'
```

---
> **Summary**: you deploy Rook-Ceph with a dedicated storage network, disks referenced **by-id**, an RBD pool of **size 4** over **host** to avoid undersized PGs while there is no arbiter, the dashboard behind an **Ingress** (TLS at NGINX, backend HTTP:7000) and, once the **arbiter** is added, you move the cluster to **stretch** mode and the pool to **`failureDomain: zone`** with **2+2** per site.