Rook/Ceph Distributed Storage

References:

  • https://rook.io/docs/rook/v1.6/ceph-quickstart.html

Preparing the cluster

Each node that will host an OSD must have at least one additional block device with no partition table, filesystem, or LVM physical volumes on it.

For VMs, simply create an additional virtio block device and reboot.
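
For example, with libvirt this could be done along the following lines (the domain name kube-0 and the image path are assumptions for this lab; adjust to match your environment):

# Create a 20G qcow2 image and attach it to the VM as vdb
qemu-img create -f qcow2 /var/lib/libvirt/images/kube-0-osd.qcow2 20G
virsh attach-disk kube-0 /var/lib/libvirt/images/kube-0-osd.qcow2 vdb \
  --driver qemu --subdriver qcow2 --persistent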

The disk layout should look something like this:

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda     252:0    0   20G  0 disk 
├─vda1  252:1    0 19.9G  0 part /
├─vda14 252:14   0    4M  0 part 
└─vda15 252:15   0  106M  0 part /boot/efi
vdb     252:16   0   20G  0 disk                  <---- The OSD will live here
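
If a candidate disk has been used before (old partition table, filesystem, or Ceph data), Rook will not consume it. Something like the following should clear it first; double-check the device name before running:

# Remove any filesystem signatures and zap the partition table
wipefs --all /dev/vdb
sgdisk --zap-all /dev/vdb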

In my lab, there are three compute/storage nodes, and one master node:

Node Name    Roles                  Storage
kube-admin   master, control plane  10G (root)
kube-0       worker, ceph           10G (root), 20G (OSD)
kube-1       worker, ceph           10G (root), 20G (OSD)
kube-2       worker, ceph           10G (root), 20G (OSD)

Install Rook

Clone the repo and change into the examples directory.

git clone --single-branch --branch v1.6.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph

Create the base resource set.

kubectl create -f crds.yaml -f common.yaml -f operator.yaml

Check that the operator pod is Running before proceeding:

$ kubectl -n rook-ceph get pod -w
NAME                                 READY   STATUS              RESTARTS   AGE
rook-ceph-operator-95f44b96c-6czkh   0/1     ContainerCreating   0          25s
rook-ceph-operator-95f44b96c-6czkh   1/1     Running             0          29s
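
If the operator pod stays in a non-Running state, its logs are the first place to look:

kubectl -n rook-ceph logs deploy/rook-ceph-operator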

Create the cluster

Apply the cluster manifest; the operator will then create and prepare the cluster components (MONs, OSDs, etc.).

kubectl create -f ./cluster.yaml

This will take approximately 10 minutes to complete. Watch the pods while the cluster comes up:

kubectl get pods -n rook-ceph -o wide -w

You can tell the OSDs have been provisioned successfully when the output of lsblk changes on the storage nodes. Note the ceph_bluestore signature that now exists on /dev/vdb.

$ lsblk -f
...
vda
├─vda1  ext4           cloudimg-rootfs 1943530c-1f82-498b-8b82-d1b474ba22c1   12.4G    35% /
├─vda14
└─vda15 vfat           UEFI            53A4-477C                              95.2M     9% /boot/efi
vdb     ceph_bluestore

When complete, the pods should look something like this:

$ kubectl -n rook-ceph get pod 
NAME                                               READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-m6lpq                             3/3     Running     0          5m7s
csi-cephfsplugin-ngc8p                             3/3     Running     0          5m7s
csi-cephfsplugin-provisioner-58d557d5-fr724        6/6     Running     0          5m7s
csi-cephfsplugin-provisioner-58d557d5-g7r6g        6/6     Running     0          5m7s
csi-cephfsplugin-sv2ch                             3/3     Running     0          5m7s
csi-rbdplugin-4kfsq                                3/3     Running     0          5m8s
csi-rbdplugin-7f9f9                                3/3     Running     0          5m8s
csi-rbdplugin-ft7s5                                3/3     Running     0          5m8s
csi-rbdplugin-provisioner-7bcb95bc5d-gmpp6         6/6     Running     0          5m8s
csi-rbdplugin-provisioner-7bcb95bc5d-txcxg         6/6     Running     0          5m8s
rook-ceph-crashcollector-kube-0-74fc9db597-v67zh   1/1     Running     0          2m31s
rook-ceph-crashcollector-kube-1-74bf796976-zq4rk   1/1     Running     0          2m57s
rook-ceph-crashcollector-kube-2-6fbb896bf5-s9b7t   1/1     Running     0          3m14s
rook-ceph-mgr-a-848c65cc59-whh2w                   1/1     Running     0          2m46s
rook-ceph-mon-a-758bb5d598-tlc2m                   1/1     Running     0          4m25s
rook-ceph-mon-b-c446bb45d-dm8ng                    1/1     Running     0          4m14s
rook-ceph-mon-c-6485cb8bfc-nt6mr                   1/1     Running     0          2m57s
rook-ceph-operator-95f44b96c-6czkh                 1/1     Running     0          24m
rook-ceph-osd-0-69558cb546-f4tks                   1/1     Running     0          2m31s
rook-ceph-osd-1-79d7fdc4d8-zm7v9                   1/1     Running     0          2m31s
rook-ceph-osd-2-854fb46d86-7bzbr                   1/1     Running     0          2m31s
rook-ceph-osd-prepare-kube-0-bjqdj                 0/1     Completed   0          2m9s
rook-ceph-osd-prepare-kube-1-lsvlw                 0/1     Completed   0          2m7s
rook-ceph-osd-prepare-kube-2-2pkdb                 0/1     Completed   0          2m5s
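
The CephCluster resource itself also reports progress; once everything has settled it should show a Ready phase and HEALTH_OK:

kubectl -n rook-ceph get cephcluster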

Install ceph tools

Create a manifest file, ~/ceph-toolbox.yaml, with the following contents:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-ceph
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: rook/ceph:v1.6.1
        command: ["/tini"]
        args: ["-g", "--", "/usr/local/bin/toolbox.sh"]
        imagePullPolicy: IfNotPresent
        env:
          - name: ROOK_CEPH_USERNAME
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: ceph-username
          - name: ROOK_CEPH_SECRET
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: ceph-secret
        volumeMounts:
          - mountPath: /etc/ceph
            name: ceph-config
          - name: mon-endpoint-volume
            mountPath: /etc/rook
      volumes:
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
            - key: data
              path: mon-endpoints
        - name: ceph-config
          emptyDir: {}
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 5

Apply the manifest to create the deployment:

kubectl create -f ./ceph-toolbox.yaml
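
The same toolbox manifest also ships with the repo (if I remember the layout right), so it could alternatively be applied straight from the examples directory:

kubectl create -f cluster/examples/kubernetes/ceph/toolbox.yaml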

Add a bash alias to ~/.bashrc for exec'ing into the toolbox container:

echo 'alias ceph-tools="kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash"' >> ~/.bashrc
source ~/.bashrc

Then run ceph-tools to enter the container. From there, check the OSD status; it should be reasonably healthy:

[root@rook-ceph-tools-57787758df-52fld /]# ceph osd status
ID  HOST     USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  kube-0  1026M  18.9G      0        0       0        0   exists,up  
 1  kube-1  1026M  18.9G      0        0       0        0   exists,up  
 2  kube-2  1026M  18.9G      0        0       0        0   exists,up  
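
For the overall picture, ceph status and ceph health detail can be run from the same toolbox shell; once all MONs and OSDs are up, the cluster should report HEALTH_OK:

ceph status
ceph health detail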

Set up storage

Create the StorageClass, shared Filesystem (CephFS), and Object Storage (S3) resource types; the manifest paths below are relative to the root of the cloned repo. I will use EC (Erasure Coded) pools for better storage efficiency.

StorageClass

Apply the manifest:

kubectl apply -f cluster/examples/kubernetes/ceph/csi/rbd/storageclass-ec.yaml

Check if the storageClass exists:

$ kubectl get storageclass
NAME              PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           true                   16m
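
As a quick standalone check (entirely optional; the claim name here is arbitrary), a small PVC against this class should go to Bound almost immediately, since the binding mode is Immediate:

# Create a throwaway 1Gi claim against rook-ceph-block and check its status
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc rbd-test-claim

Delete it again with kubectl delete pvc rbd-test-claim once it shows Bound.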

Object (S3)

Apply the manifest:

kubectl apply -f cluster/examples/kubernetes/ceph/object-ec.yaml

Check the pods:

$ kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
NAME                                       READY   STATUS    RESTARTS   AGE
rook-ceph-rgw-my-store-a-85d646566-4lpwq   1/1     Running   0          10m
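
To actually use the store you need S3 credentials, which are normally obtained via a CephObjectStoreUser. A minimal sketch (the user name my-user is arbitrary; the store name matches object-ec.yaml):

# Create an S3 user against the my-store object store
cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: my-user
  namespace: rook-ceph
spec:
  store: my-store
  displayName: "Lab S3 user"
EOF

The access and secret keys should end up in a secret named roughly rook-ceph-object-user-my-store-my-user in the rook-ceph namespace.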

Filesystem (CephFS)

Apply the manifest:

kubectl apply -f cluster/examples/kubernetes/ceph/filesystem-ec.yaml

Check the pods:

$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME                                       READY   STATUS    RESTARTS   AGE
rook-ceph-mds-myfs-ec-a-7dc5c98b9c-zstpb   1/1     Running   0          8m18s
rook-ceph-mds-myfs-ec-b-7578bd7f7c-x9qrl   1/1     Running   0          8m17s
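
From the toolbox, the filesystem and its EC data pool should now be visible:

ceph fs ls
ceph fs status myfs-ec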

Finishing up

If they are applied correctly, the output of ceph df will show entries for each of the newly created pools:

[root@rook-ceph-tools-57787758df-52fld /]# ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED    RAW USED  %RAW USED
hdd    60 GiB  57 GiB  85 MiB   3.1 GiB       5.14
TOTAL  60 GiB  57 GiB  85 MiB   3.1 GiB       5.14

--- POOLS ---
POOL                         ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics         1    1      0 B        0      0 B      0     18 GiB
replicated-metadata-pool      2   32      0 B        0      0 B      0     27 GiB
ec-data-pool                  3   32      0 B        0      0 B      0     36 GiB
my-store.rgw.control          4    8      0 B        8      0 B      0     18 GiB
my-store.rgw.meta             5    8  1.7 KiB        7  1.1 MiB      0     18 GiB
my-store.rgw.log              6    8  3.5 KiB      178  6.2 MiB   0.01     18 GiB
my-store.rgw.buckets.index    7    8      0 B       11      0 B      0     18 GiB
my-store.rgw.buckets.non-ec   8    8      0 B        0      0 B      0     18 GiB
.rgw.root                     9    8  3.9 KiB       16  2.8 MiB      0     18 GiB
my-store.rgw.buckets.data    10   32      0 B        0      0 B      0     36 GiB
myfs-ec-metadata             11   32  2.2 KiB       22  1.5 MiB      0     18 GiB
myfs-ec-data0                12   32      0 B        0      0 B      0     36 GiB

To test the system, create a MySQL/WordPress install from the example manifests.

kubectl apply -f cluster/examples/kubernetes/mysql.yaml
kubectl apply -f cluster/examples/kubernetes/wordpress.yaml

PersistentVolumeClaims will be created on the Ceph storage pools:

$ kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mysql-pv-claim   Bound    pvc-82ae2e99-8ac1-4d2d-9e0d-a9ea6f796985   20Gi       RWO            rook-ceph-block   73s
wp-pv-claim      Bound    pvc-d9ddabd7-ec55-4936-9a6e-b48ef221fdd0   20Gi       RWO            rook-ceph-block   44s

It will also create a LoadBalancer service which is accessible from outside the cluster:

$ kubectl get service wordpress
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
wordpress   LoadBalancer   10.98.174.239   192.168.122.241   80:32014/TCP   85s
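
A quick curl against the external IP confirms WordPress is answering; on a fresh deployment it should return a redirect to the setup page:

curl -I http://192.168.122.241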

This should increase the usage of the OSDs:

[root@rook-ceph-tools-57787758df-52fld /]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP  META   AVAIL   %USE  VAR   PGS  STATUS
 0    hdd  0.01949   1.00000  20 GiB  1.3 GiB  262 MiB   0 B  1 GiB  19 GiB  6.28  1.00  203      up
 1    hdd  0.01949   1.00000  20 GiB  1.3 GiB  262 MiB   0 B  1 GiB  19 GiB  6.28  1.00  194      up
 2    hdd  0.01949   1.00000  20 GiB  1.3 GiB  262 MiB   0 B  1 GiB  19 GiB  6.28  1.00  198      up
                       TOTAL  60 GiB  3.8 GiB  785 MiB   0 B  3 GiB  56 GiB  6.28                   
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

The WordPress and MySQL components can be deleted afterwards:
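
kubectl delete -f cluster/examples/kubernetes/wordpress.yaml
kubectl delete -f cluster/examples/kubernetes/mysql.yaml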