Shared Filesystem

A shared filesystem can be mounted with read/write permission from multiple pods. This may be useful for applications which can be clustered using a shared filesystem.

This example runs a shared filesystem for the kube-registry.

Prerequisites

This guide assumes you have created a Rook cluster as explained in the main Kubernetes guide.

Multiple Filesystems Not Supported

By default, only one shared filesystem can be created with Rook. Multiple-filesystem support in Ceph is still considered experimental and can be enabled with the environment variable ROOK_ALLOW_MULTIPLE_FILESYSTEMS defined in operator.yaml.

Please refer to the CephFS experimental features page for more information.
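
A minimal sketch of that setting, assuming it is added to the operator container's environment in operator.yaml (only the relevant excerpt is shown):

  # Excerpt from the operator container env in operator.yaml (sketch)
  - name: ROOK_ALLOW_MULTIPLE_FILESYSTEMS
    value: "true"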

Create the Filesystem

Create the filesystem by specifying the desired settings for the metadata pool, data pools, and metadata server in the CephFilesystem CRD. In this example we create the metadata pool with replication of three and a single data pool with replication of three. For more options, see the documentation on creating shared filesystems.

Save this shared filesystem definition as filesystem.yaml:

  apiVersion: ceph.rook.io/v1
  kind: CephFilesystem
  metadata:
    name: myfs
    namespace: rook-ceph
  spec:
    metadataPool:
      replicated:
        size: 3
    dataPools:
      - replicated:
          size: 3
    preservePoolsOnDelete: true
    metadataServer:
      activeCount: 1
      activeStandby: true

The Rook operator will create all the pools and other resources necessary to start the service. This may take a minute to complete.

  # Create the filesystem
  $ kubectl create -f filesystem.yaml
  [...]
  # To confirm the filesystem is configured, wait for the mds pods to start
  $ kubectl -n rook-ceph get pod -l app=rook-ceph-mds
  NAME                                  READY     STATUS    RESTARTS   AGE
  rook-ceph-mds-myfs-7d59fdfcf4-h8kw9   1/1       Running   0          12s
  rook-ceph-mds-myfs-7d59fdfcf4-kgkjp   1/1       Running   0          12s

To see detailed status of the filesystem, start and connect to the Rook toolbox. The ceph status output will show a new line for the mds service. In this example, there is one active MDS instance that is up, with one MDS instance in standby-replay mode ready to take over in case of failover.

  $ ceph status
  ...
  services:
    mds: myfs-1/1/1 up {[myfs:0]=mzw58b=up:active}, 1 up:standby-replay

Provision Storage

Before Rook can start provisioning storage, a StorageClass needs to be created based on the filesystem. This is needed for Kubernetes to interoperate with the CSI driver to create persistent volumes.

NOTE: This example uses the CSI driver, which is the preferred driver going forward for K8s 1.13 and newer. Examples are found in the CSI CephFS directory. For an example of a volume using the flex driver (required for K8s 1.12 and earlier), see the Flex Driver section below.

Save this storage class definition as storageclass.yaml:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: rook-cephfs
  # Change "rook-ceph" provisioner prefix to match the operator namespace if needed
  provisioner: rook-ceph.cephfs.csi.ceph.com
  parameters:
    # clusterID is the namespace where the operator is deployed.
    clusterID: rook-ceph
    # CephFS filesystem name into which the volume shall be created
    fsName: myfs
    # Ceph pool into which the volume shall be created
    # Required for provisionVolume: "true"
    pool: myfs-data0
    # Root path of an existing CephFS volume
    # Required for provisionVolume: "false"
    # rootPath: /absolute/path
    # The secrets contain Ceph admin credentials. These are generated automatically by the operator
    # in the same namespace as the cluster.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  reclaimPolicy: Delete

If you've deployed the Rook operator in a namespace other than the default "rook-ceph", change the prefix in the provisioner to match the namespace you used. For example, if the Rook operator is running in "rook-op", the provisioner value should be "rook-op.cephfs.csi.ceph.com".

Create the storage class:

  kubectl create -f cluster/examples/kubernetes/ceph/csi/cephfs/storageclass.yaml

Quotas

IMPORTANT: The CephFS CSI driver uses quotas to enforce the PVC size requested. Only newer kernels support CephFS quotas (kernel version of at least 4.17). If you require quotas to be enforced and the kernel driver does not support it, you can disable the kernel driver and use the FUSE client instead by setting CSI_FORCE_CEPHFS_KERNEL_CLIENT: false in the operator deployment (operator.yaml). Be aware, however, that when the FUSE client is enabled, application pods will be disconnected from the mount during an upgrade and will need to be restarted. See the upgrade guide for more details.
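
A minimal sketch of that change, assuming the setting is added as an environment variable on the operator container in operator.yaml (only the relevant excerpt is shown):

  # Excerpt from the operator container env in operator.yaml (sketch)
  # Forces the CSI driver to use the ceph-fuse client so quotas are enforced on older kernels
  - name: CSI_FORCE_CEPHFS_KERNEL_CLIENT
    value: "false"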

Consume the Shared Filesystem: K8s Registry Sample

As an example, we will start the kube-registry pod with the shared filesystem as the backing store. Save the following spec as kube-registry.yaml:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: cephfs-pvc
    namespace: kube-system
  spec:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 1Gi
    storageClassName: rook-cephfs
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: kube-registry
    namespace: kube-system
    labels:
      k8s-app: kube-registry
      kubernetes.io/cluster-service: "true"
  spec:
    replicas: 3
    selector:
      matchLabels:
        k8s-app: kube-registry
    template:
      metadata:
        labels:
          k8s-app: kube-registry
          kubernetes.io/cluster-service: "true"
      spec:
        containers:
          - name: registry
            image: registry:2
            imagePullPolicy: Always
            resources:
              limits:
                cpu: 100m
                memory: 100Mi
            env:
              # Configuration reference: https://docs.docker.com/registry/configuration/
              - name: REGISTRY_HTTP_ADDR
                value: :5000
              - name: REGISTRY_HTTP_SECRET
                value: "Ple4seCh4ngeThisN0tAVerySecretV4lue"
              - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
                value: /var/lib/registry
            volumeMounts:
              - name: image-store
                mountPath: /var/lib/registry
            ports:
              - containerPort: 5000
                name: registry
                protocol: TCP
            livenessProbe:
              httpGet:
                path: /
                port: registry
            readinessProbe:
              httpGet:
                path: /
                port: registry
        volumes:
          - name: image-store
            persistentVolumeClaim:
              claimName: cephfs-pvc
              readOnly: false

Create the Kube registry deployment:

  kubectl create -f cluster/examples/kubernetes/ceph/csi/cephfs/kube-registry.yaml

You now have a highly available Docker registry with persistent storage.
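
You can verify the registry pods are running and bound to the shared claim (the namespace and labels below match the manifest above):

  kubectl -n kube-system get pod -l k8s-app=kube-registry
  kubectl -n kube-system get pvc cephfs-pvc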

Kernel Version Requirement

If the Rook cluster has more than one filesystem and the application pod is scheduled to a node with kernel version older than 4.7, inconsistent results may arise since kernels older than 4.7 do not support specifying filesystem namespaces.
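
One way to check the kernel version of your nodes is kubectl's wide node listing, which includes a KERNEL-VERSION column:

  kubectl get nodes -o wide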

Consume the Shared Filesystem: Toolbox

Once you have pushed an image to the registry (see the instructions to expose and use the kube-registry), verify that kube-registry is using the filesystem that was configured above by mounting the shared filesystem in the toolbox pod. See the Direct Filesystem topic for more details.
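
A minimal sketch of that check, assuming the toolbox pod has the Ceph config and admin keyring mounted at /etc/ceph and a kernel CephFS client available (see the Direct Filesystem topic for the supported procedure):

  # Inside the toolbox pod: read the mon endpoints and admin key from the mounted config
  mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
  my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

  # Mount the myfs filesystem and list the files written by the registry
  mkdir /tmp/registry
  mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry
  ls /tmp/registry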

Teardown

To clean up all the artifacts created by the filesystem demo:

  kubectl delete -f kube-registry.yaml

To delete the filesystem components and backing data, delete the CephFilesystem resource.

WARNING: Data will be deleted if preservePoolsOnDelete=false.

  kubectl -n rook-ceph delete cephfilesystem myfs

Note: If the "preservePoolsOnDelete" filesystem attribute is set to true, the above command won't delete the pools. Creating the filesystem again with the same CRD will reuse the previous pools.
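
If the pools were preserved, one way to confirm they still exist is to list the pools from the toolbox:

  # Run from the Rook toolbox; the data pool from the example above is myfs-data0
  ceph osd lspools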

Flex Driver

To create a volume based on the flex driver instead of the CSI driver, see the kube-registry.yaml example manifest or refer to the complete flow in the Rook v1.0 Shared Filesystem documentation.

Advanced Example: Erasure Coded Filesystem

The Ceph filesystem example can be found here: Ceph Shared Filesystem - Samples - Erasure Coded.