Persistent Storage Using Ceph Rados Block Device (RBD)

Overview

OKD clusters can be provisioned with persistent storage using Ceph RBD.

Persistent volumes (PVs) and persistent volume claims (PVCs) can share volumes across a single project. While the Ceph RBD-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.

This topic presumes some familiarity with OKD and Ceph RBD. See the Persistent Storage concept topic for details on the OKD persistent volume (PV) framework in general.

Project and namespace are used interchangeably throughout this document. See Projects and Users for details on the relationship.

High-availability of storage in the infrastructure is left to the underlying storage provider.

Provisioning

To provision Ceph volumes, the following are required:

  • An existing storage device in your underlying infrastructure.

  • The Ceph key to be used in an OKD secret object.

  • The Ceph image name.

  • The file system type on top of the block storage (e.g., ext4).

  • ceph-common installed on each schedulable OKD node in your cluster:

    # yum install ceph-common

Creating the Ceph Secret

Define the authorization key in a secret configuration, which is then converted to base64 for use by OKD.

In order to use Ceph storage to back a persistent volume, the secret must be created in the same project as the PVC and pod. The secret cannot simply be in the default project.

  1. Define the secret, setting the key field to the base64-encoded key value for the client.admin user. Obtain the raw key by running ceph auth get-key on a Ceph MON node:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ceph-secret
    data:
      key: QVFBOFF2SlZheUJQRVJBQWgvS2cwT1laQUhPQno3akZwekxxdGc9PQ==
    type: kubernetes.io/rbd
  2. Save the secret definition to a file, for example ceph-secret.yaml, then create the secret:

    $ oc create -f ceph-secret.yaml
  3. Verify that the secret was created:

    # oc get secret ceph-secret
    NAME          TYPE                DATA      AGE
    ceph-secret   kubernetes.io/rbd   1         23d
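The base64 conversion above can be done on the command line. A minimal sketch, using a hypothetical placeholder key (on a real cluster, the value comes from ceph auth get-key client.admin on a MON node):

```shell
# Hypothetical placeholder key for illustration only; obtain the real
# value with: ceph auth get-key client.admin
key='AQDkeyvalue=='

# Encode without a trailing newline so the secret's data.key field is valid
printf '%s' "$key" | base64
```

The printf '%s' form matters: piping echo output adds a trailing newline, which would be encoded into the key and break authentication.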

Creating the Persistent Volume

Developers request Ceph RBD storage by referencing either a PVC or the Ceph RBD volume plug-in directly in the **volumes** section of a pod specification. A PVC exists only in the user’s namespace and can be referenced only by pods within that same namespace. Any attempt to access a PV from a different namespace causes the pod to fail.

  1. Define the PV in an object definition before creating it in OKD:

    Example 1. Persistent Volume Object Definition Using Ceph RBD

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: ceph-pv (1)
    spec:
      capacity:
        storage: 2Gi (2)
      accessModes:
        - ReadWriteOnce (3)
      rbd: (4)
        monitors: (5)
          - 192.168.122.133:6789
        pool: rbd
        image: ceph-image
        user: admin
        secretRef:
          name: ceph-secret (6)
        fsType: ext4 (7)
        readOnly: false
      persistentVolumeReclaimPolicy: Retain

    (1) The name of the PV that is referenced in pod definitions or displayed in various oc volume commands.
    (2) The amount of storage allocated to this volume.
    (3) accessModes are used as labels to match a PV and a PVC. They currently do not define any form of access control. All block storage is defined to be single user (non-shared storage).
    (4) The volume type being used, in this case the rbd plug-in.
    (5) An array of Ceph monitor IP addresses and ports.
    (6) The Ceph secret used to create a secure connection from OKD to the Ceph server.
    (7) The file system type mounted on the Ceph RBD block device.

    Changing the value of the fsType parameter after the volume has been formatted and provisioned can result in data loss and pod failure.

  2. Save your definition to a file, for example ceph-pv.yaml, and create the PV:

    # oc create -f ceph-pv.yaml
  3. Verify that the persistent volume was created:

    # oc get pv
    NAME      LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
    ceph-pv   <none>    2147483648   RWO           Available                       2s
  4. Create a PVC that will bind to the new PV:

    Example 2. PVC Object Definition

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: ceph-claim
    spec:
      accessModes: (1)
        - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi (2)

    (1) The accessModes do not enforce access rights, but instead act as labels to match a PV to a PVC.
    (2) This claim looks for PVs offering 2Gi or greater capacity.
  5. Save the definition to a file, for example ceph-claim.yaml, and create the PVC:

    # oc create -f ceph-claim.yaml
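Once the claim is bound, a pod can reference it by name in its volumes section. A minimal sketch (the pod name, container name, busybox image, and mount path are illustrative assumptions, not part of the procedure above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod                    # hypothetical pod name
spec:
  containers:
  - name: ceph-busybox              # hypothetical container name
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: ceph-vol                # must match the volume name below
      mountPath: /usr/share/busybox # hypothetical mount path
  volumes:
  - name: ceph-vol
    persistentVolumeClaim:
      claimName: ceph-claim         # the PVC created above
```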

Ceph Volume Security

See the full Volume Security topic before implementing Ceph RBD volumes.

A significant difference between shared volumes (NFS and GlusterFS) and block volumes (Ceph RBD, iSCSI, and most cloud storage) is that the user and group IDs defined in the pod definition or container image are applied to the target physical storage. This is referred to as managing ownership of the block device. For example, if the Ceph RBD mount has its owner set to 123 and its group ID set to 567, and if the pod defines its runAsUser set to 222 and its fsGroup to be 7777, then the Ceph RBD physical mount’s ownership will be changed to 222:7777.

Even if the user and group IDs are not defined in the pod specification, the resulting pod may have defaults defined for these IDs based on its matching SCC, or its project. See the full Volume Security topic which covers storage aspects of SCCs and defaults in greater detail.

A pod defines the group ownership of a Ceph RBD volume using the **fsGroup** stanza under the pod’s securityContext definition:

    spec:
      containers:
      - name:
        ...
      securityContext: (1)
        fsGroup: 7777 (2)

    (1) The securityContext must be defined at the pod level, not under a specific container.
    (2) All containers in the pod will have the same fsGroup ID.
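Tying this back to the 222:7777 example above, a pod-level securityContext that sets both IDs might look like the following sketch (container name and image are illustrative assumptions):

```yaml
spec:
  securityContext:          # pod level: applies to all containers
    runAsUser: 222          # UID applied as owner of the RBD mount
    fsGroup: 7777           # GID applied as group owner of the RBD mount
  containers:
  - name: ceph-container    # hypothetical container name
    image: busybox
```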