Persistent Storage Using Ceph Rados Block Device (RBD)
Overview
OKD clusters can be provisioned with persistent storage using Ceph RBD.
Persistent volumes (PVs) and persistent volume claims (PVCs) can share volumes across a single project. While the Ceph RBD-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.
This topic presumes some familiarity with OKD and Ceph RBD. See the Persistent Storage concept topic for details on the OKD persistent volume (PV) framework in general.
Project and namespace are used interchangeably throughout this document. See Projects and Users for details on the relationship. |
High-availability of storage in the infrastructure is left to the underlying storage provider. |
Provisioning
To provision Ceph volumes, the following are required:
An existing storage device in your underlying infrastructure.
The Ceph key to be used in an OKD secret object.
The Ceph image name.
The file system type on top of the block storage (e.g., ext4).
ceph-common installed on each schedulable OKD node in your cluster:
# yum install ceph-common
Creating the Ceph Secret
Define the authorization key in a secret configuration, which is then converted to base64 for use by OKD.
In order to use Ceph storage to back a persistent volume, the secret must be created in the same project as the PVC and pod. The secret cannot simply be in the default project. |
Run
ceph auth get-key
on a Ceph MON node to display the key value for theclient.admin
user:apiVersion: v1
kind: Secret
metadata:
name: ceph-secret
data:
key: QVFBOFF2SlZheUJQRVJBQWgvS2cwT1laQUhPQno3akZwekxxdGc9PQ==
type: kubernetes.io/rbd
Save the secret definition to a file, for example ceph-secret.yaml, then create the secret:
$ oc create -f ceph-secret.yaml
Verify that the secret was created:
# oc get secret ceph-secret
NAME TYPE DATA AGE
ceph-secret kubernetes.io/rbd 1 23d
Creating the Persistent Volume
Developers request Ceph RBD storage by referencing either a PVC, or the Gluster volume plug-in directly in the **volumes**
section of a pod specification. A PVC exists only in the user’s namespace and can be referenced only by pods within that same namespace. Any attempt to access a PV from a different namespace causes the pod to fail.
Define the PV in an object definition before creating it in OKD:
Example 1. Persistent Volume Object Definition Using Ceph RBD
apiVersion: v1
kind: PersistentVolume
metadata:
name: ceph-pv (1)
spec:
capacity:
storage: 2Gi (2)
accessModes:
- ReadWriteOnce (3)
rbd: (4)
monitors: (5)
- 192.168.122.133:6789
pool: rbd
image: ceph-image
user: admin
secretRef:
name: ceph-secret (6)
fsType: ext4 (7)
readOnly: false
persistentVolumeReclaimPolicy: Retain
1 The name of the PV that is referenced in pod definitions or displayed in various oc
volume commands.2 The amount of storage allocated to this volume. 3 accessModes
are used as labels to match a PV and a PVC. They currently do not define any form of access control. All block storage is defined to be single user (non-shared storage).4 The volume type being used, in this case the rbd plug-in. 5 An array of Ceph monitor IP addresses and ports. 6 The Ceph secret used to create a secure connection from OKD to the Ceph server. 7 The file system type mounted on the Ceph RBD block device. Changing the value of the
fstype
parameter after the volume has been formatted and provisioned can result in data loss and pod failure.Save your definition to a file, for example ceph-pv.yaml, and create the PV:
# oc create -f ceph-pv.yaml
Verify that the persistent volume was created:
# oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
ceph-pv <none> 2147483648 RWO Available 2s
Create a PVC that will bind to the new PV:
Example 2. PVC Object Definition
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: ceph-claim
spec:
accessModes: (1)
- ReadWriteOnce
resources:
requests:
storage: 2Gi (2)
1 The accessModes
do not enforce access right, but instead act as labels to match a PV to a PVC.2 This claim looks for PVs offering 2Gi
or greater capacity.Save the definition to a file, for example ceph-claim.yaml, and create the PVC:
# oc create -f ceph-claim.yaml
Ceph Volume Security
See the full Volume Security topic before implementing Ceph RBD volumes. |
A significant difference between shared volumes (NFS and GlusterFS) and block volumes (Ceph RBD, iSCSI, and most cloud storage), is that the user and group IDs defined in the pod definition or container image are applied to the target physical storage. This is referred to as managing ownership of the block device. For example, if the Ceph RBD mount has its owner set to 123 and its group ID set to 567, and if the pod defines its runAsUser
set to 222 and its fsGroup
to be 7777, then the Ceph RBD physical mount’s ownership will be changed to 222:7777.
Even if the user and group IDs are not defined in the pod specification, the resulting pod may have defaults defined for these IDs based on its matching SCC, or its project. See the full Volume Security topic which covers storage aspects of SCCs and defaults in greater detail. |
A pod defines the group ownership of a Ceph RBD volume using the **fsGroup**
stanza under the pod’s securityContext
definition:
spec:
containers:
- name:
...
securityContext: (1)
fsGroup: 7777 (2)
1 | The securityContext must be defined at the pod level, not under a specific container. |
2 | All containers in the pod will have the same fsGroup ID. |