Persistent Storage Using NFS

You are viewing documentation for a release that is no longer supported. The latest supported version of version 3 is [3.11]. For the most recent version 4, see [4]

Overview

OKD clusters can be provisioned with persistent storage using NFS. Persistent volumes (PVs) and persistent volume claims (PVCs) provide a convenient method for sharing a volume across a project. While the NFS-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.

This topic covers the specifics of using the NFS persistent storage type. Some familiarity with OKD and NFS is beneficial. See the Persistent Storage concept topic for details on the OKD persistent volume (PV) framework in general.

Provisioning

Storage must exist in the underlying infrastructure before it can be mounted as a volume in OKD. To provision NFS volumes, a list of NFS servers and export paths is all that is required.

You must first create an object definition for the PV:

Example 1. PV Object Definition Using NFS

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: pv0001 # (1)
  spec:
    capacity:
      storage: 5Gi # (2)
    accessModes:
    - ReadWriteOnce # (3)
    nfs: # (4)
      path: /tmp # (5)
      server: 172.17.0.2 # (6)
    persistentVolumeReclaimPolicy: Recycle # (7)

(1) The name of the volume. This is the PV identity in various oc <command> pod commands.
(2) The amount of storage allocated to this volume.
(3) Though this appears to be related to controlling access to the volume, it is actually used similarly to labels and used to match a PVC to a PV. Currently, no access rules are enforced based on the accessModes.
(4) The volume type being used, in this case the nfs plug-in.
(5) The path that is exported by the NFS server.
(6) The host name or IP address of the NFS server.
(7) The reclaim policy for the PV. This defines what happens to a volume when released from its claim. Valid options are Retain (default) and Recycle. See Reclaiming Resources.

Each NFS volume must be mountable by all schedulable nodes in the cluster.

Save the definition to a file, for example nfs-pv.yaml, and create the PV:

  $ oc create -f nfs-pv.yaml
  persistentvolume "pv0001" created

Verify that the PV was created:

  # oc get pv
  NAME      LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
  pv0001    <none>    5368709120   RWO           Available                       31s
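
The CAPACITY column in this output is reported in bytes, while the PV definition uses the 5Gi notation. As a quick sanity check, the conversion can be sketched as follows. The gi_to_bytes helper is hypothetical and not part of oc.

```shell
#!/bin/sh
# Convert a whole number of gibibytes (Gi) to bytes: 1Gi = 1024^3 bytes.
# gi_to_bytes is an illustrative helper, not an oc subcommand.
gi_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gi_to_bytes 5   # prints 5368709120, the capacity listed for pv0001
```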

The next step can be to create a PVC, which binds to the new PV:

Example 2. PVC Object Definition

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: nfs-claim1
  spec:
    accessModes:
    - ReadWriteOnce # (1)
    resources:
      requests:
        storage: 1Gi # (2)

(1) As mentioned above for PVs, the accessModes do not enforce security, but rather act as labels to match a PV to a PVC.
(2) This claim looks for PVs offering 1Gi or greater capacity.

Save the definition to a file, for example nfs-claim.yaml, and create the PVC:

  # oc create -f nfs-claim.yaml

Enforcing Disk Quotas

You can use disk partitions to enforce disk quotas and size constraints. Each partition can be its own export. Each export is one PV. OKD enforces unique names for PVs, but the uniqueness of the NFS volume’s server and path is up to the administrator.

Enforcing quotas in this way allows the developer to request persistent storage by a specific amount (for example, 10Gi) and be matched with a corresponding volume of equal or greater capacity.

NFS Volume Security

This section covers NFS volume security, including matching permissions and SELinux considerations. The user is expected to understand the basics of POSIX permissions, process UIDs, supplemental groups, and SELinux.

See the full Volume Security topic before implementing NFS volumes.

Developers request NFS storage by referencing, in the **volumes** section of their pod definition, either a PVC by name or the NFS volume plug-in directly.
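
As a minimal sketch of the first option, the following shell function prints a pod definition whose **volumes** section references a PVC by name. The function name, pod name, container name, image, and mount path are all placeholders chosen for illustration; only the claim name nfs-claim1 comes from the PVC example earlier in this topic.

```shell
#!/bin/sh
# Print a pod definition that mounts an existing PVC.
# nfs_pod_yaml, nfs-pod, app, example/app:latest, and /data are
# hypothetical names used only for this sketch.
nfs_pod_yaml() {
  claim_name="$1"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod
spec:
  containers:
    - name: app
      image: example/app:latest
      volumeMounts:
        - name: nfs-data
          mountPath: /data
  volumes:
    - name: nfs-data
      persistentVolumeClaim:
        claimName: ${claim_name}
EOF
}

nfs_pod_yaml nfs-claim1
```

The output could then be piped to oc create -f - in the project that owns the claim.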

The /etc/exports file on the NFS server contains the accessible NFS directories. The target NFS directory has POSIX owner and group IDs. The OKD NFS plug-in mounts the NFS directory into the container with the same POSIX ownership and permissions found on the exported NFS directory. However, the container does not run with its effective UID equal to the owner of the NFS mount; this is the desired behavior.

As an example, if the target NFS directory appears on the NFS server as:

  # ls -lZ /opt/nfs -d
  drwxrws---. nfsnobody 5555 unconfined_u:object_r:usr_t:s0   /opt/nfs
  # id nfsnobody
  uid=65534(nfsnobody) gid=65534(nfsnobody) groups=65534(nfsnobody)

Then the container must match SELinux labels, and either run with a UID of 65534 (nfsnobody owner) or with 5555 in its supplemental groups in order to access the directory.

The owner ID of 65534 is used as an example. Even though NFS’s root_squash maps root (0) to nfsnobody (65534), NFS exports can have arbitrary owner IDs. Owner 65534 is not required for NFS exports.
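
The UID-or-group rule for the example directory can be illustrated with a small sketch. It models only the POSIX owner and group match for a drwxrws--- directory; it ignores SELinux labels and root squashing, and the can_access_dir helper and the UID 1000120000 are assumptions made for illustration.

```shell
#!/bin/sh
# Return success if a process with the given UID and supplemental groups
# matches the directory's owner UID or its group GID.
# Usage: can_access_dir <uid> "<gid gid ...>" <dir_owner_uid> <dir_gid>
# Simplified model: POSIX owner/group match only, no SELinux check.
can_access_dir() {
  uid="$1"; groups="$2"; owner="$3"; dir_gid="$4"
  [ "$uid" = "$owner" ] && return 0
  for gid in $groups; do
    [ "$gid" = "$dir_gid" ] && return 0
  done
  return 1
}

# /opt/nfs is owned by 65534:5555 with mode drwxrws---
can_access_dir 65534 "" 65534 5555 && echo "UID 65534: allowed"
can_access_dir 1000120000 "5555" 65534 5555 && echo "group 5555: allowed"
can_access_dir 1000120000 "" 65534 5555 || echo "neither: denied"
```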

Group IDs

The recommended way to handle NFS access (assuming it is not an option to change permissions on the NFS export) is to use supplemental groups. Supplemental groups in OKD are used for shared storage, of which NFS is an example. In contrast, block storage, such as Ceph RBD or iSCSI, uses the fsGroup SCC strategy and the fsGroup value in the pod’s securityContext.

It is generally preferable to use supplemental group IDs to gain access to persistent storage versus using user IDs. Supplemental groups are covered further in the full Volume Security topic.

Because the group ID on the example target NFS directory shown above is 5555, the pod can define that group ID using **supplementalGroups** under the pod-level securityContext definition. For example:

  spec:
    containers:
      - name:
      ...
    securityContext: # (1)
      supplementalGroups: [5555] # (2)

(1) securityContext must be defined at the pod level, not under a specific container.
(2) An array of GIDs defined for the pod. In this case, there is one element in the array; additional GIDs would be comma-separated.

Assuming there are no custom SCCs that might satisfy the pod’s requirements, the pod likely matches the restricted SCC. This SCC has the **supplementalGroups** strategy set to RunAsAny, meaning that any supplied group ID is accepted without range checking.

As a result, the above pod passes admission and is launched. However, if group ID range checking is desired, a custom SCC, as described in pod security and custom SCCs, is the preferred solution. A custom SCC can be created such that minimum and maximum group IDs are defined, group ID range checking is enforced, and a group ID of 5555 is allowed.

To use a custom SCC, you must first add it to the appropriate service account. For example, use the default service account in the given project unless another has been specified on the pod specification. See Add an SCC to a User, Group, or Project for details.
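
A custom SCC with group ID range checking might look like the sketch below. The SCC name nfs-scc, the 5000-6000 range, and the nfs_scc_yaml helper are assumptions; the remaining strategies and the volume list are modeled on the restricted SCC and should be checked against the restricted SCC in your own cluster before use.

```shell
#!/bin/sh
# Print a custom SCC that enforces a supplemental group range
# containing GID 5555. Name and range are illustrative assumptions;
# other fields are modeled on the restricted SCC.
nfs_scc_yaml() {
  cat <<EOF
kind: SecurityContextConstraints
apiVersion: v1
metadata:
  name: nfs-scc
allowPrivilegedContainer: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: MustRunAs
  ranges:
    - min: 5000
      max: 6000
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - secret
EOF
}

nfs_scc_yaml
# To apply (requires cluster-admin), something like:
#   nfs_scc_yaml | oc create -f -
#   oc adm policy add-scc-to-user nfs-scc system:serviceaccount:<project>:default
```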

User IDs

User IDs can be defined in the container image or in the pod definition. The full Volume Security topic covers controlling storage access based on user IDs, and should be read prior to setting up NFS persistent storage.

It is generally preferable to use supplemental group IDs to gain access to persistent storage versus using user IDs.

In the example target NFS directory shown above, the container needs its UID set to 65534 (ignoring group IDs for the moment), so the following can be added to the pod definition:

  spec:
    containers: # (1)
    - name:
    ...
      securityContext:
        runAsUser: 65534 # (2)

(1) Pods contain a securityContext specific to each container (shown here) and a pod-level securityContext which applies to all containers defined in the pod.
(2) 65534 is the nfsnobody user.

Assuming the default project and the restricted SCC, the pod’s requested user ID of 65534 is not allowed, and therefore the pod fails. The pod fails for the following reasons:

  • It requests 65534 as its user ID.

  • All SCCs available to the pod are examined to see which SCC allows a user ID of 65534 (actually, all policies of the SCCs are checked but the focus here is on user ID).

  • Because all available SCCs use MustRunAsRange for their **runAsUser** strategy, UID range checking is required.

  • 65534 is not included in the SCC or project’s user ID range.
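
The range check in the last two reasons can be sketched as simple arithmetic. The project range 1000000000/10000 (start/size) is an assumed example value of the kind stored in a project's openshift.io/sa.scc.uid-range annotation; the actual range differs per project, and uid_in_range is a hypothetical helper.

```shell
#!/bin/sh
# Return success if uid lies in the half-open range [min, min + size).
# uid_in_range is an illustrative helper, not part of oc.
uid_in_range() {
  uid="$1"; min="$2"; size="$3"
  [ "$uid" -ge "$min" ] && [ "$uid" -lt $(( min + size )) ]
}

# Assumed project range 1000000000/10000; check the requested UID 65534.
if uid_in_range 65534 1000000000 10000; then
  echo "65534 allowed"
else
  echo "65534 rejected"
fi
```

With this assumed range the requested UID falls well below the minimum, so the check rejects it.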

It is generally considered a good practice not to modify the predefined SCCs. The preferred way to fix this situation is to create a custom SCC, as described in the full Volume Security topic. A custom SCC can be created such that minimum and maximum user IDs are defined, UID range checking is still enforced, and the UID of 65534 is allowed.

To use a custom SCC, you must first add it to the appropriate service account. For example, use the default service account in the given project unless another has been specified on the pod specification. See Add an SCC to a User, Group, or Project for details.

SELinux

See the full Volume Security topic for information on controlling storage access in conjunction with using SELinux.

By default, SELinux does not allow writing from a pod to a remote NFS server. The NFS volume mounts correctly, but is read-only.

To enable writing to NFS volumes with SELinux enforcing on each node, run:

  # setsebool -P virt_use_nfs 1

The -P option above makes the bool persistent between reboots.

The virt_use_nfs boolean is defined by the docker-selinux package. If an error is seen indicating that this bool is not defined, ensure this package has been installed.
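
To check whether a node already has the boolean enabled, the output of getsebool virt_use_nfs can be inspected. The following is a minimal sketch that parses one output line of the form "virt_use_nfs --> on"; the sebool_is_on helper is hypothetical, and the sample line stands in for real command output.

```shell
#!/bin/sh
# Return success when a line of getsebool output reports the boolean as on.
# getsebool prints lines of the form "virt_use_nfs --> on".
sebool_is_on() {
  case "$1" in
    *"--> on") return 0 ;;
    *) return 1 ;;
  esac
}

# Sample line used here; on a real node pass "$(getsebool virt_use_nfs)".
if sebool_is_on "virt_use_nfs --> off"; then
  echo "NFS writes already enabled"
else
  echo "run: setsebool -P virt_use_nfs 1"
fi
```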

Export Settings

In order to enable arbitrary container users to read and write the volume, each exported volume on the NFS server should conform to the following conditions:

  • Each export must be:

    /<example_fs> *(rw,root_squash)
  • The firewall must be configured to allow traffic to the mount point.

    • For NFSv4, configure the default port 2049 (nfs) and port 111 (portmapper).

      NFSv4

      # iptables -I INPUT 1 -p tcp --dport 2049 -j ACCEPT
      # iptables -I INPUT 1 -p tcp --dport 111 -j ACCEPT
    • For NFSv3, there are three ports to configure: 2049 (nfs), 20048 (mountd), and 111 (portmapper).

      NFSv3

      # iptables -I INPUT 1 -p tcp --dport 2049 -j ACCEPT
      # iptables -I INPUT 1 -p tcp --dport 20048 -j ACCEPT
      # iptables -I INPUT 1 -p tcp --dport 111 -j ACCEPT
  • The NFS export and directory must be set up so that it is accessible by the target pods. Either set the export to be owned by the container’s primary UID, or supply the pod group access using **supplementalGroups**, as shown in Group IDs above. See the full Volume Security topic for additional pod security information as well.
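
When many exports are managed, the export entries and firewall rules listed above can be generated rather than typed by hand. The sketch below only prints the lines so they can be reviewed first; the helper names and the /exports/pv0001 path are assumptions for illustration.

```shell
#!/bin/sh
# Print an /etc/exports entry in the form required above.
export_entry() {
  echo "$1 *(rw,root_squash)"
}

# Print the iptables commands for the ports listed above.
# Usage: nfs_firewall_rules v3|v4
nfs_firewall_rules() {
  case "$1" in
    v4) ports="2049 111" ;;
    v3) ports="2049 20048 111" ;;
    *) echo "unknown NFS version: $1" >&2; return 1 ;;
  esac
  for port in $ports; do
    echo "iptables -I INPUT 1 -p tcp --dport $port -j ACCEPT"
  done
}

export_entry /exports/pv0001   # hypothetical export path
nfs_firewall_rules v3
```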

Reclaiming Resources

NFS implements the OKD Recyclable plug-in interface. Automatic processes handle reclamation tasks based on policies set on each persistent volume.

By default, PVs are set to Retain. NFS volumes which are set to Recycle are scrubbed (i.e., rm -rf is run on the volume) after being released from their claim (i.e., after the user’s **PersistentVolumeClaim** bound to the volume is deleted). Once recycled, the NFS volume can be bound to a new claim.

Once the claim to a PV is released (that is, the PVC is deleted), the PV object should not be re-used. Instead, a new PV should be created with the same basic volume details as the original.

For example, the administrator creates a PV named nfs1:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: nfs1
  spec:
    capacity:
      storage: 1Mi
    accessModes:
      - ReadWriteMany
    nfs:
      server: 192.168.1.1
      path: "/"

The user creates PVC1, which binds to nfs1. The user then deletes PVC1, releasing claim to nfs1, which causes nfs1 to be Released. If the administrator wishes to make the same NFS share available, they should create a new PV with the same NFS server details, but a different PV name:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: nfs2
  spec:
    capacity:
      storage: 1Mi
    accessModes:
      - ReadWriteMany
    nfs:
      server: 192.168.1.1
      path: "/"

Deleting the original PV and re-creating it with the same name is discouraged. Attempting to manually change the status of a PV from Released to Available causes errors and potential data loss.
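
Because the replacement PV differs from the original only in its name, the definition can be generated from a small template. This is a sketch under that assumption; the nfs_pv_yaml helper is hypothetical, and the capacity and access mode are fixed to the values used in the nfs1 and nfs2 examples.

```shell
#!/bin/sh
# Print an NFS PV definition with the given name, server, and path.
# Capacity and access mode match the nfs1/nfs2 examples above.
nfs_pv_yaml() {
  name="$1"; server="$2"; path="$3"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: ${server}
    path: "${path}"
EOF
}

# Same share as nfs1, new PV name.
nfs_pv_yaml nfs2 192.168.1.1 /
```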

A PV with a retention policy of Recycle scrubs (rm -rf) the data and marks the volume as Available for claim. The Recycle retention policy is deprecated starting in OKD 3.6 and should be avoided. Anyone using the recycler should use dynamic provisioning and volume deletion instead.

Automation

There are many ways to use scripts to automate the above tasks when provisioning clusters with persistent storage using NFS. You can use an example Ansible playbook to help you get started.

Additional Configuration and Troubleshooting

Depending on what version of NFS is being used and how it is configured, there may be additional configuration steps needed for proper export and security mapping. The following are some that may apply:

NFSv4 mount incorrectly shows all files with ownership of nobody:nobody

Disabling ID mapping on NFSv4

  • On both the NFS client and server, run:

    # echo 'Y' > /sys/module/nfsd/parameters/nfs4_disable_idmapping