Volume Security

Overview

This topic provides a general guide on pod security as it relates to volume security. For information on pod-level security in general, see Managing Security Context Constraints (SCC) and the Security Context Constraint concept topic. For information on the OKD persistent volume (PV) framework in general, see the Persistent Storage concept topic.

Accessing persistent storage requires coordination between the cluster and/or storage administrator and the end developer. The cluster administrator creates PVs, which abstract the underlying physical storage. The developer creates pods and, optionally, PVCs, which bind to PVs, based on matching criteria, such as capacity.

Multiple persistent volume claims (PVCs) within the same project can bind to the same PV. However, once a PVC binds to a PV, that PV cannot be bound by a claim outside of the first claim’s project. If the underlying storage needs to be accessed by multiple projects, then each project needs its own PV, which can point to the same physical storage. In this sense, a bound PV is tied to a project. For a detailed PV and PVC example, see the guide for WordPress and MySQL using NFS.

For the cluster administrator, granting pods access to PVs involves:

  • knowing the group ID and/or user ID assigned to the actual storage,

  • understanding SELinux considerations, and

  • ensuring that these IDs are allowed in the range of legal IDs defined for the project and/or the SCC that matches the requirements of the pod.

Group IDs, the user ID, and SELinux values are defined in the SecurityContext section in a pod definition. Group IDs are global to the pod and apply to all containers defined in the pod. User IDs can also be global, or specific to each container. Four sections control access to volumes:

SCCs, Defaults, and Allowed Ranges

SCCs influence whether or not a pod is given a default user ID, fsGroup ID, supplemental group ID, and SELinux label. They also influence whether or not IDs supplied in the pod definition (or in the image) will be validated against a range of allowable IDs. If validation is required and fails, then the pod will also fail.

SCCs define strategies, such as runAsUser, supplementalGroups, and fsGroup. These strategies help decide whether the pod is authorized. Strategy values set to RunAsAny are essentially stating that the pod can do what it wants regarding that strategy. Authorization is skipped for that strategy and no OKD default is produced based on that strategy. Therefore, IDs and SELinux labels in the resulting container are based on container defaults instead of OKD policies.

For a quick summary of RunAsAny:

  • Any ID defined in the pod definition (or image) is allowed.

  • Absence of an ID in the pod definition (and in the image) results in the container assigning an ID, which is root (0) for Docker.

  • No SELinux labels are defined, so Docker will assign a unique label.

For these reasons, SCCs with RunAsAny for ID-related strategies should be protected so that ordinary developers do not have access to the SCC. On the other hand, SCC strategies set to MustRunAs or MustRunAsRange trigger ID validation (for ID-related strategies), and cause default values to be supplied by OKD to the container when those values are not supplied directly in the pod definition or image.

Allowing access to SCCs with a RunAsAny FSGroup strategy can also prevent users from accessing their block devices. Pods need to specify an fsGroup in order to take over their block devices. Normally, this is done when the SCC FSGroup strategy is set to MustRunAs. If a user’s pod is assigned an SCC with a RunAsAny FSGroup strategy, then the user may face permission denied errors until they discover that they need to specify an fsGroup themselves.

SCCs may define the range of allowed IDs (user or groups). If range checking is required (for example, using MustRunAs) and the allowable range is not defined in the SCC, then the project determines the ID range. Therefore, projects support ranges of allowable ID. However, unlike SCCs, projects do not define strategies, such as runAsUser.

Allowable ranges are helpful not only because they define the boundaries for container IDs, but also because the minimum value in the range becomes the default value for the ID in question. For example, if the SCC ID strategy value is MustRunAs, the minimum value of an ID range is 100, and the ID is absent from the pod definition, then 100 is provided as the default for this ID.

As part of pod admission, the SCCs available to a pod are examined (roughly, in priority order followed by most restrictive) to best match the requests of the pod. Setting a SCC’s strategy type to RunAsAny is less restrictive, whereas a type of MustRunAs is more restrictive. All of these strategies are evaluated. To see which SCC was assigned to a pod, use the oc get pod command:

  1. # oc get pod <pod_name> -o yaml
  2. ...
  3. metadata:
  4. annotations:
  5. openshift.io/scc: nfs-scc (1)
  6. name: nfs-pod1 (2)
  7. namespace: default (3)
  8. ...
1Name of the SCC that the pod used (in this case, a custom SCC).
2Name of the pod.
3Name of the project. “Namespace” is interchangeable with “project” in OKD. See Projects and Users for details.

It may not be immediately obvious which SCC was matched by a pod, so the command above can be very useful in understanding the UID, supplemental groups, and SELinux relabeling in a live container.

Any SCC with a strategy set to RunAsAny allows specific values for that strategy to be defined in the pod definition (and/or image). When this applies to the user ID (**runAsUser**) it is prudent to restrict access to the SCC to prevent a container from being able to run as root.

Because pods often match the restricted SCC, it is worth knowing the security this entails. The restricted SCC has the following characteristics:

  • User IDs are constrained due to the runAsUser strategy being set to MustRunAsRange. This forces user ID validation.

  • Because a range of allowable user IDs is not defined in the SCC (see oc get -o yaml —export scc restricted` for more details), the project’s openshift.io/sa.scc.uid-range range will be used for range checking and for a default ID, if needed.

  • A default user ID is produced when a user ID is not specified in the pod definition and the matching SCC’s runAsUser is set to MustRunAsRange.

  • An SELinux label is required (seLinuxContext set to MustRunAs), which uses the project’s default MCS label.

  • fsGroup IDs are constrained to a single value due to the FSGroup strategy being set to MustRunAs, which dictates that the value to use is the minimum value of the first range specified.

  • Because a range of allowable fsGroup IDs is not defined in the SCC, the minimum value of the project’s openshift.io/sa.scc.supplemental-groups range (or the same range used for user IDs) will be used for validation and for a default ID, if needed.

  • A default fsGroup ID is produced when a fsGroup ID is not specified in the pod and the matching SCC’s FSGroup is set to MustRunAs.

  • Arbitrary supplemental group IDs are allowed because no range checking is required. This is a result of the supplementalGroups strategy being set to RunAsAny.

  • Default supplemental groups are not produced for the running pod due to RunAsAny for the two group strategies above. Therefore, if no groups are defined in the pod definition (or in the image), the container(s) will have no supplemental groups predefined.

The following shows the default project and a custom SCC (my-custom-scc), which summarizes the interactions of the SCC and the project:

  1. $ oc get project default -o yaml (1)
  2. ...
  3. metadata:
  4. annotations: (2)
  5. openshift.io/sa.scc.mcs: s0:c1,c0 (3)
  6. openshift.io/sa.scc.supplemental-groups: 1000000000/10000 (4)
  7. openshift.io/sa.scc.uid-range: 1000000000/10000 (5)
  8. $ oc get scc my-custom-scc -o yaml
  9. ...
  10. fsGroup:
  11. type: MustRunAs (6)
  12. ranges:
  13. - min: 5000
  14. max: 6000
  15. runAsUser:
  16. type: MustRunAsRange (7)
  17. uidRangeMin: 1000100000
  18. uidRangeMax: 1000100999
  19. seLinuxContext: (8)
  20. type: MustRunAs
  21. SELinuxOptions: (9)
  22. user: <selinux-user-name>
  23. role: ...
  24. type: ...
  25. level: ...
  26. supplementalGroups:
  27. type: MustRunAs (6)
  28. ranges:
  29. - min: 5000
  30. max: 6000
1default is the name of the project.
2Default values are only produced when the corresponding SCC strategy is not RunAsAny.
3SELinux default when not defined in the pod definition or in the SCC.
4Range of allowable group IDs. ID validation only occurs when the SCC strategy is RunAsAny. There can be more than one range specified, separated by commas. See below for supported formats.
5Same as <4> but for user IDs. Also, only a single range of user IDs is supported.
6MustRunAs enforces group ID range checking and provides the container’s groups default. Based on this SCC definition, the default is 5000 (the minimum ID value). If the range was omitted from the SCC, then the default would be 1000000000 (derived from the project). The other supported type, RunAsAny, does not perform range checking, thus allowing any group ID, and produces no default groups.
7MustRunAsRange enforces user ID range checking and provides a UID default. Based on this SCC, the default UID is 1000100000 (the minimum value). If the minimum and maximum range were omitted from the SCC, the default user ID would be 1000000000 (derived from the project). MustRunAsNonRoot and RunAsAny are the other supported types. The range of allowed IDs can be defined to include any user IDs required for the target storage.
8When set to MustRunAs, the container is created with the SCC’s SELinux options, or the MCS default defined in the project. A type of RunAsAny indicates that SELinux context is not required, and, if not defined in the pod, is not set in the container.
9The SELinux user name, role name, type, and labels can be defined here.

Two formats are supported for allowed ranges:

  1. M/N, where M is the starting ID and N is the count, so the range becomes M through (and including) M+N-1.

  2. M-N, where M is again the starting ID and N is the ending ID. The default group ID is the starting ID in the first range, which is 1000000000 in this project. If the SCC did not define a minimum group ID, then the project’s default ID is applied.

Supplemental Groups

Read SCCs, Defaults, and Allowed Ranges before working with supplemental groups.

It is generally preferable to use group IDs (supplemental or fsGroup) to gain access to persistent storage versus using user IDs.

Supplemental groups are regular Linux groups. When a process runs in Linux, it has a UID, a GID, and one or more supplemental groups. These attributes can be set for a container’s main process. The supplementalGroups IDs are typically used for controlling access to shared storage, such as NFS and GlusterFS, whereas fsGroup is used for controlling access to block storage, such as Ceph RBD and iSCSI.

The OKD shared storage plug-ins mount volumes such that the POSIX permissions on the mount match the permissions on the target storage. For example, if the target storage’s owner ID is 1234 and its group ID is 5678, then the mount on the host node and in the container will have those same IDs. Therefore, the container’s main process must match one or both of those IDs in order to access the volume.

For example, consider the following NFS export.

On an OKD node:

showmount requires access to the ports used by rpcbind and rpc.mount on the NFS server

  1. # showmount -e <nfs-server-ip-or-hostname>
  2. Export list for f21-nfs.vm:
  3. /opt/nfs *

On the NFS server:

  1. # cat /etc/exports
  2. /opt/nfs *(rw,sync,root_squash)
  3. ...
  4. # ls -lZ /opt/nfs -d
  5. drwx------. 1000100001 5555 unconfined_u:object_r:usr_t:s0 /opt/nfs

The /opt/nfs/ export is accessible by UID 1000100001 and the group 5555. In general, containers should not run as root. So, in this NFS example, containers which are not run as UID 1000100001 and are not members the group 5555 will not have access to the NFS export.

Often, the SCC matching the pod does not allow a specific user ID to be specified, thus using supplemental groups is a more flexible way to grant storage access to a pod. For example, to grant NFS access to the export above, the group 5555 can be defined in the pod definition:

  1. apiVersion: v1
  2. kind: Pod
  3. ...
  4. spec:
  5. containers:
  6. - name: ...
  7. volumeMounts:
  8. - name: nfs (1)
  9. mountPath: /usr/share/... (2)
  10. securityContext: (3)
  11. supplementalGroups: [5555] (4)
  12. volumes:
  13. - name: nfs (5)
  14. nfs:
  15. server: <nfs_server_ip_or_host>
  16. path: /opt/nfs (6)
1Name of the volume mount. Must match the name in the volumes section.
2NFS export path as seen in the container.
3Pod global security context. Applies to all containers inside the pod. Each container can also define its securityContext, however group IDs are global to the pod and cannot be defined for individual containers.
4Supplemental groups, which is an array of IDs, is set to 5555. This grants group access to the export.
5Name of the volume. Must match the name in the volumeMounts section.
6Actual NFS export path on the NFS server.

All containers in the above pod (assuming the matching SCC or project allows the group 5555) will be members of the group 5555 and have access to the volume, regardless of the container’s user ID. However, the assumption above is critical. Sometimes, the SCC does not define a range of allowable group IDs but instead requires group ID validation (a result of supplementalGroups set to MustRunAs). Note that this is not the case for the restricted SCC. The project will not likely allow a group ID of 5555, unless the project has been customized to access this NFS export. So, in this scenario, the above pod will fail because its group ID of 5555 is not within the SCC’s or the project’s range of allowed group IDs.

Supplemental Groups and Custom SCCs

To remedy the situation in the previous example, a custom SCC can be created such that:

  • a minimum and max group ID are defined,

  • ID range checking is enforced, and

  • the group ID of 5555 is allowed.

It is often better to create a new SCC rather than modifying a predefined SCC, or changing the range of allowed IDs in the predefined projects.

The easiest way to create a new SCC is to export an existing SCC and customize the YAML file to meet the requirements of the new SCC. For example:

  1. Use the restricted SCC as a template for the new SCC:

    1. $ oc get -o yaml --export scc restricted > new-scc.yaml
  2. Edit the new-scc.yaml file to your desired specifications.

  3. Create the new SCC:

    1. $ oc create -f new-scc.yaml

The oc edit scc command can be used to modify an instantiated SCC.

Here is a fragment of a new SCC named nfs-scc:

  1. $ oc get -o yaml --export scc nfs-scc
  2. allowHostDirVolumePlugin: false (1)
  3. ...
  4. kind: SecurityContextConstraints
  5. metadata:
  6. ...
  7. name: nfs-scc (2)
  8. priority: 9 (3)
  9. ...
  10. supplementalGroups:
  11. type: MustRunAs (4)
  12. ranges:
  13. - min: 5000 (5)
  14. max: 6000
  15. ...
1The allow booleans are the same as for the restricted SCC.
2Name of the new SCC.
3Numerically larger numbers have greater priority. Nil or omitted is the lowest priority. Higher priority SCCs sort before lower priority SCCs and thus have a better chance of matching a new pod.
4supplementalGroups is a strategy and it is set to MustRunAs, which means group ID checking is required.
5Multiple ranges are supported. The allowed group ID range here is 5000 through 5999, with the default supplemental group being 5000.

When the same pod shown earlier runs against this new SCC (assuming, of course, the pod matches the new SCC), it will start because the group 5555, supplied in the pod definition, is now allowed by the custom SCC.

fsGroup

Read SCCs, Defaults, and Allowed Ranges before working with supplemental groups.

It is generally preferable to use group IDs (supplemental or fsGroup) to gain access to persistent storage versus using user IDs.

fsGroup defines a pod’s “file system group” ID, which is added to the container’s supplemental groups. The supplementalGroups ID applies to shared storage, whereas the fsGroup ID is used for block storage.

Block storage, such as Ceph RBD, iSCSI, and various cloud storage, is typically dedicated to a single pod which has requested the block storage volume, either directly or using a PVC. Unlike shared storage, block storage is taken over by a pod, meaning that user and group IDs supplied in the pod definition (or image) are applied to the actual, physical block device. Typically, block storage is not shared.

A fsGroup definition is shown below in the following pod definition fragment:

  1. kind: Pod
  2. ...
  3. spec:
  4. containers:
  5. - name: ...
  6. securityContext: (1)
  7. fsGroup: 5555 (2)
  8. ...
1As with supplementalGroups, fsGroup must be defined globally to the pod, not per container.
25555 will become the group ID for the volume’s group permissions and for all new files created in the volume.

As with supplementalGroups, all containers in the above pod (assuming the matching SCC or project allows the group 5555) will be members of the group 5555, and will have access to the block volume, regardless of the container’s user ID. If the pod matches the restricted SCC, whose fsGroup strategy is MustRunAs, then the pod will fail to run. However, if the SCC has its fsGroup strategy set to RunAsAny, then any fsGroup ID (including 5555) will be accepted. Note that if the SCC has its fsGroup strategy set to RunAsAny and no fsGroup ID is specified, the “taking over” of the block storage does not occur and permissions may be denied to the pod.

fsGroups and Custom SCCs

To remedy the situation in the previous example, a custom SCC can be created such that:

  • a minimum and maximum group ID are defined,

  • ID range checking is enforced, and

  • the group ID of 5555 is allowed.

It is better to create new SCCs versus modifying a predefined SCC, or changing the range of allowed IDs in the predefined projects.

Consider the following fragment of a new SCC definition:

  1. # oc get -o yaml --export scc new-scc
  2. ...
  3. kind: SecurityContextConstraints
  4. ...
  5. fsGroup:
  6. type: MustRunAs (1)
  7. ranges: (2)
  8. - max: 6000
  9. min: 5000 (3)
  10. ...
1MustRunAs triggers group ID range checking, whereas RunAsAny does not require range checking.
2The range of allowed group IDs is 5000 through, and including, 5999. Multiple ranges are supported but not used. The allowed group ID range here is 5000 through 5999, with the default fsGroup being 5000.
3The minimum value (or the entire range) can be omitted from the SCC, and thus range checking and generating a default value will defer to the project’s openshift.io/sa.scc.supplemental-groups range. fsGroup and supplementalGroups use the same group field in the project; there is not a separate range for fsGroup.

When the pod shown above runs against this new SCC (assuming, of course, the pod matches the new SCC), it will start because the group 5555, supplied in the pod definition, is allowed by the custom SCC. Additionally, the pod will “take over” the block device, so when the block storage is viewed by a process outside of the pod, it will actually have 5555 as its group ID.

A list of volumes supporting block ownership include:

  • AWS Elastic Block Store

  • OpenStack Cinder

  • Ceph RBD

  • GCE Persistent Disk

  • iSCSI

  • emptyDir

This list is potentially incomplete.

User IDs

Read SCCs, Defaults, and Allowed Ranges before working with supplemental groups.

It is generally preferable to use group IDs (supplemental or fsGroup) to gain access to persistent storage versus using user IDs.

User IDs can be defined in the container image or in the pod definition. In the pod definition, a single user ID can be defined globally to all containers, or specific to individual containers (or both). A user ID is supplied as shown in the pod definition fragment below:

  1. spec:
  2. containers:
  3. - name: ...
  4. securityContext:
  5. runAsUser: 1000100001

ID 1000100001 in the above is container-specific and matches the owner ID on the export. If the NFS export’s owner ID was 54321, then that number would be used in the pod definition. Specifying securityContext outside of the container definition makes the ID global to all containers in the pod.

Similar to group IDs, user IDs may be validated according to policies set in the SCC and/or project. If the SCC’s runAsUser strategy is set to RunAsAny, then any user ID defined in the pod definition or in the image is allowed.

This means even a UID of 0 (root) is allowed.

If, instead, the runAsUser strategy is set to MustRunAsRange, then a supplied user ID will be validated against a range of allowed IDs. If the pod supplies no user ID, then the default ID is set to the minimum value of the range of allowable user IDs.

Returning to the earlier NFS example, the container needs its UID set to 1000100001, which is shown in the pod fragment above. Assuming the default project and the restricted SCC, the pod’s requested user ID of 1000100001 will not be allowed, and therefore the pod will fail. The pod fails because:

  • it requests 1000100001 as its user ID,

  • all available SCCs use MustRunAsRange for their runAsUser strategy, so UID range checking is required, and

  • 1000100001 is not included in the SCC or in the project’s user ID range.

To remedy this situation, a new SCC can be created with the appropriate user ID range. A new project could also be created with the appropriate user ID range defined. There are also other, less-preferred options:

  • The restricted SCC could be modified to include 1000100001 within its minimum and maximum user ID range. This is not recommended as you should avoid modifying the predefined SCCs if possible.

  • The restricted SCC could be modified to use RunAsAny for the runAsUser value, thus eliminating ID range checking. This is strongly not recommended, as containers could run as root.

  • The default project’s UID range could be changed to allow a user ID of 1000100001. This is not generally advisable because only a single range of user IDs can be specified, and thus other pods may not run if the range is altered.

User IDs and Custom SCCs

It is good practice to avoid modifying the predefined SCCs if possible. The preferred approach is to create a custom SCC that better fits an organization’s security needs, or create a new project that supports the desired user IDs.

To remedy the situation in the previous example, a custom SCC can be created such that:

  • a minimum and maximum user ID is defined,

  • UID range checking is still enforced, and

  • the UID of 1000100001 is allowed.

For example:

  1. $ oc get -o yaml --export scc nfs-scc
  2. allowHostDirVolumePlugin: false (1)
  3. ...
  4. kind: SecurityContextConstraints
  5. metadata:
  6. ...
  7. name: nfs-scc (2)
  8. priority: 9 (3)
  9. requiredDropCapabilities: null
  10. runAsUser:
  11. type: MustRunAsRange (4)
  12. uidRangeMax: 1000100001 (5)
  13. uidRangeMin: 1000100001
  14. ...
1The allowXX bools are the same as for the restricted SCC.
2The name of this new SCC is nfs-scc.
3Numerically larger numbers have greater priority. Nil or omitted is the lowest priority. Higher priority SCCs sort before lower priority SCCs, and thus have a better chance of matching a new pod.
4The runAsUser strategy is set to MustRunAsRange, which means UID range checking is enforced.
5The UID range is 1000100001 through 1000100001 (a range of one value).

Now, with runAsUser: 1000100001 shown in the previous pod definition fragment, the pod matches the new nfs-scc and is able to run with a UID of 1000100001.

SELinux Options

All predefined SCCs, except for the privileged SCC, set the seLinuxContext to MustRunAs. So the SCCs most likely to match a pod’s requirements will force the pod to use an SELinux policy. The SELinux policy used by the pod can be defined in the pod itself, in the image, in the SCC, or in the project (which provides the default).

SELinux labels can be defined in a pod’s securityContext.seLinuxOptions section, and supports user, role, type, and level:

Level and MCS label are used interchangeably in this topic.

  1. ...
  2. securityContext: (1)
  3. seLinuxOptions:
  4. level: "s0:c123,c456" (2)
  5. ...
1level can be defined globally for the entire pod, or individually for each container.
2SELinux level label.

Here are fragments from an SCC and from the default project:

  1. $ oc get -o yaml --export scc scc-name
  2. ...
  3. seLinuxContext:
  4. type: MustRunAs (1)
  5. # oc get -o yaml --export namespace default
  6. ...
  7. metadata:
  8. annotations:
  9. openshift.io/sa.scc.mcs: s0:c1,c0 (2)
  10. ...
1MustRunAs causes volume relabeling.
2If the label is not provided in the pod or in the SCC, then the default comes from the project.

All predefined SCCs, except for the privileged SCC, set the seLinuxContext to MustRunAs. This forces pods to use MCS labels, which can be defined in the pod definition, the image, or provided as a default.

The SCC determines whether or not to require an SELinux label and can provide a default label. If the seLinuxContext strategy is set to MustRunAs and the pod (or image) does not define a label, OKD defaults to a label chosen from the SCC itself or from the project.

If seLinuxContext is set to RunAsAny, then no default labels are provided, and the container determines the final label. In the case of Docker, the container will use a unique MCS label, which will not likely match the labeling on existing storage mounts. Volumes which support SELinux management will be relabeled so that they are accessible by the specified label and, depending on how exclusionary the label is, only that label.

This means two things for unprivileged containers:

  • The volume is given a type that is accessible by unprivileged containers. This type is usually container_file_t in Red Hat Enterprise Linux (RHEL) version 7.5 and later. This type treats volumes as container content. In previous RHEL versions, RHEL 7.4, 7.3, and so forth, the volume is given the svirt_sandbox_file_t type.

  • If a level is specified, the volume is labeled with the given MCS label.

For a volume to be accessible by a pod, the pod must have both categories of the volume. So a pod with s0:c1,c2 will be able to access a volume with s0:c1,c2. A volume with s0 will be accessible by all pods.

If pods fail authorization, or if the storage mount is failing due to permissions errors, then there is a possibility that SELinux enforcement is interfering. One way to check for this is to run:

  1. # ausearch -m avc --start recent

This examines the log file for AVC (Access Vector Cache) errors.