Pod Security Standards
Security settings for Pods are typically applied using security contexts. Security contexts define privilege and access controls on a per-Pod basis.
Cluster-wide requirements for security contexts have previously been defined and enforced using Pod Security Policy. A Pod Security Policy is a cluster-level resource that controls security-sensitive aspects of the Pod specification.
However, numerous means of policy enforcement have arisen that augment or replace the use of Pod Security Policy. This page details recommended Pod security profiles, decoupled from any specific instantiation.
Policy Types
There is an immediate need for base policy definitions to broadly cover the security spectrum. These should range from highly restricted to highly flexible:
- Privileged - Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
- Baseline - Minimally restrictive policy while preventing known privilege escalations. Allows the default (minimally specified) Pod configuration.
- Restricted - Heavily restricted policy, following current Pod hardening best practices.
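The ordering of the three policy types can be sketched in code. This is a minimal illustration only: the `Profile` enum and the `satisfies` helper are hypothetical names, not part of any Kubernetes API, and the sketch assumes that a Pod meeting a stricter profile also meets every less strict one.

```python
from enum import IntEnum


class Profile(IntEnum):
    """The three policy types, ordered from least to most restrictive."""

    PRIVILEGED = 0
    BASELINE = 1
    RESTRICTED = 2


def satisfies(pod_profile: Profile, required: Profile) -> bool:
    """Return True when a Pod meeting `pod_profile` also meets `required`."""
    return pod_profile >= required
```

For example, `satisfies(Profile.RESTRICTED, Profile.BASELINE)` is true, while `satisfies(Profile.BASELINE, Profile.RESTRICTED)` is false.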
Policies
Privileged
The Privileged policy is purposely open and entirely unrestricted. This type of policy is typically aimed at system- and infrastructure-level workloads managed by privileged, trusted users.
The Privileged policy is defined by an absence of restrictions. For allow-by-default enforcement mechanisms (such as Gatekeeper), the Privileged profile may be an absence of applied constraints rather than an instantiated policy. In contrast, for a deny-by-default mechanism (such as Pod Security Policy), the Privileged policy should enable all controls (disable all restrictions).
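The difference between the two enforcement styles can be shown with a simplified model. The function names and the representation of constraints and policies as plain Python callables are assumptions made for illustration; real mechanisms such as Gatekeeper or Pod Security Policy are configured through their own resources, not through code like this.

```python
def admit_allow_by_default(pod: dict, constraints: list) -> bool:
    """Admit the Pod unless one of the applied constraints is violated.

    Each constraint is a callable that returns True when the Pod violates it.
    """
    return not any(violates(pod) for violates in constraints)


def admit_deny_by_default(pod: dict, policies: list) -> bool:
    """Admit the Pod only if at least one policy explicitly permits it.

    Each policy is a callable that returns True when it permits the Pod.
    """
    return any(permits(pod) for permits in policies)


def permit_everything(pod: dict) -> bool:
    """Hypothetical policy with every restriction disabled."""
    return True


# A Pod that only the Privileged profile would accept.
host_network_pod = {"spec": {"hostNetwork": True}}

# Allow-by-default: the Privileged profile is simply an empty constraint list.
privileged_via_absence = admit_allow_by_default(host_network_pod, [])

# Deny-by-default: the Privileged profile needs an explicit, fully open policy.
privileged_via_policy = admit_deny_by_default(host_network_pod, [permit_everything])
```

In this model both calls admit the Pod, whereas `admit_deny_by_default(host_network_pod, [])` rejects it because no policy permits it.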
Baseline
The Baseline policy is aimed at ease of adoption for common containerized workloads while preventing known privilege escalations. This policy is targeted at application operators and developers of non-critical applications. The following listed controls should be enforced/disallowed:
Control | Policy |
Host Namespaces | Sharing the host namespaces must be disallowed. Restricted Fields: spec.hostNetwork, spec.hostPID, spec.hostIPC; Allowed Values: false |
Privileged Containers | Privileged Pods disable most security mechanisms and must be disallowed. Restricted Fields: spec.containers[].securityContext.privileged, spec.initContainers[].securityContext.privileged; Allowed Values: false, undefined/nil |
Capabilities | Adding additional capabilities beyond the default set must be disallowed. Restricted Fields: spec.containers[].securityContext.capabilities.add, spec.initContainers[].securityContext.capabilities.add; Allowed Values: empty (or restricted to a known list) |
HostPath Volumes | HostPath volumes must be forbidden. Restricted Fields: spec.volumes[].hostPath; Allowed Values: undefined/nil |
Host Ports | HostPorts should be disallowed, or at minimum restricted to a known list. Restricted Fields: spec.containers[].ports[].hostPort, spec.initContainers[].ports[].hostPort; Allowed Values: 0, undefined (or restricted to a known list) |
AppArmor (optional) | On supported hosts, the ‘runtime/default’ AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles. Restricted Fields: metadata.annotations[‘container.apparmor.security.beta.kubernetes.io/*’]; Allowed Values: ‘runtime/default’, undefined |
SELinux (optional) | Setting custom SELinux options should be disallowed. Restricted Fields: spec.securityContext.seLinuxOptions, spec.containers[].securityContext.seLinuxOptions, spec.initContainers[].securityContext.seLinuxOptions; Allowed Values: undefined/nil |
/proc Mount Type | The default /proc masks are set up to reduce attack surface, and should be required. Restricted Fields: spec.containers[].securityContext.procMount, spec.initContainers[].securityContext.procMount; Allowed Values: undefined/nil, ‘Default’ |
Sysctls | Sysctls can disable security mechanisms or affect all containers on a host, and should be disallowed except for an allowed “safe” subset. A sysctl is considered safe if it is namespaced in the container or the Pod, and it is isolated from other Pods or processes on the same Node. Restricted Fields: spec.securityContext.sysctls; Allowed Values: kernel.shm_rmid_forced, net.ipv4.ip_local_port_range, net.ipv4.tcp_syncookies, net.ipv4.ping_group_range, undefined/empty |
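To make the table concrete, the following sketch checks a Pod manifest, loaded as a plain Python dictionary, against a subset of the Baseline controls. The function name `baseline_violations` is hypothetical, the optional AppArmor and SELinux controls and the /proc mount type are left out, and the "known list" exceptions for capabilities and host ports are not modelled. It is an illustration of the rules, not an enforcement mechanism.

```python
# Sysctls that the Baseline table lists as the safe subset.
SAFE_SYSCTLS = {
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
    "net.ipv4.ping_group_range",
}


def baseline_violations(pod: dict) -> list:
    """Return messages for a subset of Baseline controls that `pod` violates."""
    spec = pod.get("spec") or {}
    problems = []

    # Host Namespaces: the three host* flags must be false or unset.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(field):
            problems.append(f"spec.{field} must be false")

    # HostPath Volumes: no volume may define hostPath.
    for volume in spec.get("volumes") or []:
        if volume.get("hostPath") is not None:
            problems.append(f"volume {volume.get('name')!r} uses hostPath")

    # Sysctls: only the safe subset is allowed.
    pod_sc = spec.get("securityContext") or {}
    for sysctl in pod_sc.get("sysctls") or []:
        if sysctl.get("name") not in SAFE_SYSCTLS:
            problems.append(f"sysctl {sysctl.get('name')!r} is not in the safe subset")

    # Per-container controls apply to regular and init containers alike.
    for kind in ("containers", "initContainers"):
        for container in spec.get(kind) or []:
            where = f"spec.{kind}[{container.get('name')!r}]"
            sc = container.get("securityContext") or {}
            if sc.get("privileged"):
                problems.append(f"{where} must not be privileged")
            if (sc.get("capabilities") or {}).get("add"):
                problems.append(f"{where} must not add capabilities")
            for port in container.get("ports") or []:
                if port.get("hostPort"):  # 0 and unset are both allowed
                    problems.append(f"{where} must not use hostPort {port['hostPort']}")

    return problems
```

A minimally specified Pod, with only a container name and image, produces an empty list, which matches the statement that the Baseline policy allows the default Pod configuration.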
Restricted
The Restricted policy is aimed at enforcing current Pod hardening best practices, at the expense of some compatibility. It is targeted at operators and developers of security-critical applications, as well as lower-trust users. The following listed controls should be enforced/disallowed:
Control | Policy |
Everything from the baseline profile. | |
Volume Types | In addition to restricting HostPath volumes, the restricted profile limits usage of non-core volume types to those defined through PersistentVolumes. Restricted Fields: spec.volumes[].hostPath, spec.volumes[].gcePersistentDisk, spec.volumes[].awsElasticBlockStore, spec.volumes[].gitRepo, spec.volumes[].nfs, spec.volumes[].iscsi, spec.volumes[].glusterfs, spec.volumes[].rbd, spec.volumes[].flexVolume, spec.volumes[].cinder, spec.volumes[].cephFS, spec.volumes[].flocker, spec.volumes[].fc, spec.volumes[].azureFile, spec.volumes[].vsphereVolume, spec.volumes[].quobyte, spec.volumes[].azureDisk, spec.volumes[].portworxVolume, spec.volumes[].scaleIO, spec.volumes[].storageos, spec.volumes[].csi; Allowed Values: undefined/nil |
Privilege Escalation | Privilege escalation (such as via set-user-ID or set-group-ID file mode) should not be allowed. Restricted Fields: spec.containers[].securityContext.allowPrivilegeEscalation, spec.initContainers[].securityContext.allowPrivilegeEscalation; Allowed Values: false |
Running as Non-root | Containers must be required to run as non-root users. Restricted Fields: spec.securityContext.runAsNonRoot, spec.containers[].securityContext.runAsNonRoot, spec.initContainers[].securityContext.runAsNonRoot; Allowed Values: true |
Non-root groups (optional) | Containers should be forbidden from running with a root primary or supplementary GID. Restricted Fields: spec.securityContext.runAsGroup, spec.securityContext.supplementalGroups[], spec.securityContext.fsGroup, spec.containers[].securityContext.runAsGroup, spec.initContainers[].securityContext.runAsGroup; Allowed Values: non-zero, undefined/nil (except for *.runAsGroup) |
Seccomp | The RuntimeDefault seccomp profile must be required, or specific additional profiles allowed. Restricted Fields: spec.securityContext.seccompProfile.type, spec.containers[].securityContext.seccompProfile.type, spec.initContainers[].securityContext.seccompProfile.type; Allowed Values: ‘RuntimeDefault’, undefined/nil |
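The additional Restricted controls can be sketched the same way. The function name `restricted_violations` is hypothetical, and the sketch covers only volume types, privilege escalation, and running as non-root; the Baseline checks, the optional non-root groups control, and seccomp are omitted. It also assumes that a container-level `runAsNonRoot` takes precedence over the Pod-level value, and it compares volume field names case-insensitively so that spelling differences between the table and a manifest do not matter.

```python
# Volume types the Restricted table lists as restricted (lowercased for matching).
RESTRICTED_VOLUME_TYPES = {
    name.lower()
    for name in (
        "hostPath", "gcePersistentDisk", "awsElasticBlockStore", "gitRepo",
        "nfs", "iscsi", "glusterfs", "rbd", "flexVolume", "cinder", "cephFS",
        "flocker", "fc", "azureFile", "vsphereVolume", "quobyte", "azureDisk",
        "portworxVolume", "scaleIO", "storageos", "csi",
    )
}


def restricted_violations(pod: dict) -> list:
    """Return messages for a subset of the extra Restricted controls `pod` violates."""
    spec = pod.get("spec") or {}
    pod_sc = spec.get("securityContext") or {}
    problems = []

    # Volume Types: none of the listed volume sources may be set.
    for volume in spec.get("volumes") or []:
        for key in volume:
            if key.lower() in RESTRICTED_VOLUME_TYPES:
                problems.append(f"volume {volume.get('name')!r} uses restricted type {key}")

    for kind in ("containers", "initContainers"):
        for container in spec.get(kind) or []:
            where = f"spec.{kind}[{container.get('name')!r}]"
            sc = container.get("securityContext") or {}

            # Privilege Escalation: the only allowed value is an explicit false.
            if sc.get("allowPrivilegeEscalation") is not False:
                problems.append(f"{where} must set allowPrivilegeEscalation to false")

            # Running as Non-root: use the container value if set, else the Pod value.
            run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
            if run_as_non_root is not True:
                problems.append(f"{where} must run with runAsNonRoot set to true")

    return problems
```

Unlike the Baseline profile, a minimally specified Pod fails these checks, because `allowPrivilegeEscalation` and `runAsNonRoot` must be set explicitly.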
Policy Instantiation
Decoupling policy definition from policy instantiation allows for a common understanding and consistent language of policies across clusters, independent of the underlying enforcement mechanism.
As mechanisms mature, they will be defined below on a per-policy basis. The methods of enforcement of individual policies are not defined here.
FAQ
Why isn’t there a profile between privileged and baseline?
The three profiles defined here have a clear linear progression from most secure (restricted) to least secure (privileged), and cover a broad set of workloads. Privileges required above the baseline policy are typically very application specific, so we do not offer a standard profile in this niche. This is not to say that the privileged profile should always be used in this case, but that policies in this space need to be defined on a case-by-case basis.
SIG Auth may reconsider this position in the future, should a clear need for other profiles arise.
What’s the difference between a security policy and a security context?
Security Contexts configure Pods and Containers at runtime. Security contexts are defined as part of the Pod and container specifications in the Pod manifest, and represent parameters to the container runtime.
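As an illustration, the sketch below represents a Pod manifest as a Python dictionary with a security context at both the Pod and the container level. The Pod name, image reference, and the `effective_security_context` helper are hypothetical. The merge reflects the Kubernetes behaviour that, for fields available at both levels, the container-level value overrides the Pod-level one.

```python
# Fields that exist in both the Pod-level and the container-level security context.
SHARED_FIELDS = ("runAsUser", "runAsGroup", "runAsNonRoot", "seLinuxOptions", "seccompProfile")

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example"},  # hypothetical name
    "spec": {
        # Pod-level security context: applies to every container in the Pod.
        "securityContext": {"runAsNonRoot": True, "runAsUser": 1000},
        "containers": [
            {
                "name": "app",
                "image": "registry.example/app:1.0",  # hypothetical image
                # Container-level security context: applies to this container only.
                "securityContext": {"runAsUser": 2000, "allowPrivilegeEscalation": False},
            }
        ],
    },
}


def effective_security_context(pod: dict, container_name: str) -> dict:
    """Combine Pod-level and container-level settings for one container."""
    spec = pod["spec"]
    pod_sc = spec.get("securityContext") or {}
    container = next(c for c in spec["containers"] if c["name"] == container_name)
    # Start from the Pod-level values of the shared fields, then let the
    # container-level security context override them.
    merged = {key: pod_sc[key] for key in SHARED_FIELDS if key in pod_sc}
    merged.update(container.get("securityContext") or {})
    return merged
```

For the `app` container, the result keeps `runAsNonRoot` from the Pod level, takes `runAsUser` 2000 from the container level, and adds the container-only `allowPrivilegeEscalation` setting.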
Security policies are control plane mechanisms to enforce specific settings in the Security Context, as well as other parameters outside the Security Context. As of February 2020, the current native solution for enforcing these security policies is Pod Security Policy - a mechanism for centrally enforcing security policy on Pods across a cluster. Other alternatives for enforcing security policy are being developed in the Kubernetes ecosystem, such as OPA Gatekeeper.
What profiles should I apply to my Windows Pods?
Windows in Kubernetes has some limitations and differentiators from standard Linux-based workloads. Specifically, the Pod SecurityContext fields have no effect on Windows. As such, no standardized Pod Security profiles currently exist.
What about sandboxed Pods?
There is currently no API standard that controls whether a Pod is considered sandboxed or not. Sandboxed Pods may be identified by the use of a sandboxed runtime (such as gVisor or Kata Containers), but there is no standard definition of what a sandboxed runtime is.
The protections necessary for sandboxed workloads can differ from others. For example, the need to restrict privileged permissions is lessened when the workload is isolated from the underlying kernel. This allows for workloads requiring heightened permissions to still be isolated.
Additionally, the protection of sandboxed workloads is highly dependent on the method of sandboxing. As such, no single policy is recommended for all sandboxed workloads.