Configuring an OKD cluster for pods

As an administrator, you can create and maintain an efficient cluster for pods.

By keeping your cluster efficient, you can provide a better environment for your developers by configuring pod behavior: what a pod does when it exits, how to ensure that the required number of pods is always running, when to restart pods designed to run only once, how to limit the bandwidth available to pods, and how to keep pods running during disruptions.

Configuring how pods behave after restart

A pod restart policy determines how OKD responds when Containers in that pod exit. The policy applies to all Containers in that pod.

The possible values are:

  • Always - Tries restarting a successfully exited Container on the pod continuously, with an exponential back-off delay (10s, 20s, 40s) capped at 5 minutes. The default is Always.

  • OnFailure - Tries restarting a failed Container on the pod with an exponential back-off delay (10s, 20s, 40s) capped at 5 minutes.

  • Never - Does not try to restart exited or failed Containers on the pod. Pods immediately fail and exit.
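
For example, a minimal pod definition that sets the restart policy explicitly; the pod name here is illustrative, and the image reuses the example image that appears later in this document:

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-pod
  spec:
    restartPolicy: OnFailure  # Always (the default), OnFailure, or Never
    containers:
    - name: hello-openshift
      image: openshift/hello-openshift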

After the pod is bound to a node, the pod will never be bound to another node. This means that a controller is necessary in order for a pod to survive node failure:

Condition                                                          Controller Type          Restart Policy
Pods that are expected to terminate (such as batch computations)   Job                      OnFailure or Never
Pods that are expected to not terminate (such as web servers)      Replication controller   Always
Pods that must run one-per-machine                                 Daemon set               Any
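
For example, a batch workload from the first row of the table runs under a Job with a restart policy of OnFailure or Never; this minimal sketch uses an illustrative name and command:

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: pi-job
  spec:
    template:
      spec:
        restartPolicy: OnFailure  # Always is not valid for a Job
        containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(200)"]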

If a Container on a pod fails and the restart policy is set to OnFailure, the pod stays on the node and the Container is restarted. If you do not want the Container to restart, use a restart policy of Never.

If an entire pod fails, OKD starts a new pod. Developers must address the possibility that applications might be restarted in a new pod. In particular, applications must handle temporary files, locks, incomplete output, and so forth caused by previous runs.

Kubernetes architecture expects reliable endpoints from cloud providers. When a cloud provider is down, the kubelet prevents OKD from restarting.

If the underlying cloud provider endpoints are not reliable, do not install a cluster using cloud provider integration. Install the cluster as if it were in a no-cloud environment. It is not recommended to toggle cloud provider integration on or off in an installed cluster.

For details on how OKD uses restart policy with failed Containers, see the Example States in the Kubernetes documentation.

Limiting the bandwidth available to pods

You can apply quality-of-service traffic shaping to a pod and effectively limit its available bandwidth. Egress traffic (from the pod) is handled by policing, which simply drops packets in excess of the configured rate. Ingress traffic (to the pod) is handled by shaping queued packets to effectively handle data. The limits you place on a pod do not affect the bandwidth of other pods.

Procedure

To limit the bandwidth on a pod:

  1. Write an object definition JSON file, and specify the data traffic speed using kubernetes.io/ingress-bandwidth and kubernetes.io/egress-bandwidth annotations. For example, to limit both pod egress and ingress bandwidth to 10M/s:

    Limited Pod Object Definition

    {
        "kind": "Pod",
        "spec": {
            "containers": [
                {
                    "image": "openshift/hello-openshift",
                    "name": "hello-openshift"
                }
            ]
        },
        "apiVersion": "v1",
        "metadata": {
            "name": "iperf-slow",
            "annotations": {
                "kubernetes.io/ingress-bandwidth": "10M",
                "kubernetes.io/egress-bandwidth": "10M"
            }
        }
    }
  2. Create the pod using the object definition:

    $ oc create -f <file_or_dir_path>
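
    You can verify that the annotations were applied; the pod name iperf-slow comes from the example definition above:

    $ oc describe pod iperf-slow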

Understanding how to use pod disruption budgets to specify the number of pods that must be up

Pod disruption budgets are part of the Kubernetes API and can be managed with oc commands like other object types. They allow the specification of safety constraints on pods during operations, such as draining a node for maintenance.

PodDisruptionBudget is an API object that specifies the minimum number or percentage of replicas that must be up at a time. Setting these in projects can be helpful during node maintenance (such as scaling a cluster down or a cluster upgrade) and is only honored on voluntary evictions (not on node failures).

A PodDisruptionBudget object’s configuration consists of the following key parts:

  • A label selector, which is a label query over a set of pods.

  • An availability level, which specifies the minimum number of pods that must be available simultaneously, either:

    • minAvailable is the number of pods that must always be available, even during a disruption.

    • maxUnavailable is the number of pods that can be unavailable during a disruption.

A maxUnavailable of 0% or 0, or a minAvailable of 100% or equal to the number of replicas, is permitted but can block nodes from being drained.

You can check for pod disruption budgets across all projects with the following:

  $ oc get poddisruptionbudget --all-namespaces

Example output

  NAMESPACE         NAME          MIN-AVAILABLE   SELECTOR
  another-project   another-pdb   4               bar=foo
  test-project      my-pdb        2               foo=bar

The PodDisruptionBudget is considered healthy when there are at least minAvailable pods running in the system. Every pod above that limit can be evicted. For example, with minAvailable: 2 and three matching pods running, one pod can be voluntarily evicted; a second voluntary eviction is blocked until a replacement pod is running.

Depending on your pod priority and preemption settings, lower-priority pods might be removed despite their pod disruption budget requirements.

Specifying the number of pods that must be up with pod disruption budgets

You can use a PodDisruptionBudget object to specify the minimum number or percentage of replicas that must be up at a time.

Procedure

To configure a pod disruption budget:

  1. Create a YAML file with an object definition similar to the following:

    apiVersion: policy/v1beta1 (1)
    kind: PodDisruptionBudget
    metadata:
      name: my-pdb
    spec:
      minAvailable: 2 (2)
      selector: (3)
        matchLabels:
          foo: bar

    (1) PodDisruptionBudget is part of the policy/v1beta1 API group.
    (2) The minimum number of pods that must be available simultaneously. This can be either an integer or a string specifying a percentage, for example, 20%.
    (3) A label query over a set of resources. The results of matchLabels and matchExpressions are logically conjoined.

    Or:

    apiVersion: policy/v1beta1 (1)
    kind: PodDisruptionBudget
    metadata:
      name: my-pdb
    spec:
      maxUnavailable: 25% (2)
      selector: (3)
        matchLabels:
          foo: bar

    (1) PodDisruptionBudget is part of the policy/v1beta1 API group.
    (2) The maximum number of pods that can be unavailable simultaneously. This can be either an integer or a string specifying a percentage, for example, 20%.
    (3) A label query over a set of resources. The results of matchLabels and matchExpressions are logically conjoined.
  2. Run the following command to add the object to the project:

    $ oc create -f </path/to/file> -n <project_name>
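
    You can confirm that the pod disruption budget exists in the project; the placeholder matches the command above:

    $ oc get poddisruptionbudget -n <project_name>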

Preventing pod removal using critical pods

There are a number of core components that are critical to a fully functional cluster but run on a regular cluster node rather than the master. A cluster might stop working properly if a critical add-on is evicted.

Pods marked as critical are not allowed to be evicted.

Procedure

To make a pod critical:

  1. Create a Pod spec or edit existing pods to include the system-cluster-critical priority class:

    spec:
      template:
        metadata:
          name: critical-pod
        spec:
          priorityClassName: system-cluster-critical (1)

    (1) Priority class for pods that are important to the cluster but can be removed from a node if necessary.

    Alternatively, you can specify system-node-critical, the highest available priority, for pods that should never be evicted from a node.
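
    For a standalone pod rather than a controller template, a minimal sketch might look like the following; the name and image are illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: critical-pod
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: hello-openshift
        image: openshift/hello-openshift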

  2. Create the pod:

    $ oc create -f <file-name>.yaml
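
    You can confirm that the priority class was applied; the pod name is illustrative:

    $ oc get pod critical-pod -o jsonpath='{.spec.priorityClassName}'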