Managing graceful node shutdown
- About graceful node shutdown
- Configuring graceful node shutdown

Graceful node shutdown enables the kubelet to delay forcible eviction of pods during a node shutdown. When you configure graceful node shutdown, you can define a time period for pods to finish running workloads before they are shut down. This grace period minimizes interruption to critical workloads during unexpected node shutdown events. By using priority classes, you can also specify the order in which pods shut down.

About graceful node shutdown

During a graceful node shutdown, the kubelet sends a termination signal to pods running on the node and postpones the node shutdown until all the pods are evicted. If a node unexpectedly shuts down, the graceful node shutdown feature minimizes interruption to workloads running on these pods.

During a graceful node shutdown, the kubelet stops pods in two phases:

Regular pod termination
Critical pod termination

You can define shutdown grace periods for regular and critical pods by configuring the following specifications in the KubeletConfig custom resource:

shutdownGracePeriod: Specifies the total duration for pod termination for regular and critical pods.
shutdownGracePeriodCriticalPods: Specifies the duration for critical pod termination. This value must be less than the shutdownGracePeriod value.

For example, if the shutdownGracePeriod value is 30s, and the shutdownGracePeriodCriticalPods value is 10s, the kubelet delays the node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds are reserved for gracefully shutting down regular pods, and the last 10 seconds are reserved for gracefully shutting down critical pods.
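
The arithmetic in this example can be written out as a short Python sketch. The function name `split_shutdown_window` is hypothetical and exists only for illustration; it is not part of the kubelet or of any OpenShift tooling.

```python
def split_shutdown_window(total_seconds, critical_seconds):
    """Return (regular_seconds, critical_seconds) for a graceful shutdown.

    total_seconds    -> the shutdownGracePeriod value, in seconds
    critical_seconds -> the shutdownGracePeriodCriticalPods value, in seconds
    """
    # The critical-pod period must be less than the total grace period.
    if critical_seconds >= total_seconds:
        raise ValueError(
            "shutdownGracePeriodCriticalPods must be less than shutdownGracePeriod"
        )
    # Regular pods get whatever remains after the critical-pod window.
    return total_seconds - critical_seconds, critical_seconds


regular_s, critical_s = split_shutdown_window(30, 10)
print(regular_s, critical_s)  # prints "20 10"
```

With the values from the example, regular pods get the first 20 seconds and critical pods get the last 10 seconds.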

To define a critical pod, assign a pod priority value greater than or equal to 2000000000. To define a regular pod, assign a pod priority value of less than 2000000000.
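
As a rough illustration of this threshold, the following Python sketch maps a pod priority value to the shutdown phase that would apply to it. The names `CRITICAL_PRIORITY_THRESHOLD` and `shutdown_phase` are hypothetical; only the value 2000000000 comes from the text above.

```python
# Threshold taken from the text above: priorities at or above this value
# are treated as critical during a graceful node shutdown.
CRITICAL_PRIORITY_THRESHOLD = 2_000_000_000


def shutdown_phase(priority):
    """Return "critical" or "regular" for a given pod priority value."""
    if priority >= CRITICAL_PRIORITY_THRESHOLD:
        return "critical"
    return "regular"


print(shutdown_phase(2_000_000_000))  # prints "critical"
print(shutdown_phase(1_999_999_999))  # prints "regular"
```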

For more information about how to define a priority value for pods, see the Additional resources section.

Configuring graceful node shutdown

To configure graceful node shutdown, create a KubeletConfig custom resource (CR) to specify a shutdown grace period for pods on a set of nodes. The graceful node shutdown feature minimizes interruption to workloads that run on these pods.

If you do not configure graceful node shutdown, the default grace period is 0 and pods are forcibly evicted from the node.

Prerequisites

You have access to the cluster with the cluster-admin role.
You have defined priority classes for pods that require critical or regular classification.

Procedure

Define shutdown grace periods in the KubeletConfig CR by saving the following YAML in the kubelet-gns.yaml file:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: graceful-shutdown
  namespace: openshift-machine-config-operator
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" # (1)
  kubeletConfig:
    shutdownGracePeriod: "3m" # (2)
    shutdownGracePeriodCriticalPods: "2m" # (3)

1	This example applies shutdown grace periods to nodes with the `worker` role.
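2	Define the total time period for regular and critical pods to shut down.
3	Define the portion of that time period that is reserved for critical pods to shut down. This value must be less than the `shutdownGracePeriod` value.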
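
Before you create the CR, you might want to check that the two durations are consistent with each other. The following Python sketch is one way to do that, under stated assumptions: `parse_duration` and `check_grace_periods` are hypothetical helpers, and the parser handles only whole-number `h`, `m`, and `s` units, which is a subset of the duration syntax that the kubelet accepts.

```python
import re

# Assumption: only whole-number h/m/s units are supported here.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text):
    """Convert a string such as "3m" or "3m0s" to whole seconds."""
    total, pos = 0, 0
    for match in _TOKEN.finditer(text):
        # Every token must start where the previous one ended.
        if match.start() != pos:
            raise ValueError(f"unsupported duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    # Reject empty input and trailing characters that did not parse.
    if pos == 0 or pos != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return total


def check_grace_periods(kubelet_config):
    """Return (total, critical) in seconds, or raise if they are inconsistent."""
    total = parse_duration(kubelet_config["shutdownGracePeriod"])
    critical = parse_duration(kubelet_config["shutdownGracePeriodCriticalPods"])
    if critical >= total:
        raise ValueError(
            "shutdownGracePeriodCriticalPods must be less than shutdownGracePeriod"
        )
    return total, critical


# Values from the kubeletConfig stanza in the example CR.
print(check_grace_periods({
    "shutdownGracePeriod": "3m",
    "shutdownGracePeriodCriticalPods": "2m",
}))  # prints "(180, 120)"
```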

Create the KubeletConfig CR by running the following command:

$ oc create -f kubelet-gns.yaml

Example output

kubeletconfig.machineconfiguration.openshift.io/graceful-shutdown created

Verification

Verify the grace period configuration on a node by viewing the kubelet logs from the command line or by viewing the kubelet.conf file on the node.

Ensure that the log messages for shutdownGracePeriodRequested and shutdownGracePeriodCriticalPods match the values set in the KubeletConfig CR.

To view the logs by using the command line, run the following command, replacing <node_name> with the name of the node:

$ oc adm node-logs <node_name> -u kubelet

Example output

Sep 12 22:13:46
ci-ln-qv5pvzk-72292-xvkd9-worker-a-dmbr4
hyperkube[22317]: I0912 22:13:46.687472
22317 nodeshutdown_manager_linux.go:134]
"Creating node shutdown manager"
shutdownGracePeriodRequested="3m0s" (1)
shutdownGracePeriodCriticalPods="2m0s"
shutdownGracePeriodByPodPriority=[
{Priority:0
ShutdownGracePeriodSeconds:1200}
{Priority:2000000000
ShutdownGracePeriodSeconds:600}]
...

1	Ensure that the log messages for `shutdownGracePeriodRequested` and `shutdownGracePeriodCriticalPods` match the values set in the `KubeletConfig` CR.
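
If you save the kubelet log output to a file or a variable, a small script can pull out the two values for comparison. The following Python sketch assumes that the log line uses the `key="value"` layout shown in the example output; `extract_grace_periods` is a hypothetical helper, and `sample` is an abbreviated copy of that example line.

```python
import re

FIELDS = ("shutdownGracePeriodRequested", "shutdownGracePeriodCriticalPods")


def extract_grace_periods(log_text):
    """Return the grace-period fields found in kubelet log text as a dict."""
    found = {}
    for name in FIELDS:
        # Assumption: each field is logged as name="value".
        match = re.search(rf'{name}="([^"]+)"', log_text)
        if match:
            found[name] = match.group(1)
    return found


# Abbreviated from the example output above.
sample = (
    '"Creating node shutdown manager" '
    'shutdownGracePeriodRequested="3m0s" '
    'shutdownGracePeriodCriticalPods="2m0s"'
)
print(extract_grace_periods(sample))
```

An empty or partial result suggests that the node shutdown manager line was not present in the captured log text.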

To view the configuration in the kubelet.conf file on a node, run the following commands to enter a debug session on the node and print the file:

$ oc debug node/<node_name>

$ chroot /host

$ cat /etc/kubernetes/kubelet.conf

Example output

...
 "memorySwap": {},
 "containerLogMaxSize": "50Mi",
 "logging": {
  "flushFrequency": 0,
  "verbosity": 0,
  "options": {
   "json": {
    "infoBufferSize": "0"
   }
  }
 },
 "shutdownGracePeriod": "10m0s", (1)
 "shutdownGracePeriodCriticalPods": "3m0s"
}

1	Ensure that the values of `shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` match the values set in the `KubeletConfig` CR.
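
The same comparison can be scripted against a copy of the file contents. The following Python sketch assumes that kubelet.conf is a JSON document, as the example output suggests, and that durations appear in the normalized form seen there (for example `3m0s` rather than `3m`). The helper `grace_periods_match` and the `sample` string are hypothetical and stand in for the real file contents.

```python
import json


def grace_periods_match(kubelet_conf_text, expected_total, expected_critical):
    """Return True if both grace-period fields equal the expected strings."""
    conf = json.loads(kubelet_conf_text)
    return (
        conf.get("shutdownGracePeriod") == expected_total
        and conf.get("shutdownGracePeriodCriticalPods") == expected_critical
    )


# Hypothetical minimal stand-in for the contents of kubelet.conf.
sample = '{"shutdownGracePeriod": "3m0s", "shutdownGracePeriodCriticalPods": "2m0s"}'
print(grace_periods_match(sample, "3m0s", "2m0s"))  # prints "True"
```

This is a plain string comparison, so pass the expected values in the same form that the file uses.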

During a graceful node shutdown, you can verify that a pod was gracefully shut down by running the following command, replacing <pod_name> with the name of the pod:
```
$ oc describe pod <pod_name>
```
Example output
```
Reason:         Terminated
Message:        Pod was terminated in response to imminent node shutdown.
```

Additional resources

Understanding pod priority