Setting a node to maintenance mode

Place a node into maintenance mode by using the web console, the CLI, or a NodeMaintenance custom resource.

Setting a node to maintenance mode in the web console

Set a node to maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

In the OKD Virtualization console, click Compute → Nodes.
You can set the node to maintenance mode from this screen, which makes it easier to act on multiple nodes at once, or from the Node Details screen, where you can view comprehensive details of the selected node:
- Click the Options menu at the end of the node and select Start Maintenance.
- Click the node name to open the Node Details screen and click Actions → Start Maintenance.
Click Start Maintenance in the confirmation window.

The node live migrates virtual machine instances that have the LiveMigrate eviction strategy, and the node is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.

Setting a node to maintenance mode in the CLI

Set a node to maintenance mode by marking it as unschedulable and using the oc adm drain command to evict or delete pods from the node.

Procedure

Mark the node as unschedulable. The node status changes to Ready,SchedulingDisabled.
```
$ oc adm cordon <node1>
```
Drain the node in preparation for maintenance. The node live migrates virtual machine instances that have the LiveMigratable condition set to True and the spec:evictionStrategy field set to LiveMigrate. All other pods and virtual machines on the node are deleted and recreated on another node.
```
$ oc adm drain <node1> --delete-local-data --ignore-daemonsets=true --force
```
- The --delete-local-data flag removes any virtual machine instances on the node that use emptyDir volumes. Data in these volumes is ephemeral and is safe to delete after termination.
- The --ignore-daemonsets=true flag ensures that daemon sets are ignored and pod eviction can continue successfully.
- The --force flag is required to delete pods that are not managed by a replica set or daemon set controller.
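
The two steps above can be wrapped in a small shell function. This is a minimal sketch, not part of the product tooling: the function name `start_node_maintenance` is hypothetical, and it uses only the `oc adm cordon` and `oc adm drain` invocations shown in this procedure.

```shell
# Hypothetical helper: cordon a node, then drain it with the flags
# described above. Stops if the cordon step fails.
start_node_maintenance() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: start_node_maintenance <node>" >&2
    return 1
  fi
  # Step 1: mark the node as unschedulable.
  oc adm cordon "$node" || return 1
  # Step 2: evict or delete pods; eligible VMIs are live migrated.
  oc adm drain "$node" --delete-local-data --ignore-daemonsets=true --force
}
```

After the function is loaded into your shell, run `start_node_maintenance <node1>`. Because the drain runs only if the cordon succeeds, a mistyped node name fails before any pods are evicted.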

Setting a node to maintenance mode with a NodeMaintenance custom resource

You can put a node into maintenance mode with a NodeMaintenance custom resource (CR). When you apply a NodeMaintenance CR, all allowed pods are evicted and the node is made unschedulable. Evicted pods are queued to be moved to another node in the cluster.

Prerequisites

Install the OKD CLI oc.
Log in to the cluster as a user with cluster-admin privileges.

Procedure

Create the following node maintenance CR, and save the file as nodemaintenance-cr.yaml:
```
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-example  (1)
spec:
  nodeName: node-1.example.com (2)
  reason: "Node maintenance" (3)
```
1 Node maintenance CR name
2 The name of the node to be put into maintenance mode
3 Plain text description of the reason for maintenance
Apply the node maintenance CR by running the following command:
```
$ oc apply -f nodemaintenance-cr.yaml
```
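
If you put nodes into maintenance often, the CR can be generated from a template instead of edited by hand. The following is a sketch under stated assumptions: the function name `nodemaintenance_manifest` and its argument order are hypothetical, and the fields it emits are exactly those in the example CR above.

```shell
# Hypothetical helper: print a NodeMaintenance CR for the given node.
# Arguments: <cr-name> <node-name> [reason]
nodemaintenance_manifest() {
  cr_name="$1"
  node_name="$2"
  reason="${3:-Node maintenance}"
  cat <<EOF
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  name: ${cr_name}
spec:
  nodeName: ${node_name}
  reason: "${reason}"
EOF
}
```

The output can be saved to a file or piped straight to the apply step, for example `nodemaintenance_manifest maintenance-example node-1.example.com | oc apply -f -`.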

Check the progress of the maintenance task by running the following command, replacing <node-name> with the name of your node:
```
$ oc describe node <node-name>
```
Example output
```
Events:
  Type     Reason                     Age                   From     Message
  ----     ------                     ----                  ----     -------
  Normal   NodeNotSchedulable         61m                   kubelet  Node node-1.example.com status is now: NodeNotSchedulable
```
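
For scripting, the scheduling state can also be read directly from the node object rather than from the event list. This is a minimal sketch: the function name `node_is_unschedulable` is hypothetical, and it relies on the standard `spec.unschedulable` field of the Node object.

```shell
# Hypothetical helper: succeed (exit 0) if the node is marked unschedulable.
# Returns 2 if the oc query itself fails.
node_is_unschedulable() {
  # .spec.unschedulable is "true" on a cordoned node and empty otherwise.
  state="$(oc get node "$1" -o jsonpath='{.spec.unschedulable}')" || return 2
  [ "$state" = "true" ]
}
```

For example, `node_is_unschedulable node-1.example.com && echo "scheduling disabled"`.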

Checking status of current NodeMaintenance CR tasks

You can check the status of current NodeMaintenance CR tasks.

Prerequisites

Install the OKD CLI oc.
Log in as a user with cluster-admin privileges.

Procedure

Check the status of current node maintenance tasks by running the following command:

```
$ oc get NodeMaintenance -o yaml
```
Example output
```
apiVersion: v1
items:
- apiVersion: nodemaintenance.kubevirt.io/v1beta1
  kind: NodeMaintenance
  metadata:
...
  spec:
    nodeName: node-1.example.com
    reason: Node maintenance
  status:
    evictionPods: 3   (1)
    pendingPods:
    - pod-example-workload-0
    - httpd
    - httpd-manual
    phase: Running
    lastError: "Last failure message" (2)
    totalpods: 5
...
```

1 evictionPods is the number of pods scheduled for eviction.
2 lastError records the latest eviction error, if any.
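
When only the phase and the remaining pods matter, the full YAML can be reduced to one line with JSONPath queries. The sketch below assumes the `status.phase` and `status.pendingPods` fields shown in the example output; the function name `nodemaintenance_status` is hypothetical.

```shell
# Hypothetical helper: print the phase and pending pods of one NodeMaintenance CR.
# Argument: <cr-name>
nodemaintenance_status() {
  cr_name="$1"
  phase="$(oc get NodeMaintenance "$cr_name" -o jsonpath='{.status.phase}')" || return 1
  # pendingPods is a list; [*] prints its items separated by spaces.
  pending="$(oc get NodeMaintenance "$cr_name" -o jsonpath='{.status.pendingPods[*]}')" || return 1
  echo "phase=${phase} pending=${pending:-none}"
}
```

For example, `nodemaintenance_status maintenance-example` prints the current phase followed by the names of any pods still waiting to be evicted, or `none` when the list is empty.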
