Setting a node to maintenance mode
- Setting a node to maintenance mode in the web console
- Setting a node to maintenance mode in the CLI
- Setting a node to maintenance mode with a NodeMaintenance custom resource
Place a node into maintenance mode from the web console, from the CLI, or by using a NodeMaintenance
custom resource.
Setting a node to maintenance mode in the web console
Set a node to maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.
Procedure
In the OKD Virtualization console, click Compute → Nodes.
You can set the node to maintenance mode from this list, which makes it easier to act on multiple nodes from one screen, or from the Node Details screen, where you can view comprehensive details of the selected node:
- Click the Options menu at the end of the node and select Start Maintenance.
- Click the node name to open the Node Details screen and click Actions → Start Maintenance.
Click Start Maintenance in the confirmation window.
The node live migrates virtual machine instances that have the LiveMigrate
eviction strategy, and the node is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.
Setting a node to maintenance mode in the CLI
Set a node to maintenance mode by marking it as unschedulable and using the oc adm drain
command to evict or delete pods from the node.
Procedure
Mark the node as unschedulable. The node status changes to Ready,SchedulingDisabled.
$ oc adm cordon <node1>
Drain the node in preparation for maintenance. The node live migrates virtual machine instances that have the LiveMigratable condition set to True and the spec:evictionStrategy field set to LiveMigrate. All other pods and virtual machines on the node are deleted and recreated on another node.
$ oc adm drain <node1> --delete-local-data --ignore-daemonsets=true --force
The --delete-local-data flag removes any virtual machine instances on the node that use emptyDir volumes. Data in these volumes is ephemeral and is safe to be deleted after termination.
The --ignore-daemonsets=true flag ensures that daemon sets are ignored and pod eviction can continue successfully.
The --force flag is required to delete pods that are not managed by a replica set or daemon set controller.
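The cordon and drain steps above can be combined into one small script. The following is a minimal sketch rather than part of the documented procedure: the function name start_maintenance is hypothetical, and it assumes that oc is on the PATH and already logged in to the cluster. It uses only the commands and flags shown in this section.

```shell
#!/usr/bin/env bash
# Sketch: cordon a node, then drain it, stopping at the first failure.
# start_maintenance is a hypothetical helper name, not an oc subcommand.
start_maintenance() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: start_maintenance <node>" >&2
    return 2
  fi

  # Step 1: mark the node as unschedulable.
  oc adm cordon "$node" || return 1

  # Step 2: evict or delete pods, using the flags described in this section.
  oc adm drain "$node" --delete-local-data --ignore-daemonsets=true --force || return 1

  echo "Node $node is cordoned and drained."
}

# Example invocation (replace <node1> with the name of your node):
#   start_maintenance <node1>
```

Stopping after a failed cordon avoids draining a node that can still receive new pods.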
Setting a node to maintenance mode with a NodeMaintenance custom resource
You can put a node into maintenance mode with a NodeMaintenance
custom resource (CR). When you apply a NodeMaintenance
CR, all allowed pods are evicted and the node is shut down. Evicted pods are queued to be moved to another node in the cluster.
Prerequisites
Install the OKD CLI (oc).
Log in to the cluster as a user with cluster-admin privileges.
Procedure
Create the following node maintenance CR, and save the file as nodemaintenance-cr.yaml:
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-example # (1)
spec:
  nodeName: node-1.example.com # (2)
  reason: "Node maintenance" # (3)
(1) Node maintenance CR name
(2) The name of the node to be put into maintenance mode
(3) Plain text description of the reason for maintenance
Apply the node maintenance CR by running the following command:
$ oc apply -f nodemaintenance-cr.yaml
Check the progress of the maintenance task by running the following command, replacing <node-name> with the name of your node:
$ oc describe node <node-name>
Example output
Events:
  Type    Reason              Age   From     Message
  ----    ------              ----  ----     -------
  Normal  NodeNotSchedulable  61m   kubelet  Node node-1.example.com status is now: NodeNotSchedulable
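When several nodes need maintenance, the CR from this procedure can be generated per node instead of being edited by hand. The following is a minimal sketch under stated assumptions: render_node_maintenance is a hypothetical helper name, and the manifest fields are copied from the example CR in this procedure. The function only prints YAML and does not contact the cluster.

```shell
#!/usr/bin/env bash
# Sketch: print a NodeMaintenance CR for a given node to stdout.
# render_node_maintenance is a hypothetical helper name.
# Arguments: <cr-name> <node-name> [reason]
# The reason must not contain double quotes, because it is placed
# inside a double-quoted YAML string without escaping.
render_node_maintenance() {
  local cr_name="$1"
  local node_name="$2"
  local reason="${3:-Node maintenance}"
  cat <<EOF
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  name: ${cr_name}
spec:
  nodeName: ${node_name}
  reason: "${reason}"
EOF
}

# Example: write the manifest to a file, then apply it.
#   render_node_maintenance maintenance-example node-1.example.com > nodemaintenance-cr.yaml
#   oc apply -f nodemaintenance-cr.yaml
```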
Checking status of current NodeMaintenance CR tasks
You can check the status of current NodeMaintenance
CR tasks.
Prerequisites
Install the OKD CLI (oc).
Log in as a user with cluster-admin privileges.
Procedure
Check the status of current node maintenance tasks by running the following command:
$ oc get NodeMaintenance -o yaml
Example output
apiVersion: v1
items:
- apiVersion: nodemaintenance.kubevirt.io/v1beta1
  kind: NodeMaintenance
  metadata:
...
  spec:
    nodeName: node-1.example.com
    reason: Node maintenance
  status:
    evictionPods: 3 # (1)
    pendingPods:
    - pod-example-workload-0
    - httpd
    - httpd-manual
    phase: Running
    lastError: "Last failure message" # (2)
    totalpods: 5
...
(1) evictionPods is the number of pods scheduled for eviction.
(2) lastError records the latest eviction error, if any.
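To read only the fields of interest instead of the full YAML, the jsonpath output format of oc get can be used. The following is a minimal sketch: maintenance_summary is a hypothetical helper name, and the status.phase and status.pendingPods field paths are taken from the example output above, so they are assumed to match your version of the CR.

```shell
#!/usr/bin/env bash
# Sketch: summarize one NodeMaintenance CR using jsonpath output.
# maintenance_summary is a hypothetical helper name.
# Argument: <cr-name>, for example maintenance-example.
maintenance_summary() {
  local cr_name="$1"
  local phase pending

  # Field paths follow the example output in this section.
  phase="$(oc get NodeMaintenance "$cr_name" -o jsonpath='{.status.phase}')" || return 1
  pending="$(oc get NodeMaintenance "$cr_name" -o jsonpath='{.status.pendingPods[*]}')" || return 1

  # Fall back to placeholder text when a field is empty or absent.
  echo "phase: ${phase:-unknown}"
  echo "pending pods: ${pending:-none}"
}

# Example invocation:
#   maintenance_summary maintenance-example
```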