Node Maintenance Guide
This section describes how to handle planned maintenance of nodes.
Updating the Node OS or Container Runtime
Cordon the node. Longhorn will automatically disable the node scheduling when a Kubernetes node is cordoned.
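For example, you can cordon the node with kubectl (here <node-name> is a placeholder for the node under maintenance):

kubectl cordon <node-name>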
Drain the node to move the workload somewhere else.

You will need to use the --ignore-daemonsets and --pod-selector='app!=csi-attacher,app!=csi-provisioner,app!=longhorn-admission-webhook,app!=longhorn-conversion-webhook,app!=longhorn-driver-deployer' options to drain the node. The --ignore-daemonsets option is needed because Longhorn deploys some daemonsets, such as the Longhorn manager, the Longhorn CSI plugin, and the engine image. The --pod-selector option is needed so that Longhorn can properly detach Longhorn volumes (see the GitHub issue for more details).

The replica processes on the node will be stopped at this stage. Replicas on the node will be shown as Failed.
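For example, a drain command using the options above might look like the following sketch (with <node-name> standing in for the node under maintenance; add any other drain options your cluster requires):

kubectl drain <node-name> \
  --ignore-daemonsets \
  --pod-selector='app!=csi-attacher,app!=csi-provisioner,app!=longhorn-admission-webhook,app!=longhorn-conversion-webhook,app!=longhorn-driver-deployer'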
Note: By default, if there is one last healthy replica for a volume on
the node, Longhorn will prevent the node from completing the drain
operation, to protect the last replica and prevent the disruption of the
workload. You can either override the behavior in the setting, or evict
the replica to other nodes before draining.
The engine processes on the node will be migrated with the Pod to other nodes.
Note: If there are volumes not created by Kubernetes on the node,
Longhorn will prevent the node from completing the drain operation, to
prevent the potential workload disruption.
After the drain is completed, there should be no engine or replica process running on the node. Two instance managers will still be running on the node, but they're stateless and won't cause interruption to the existing workload.

Note: Normally you don't need to evict the replicas before the drain operation, as long as you have healthy replicas on other nodes. The replicas can be reused later, once the node is back online and uncordoned.
Perform the necessary maintenance, including shutting down or rebooting the node.
Uncordon the node. Longhorn will automatically re-enable the node scheduling.
If there are existing replicas on the node, Longhorn might use those replicas to speed up the rebuilding process. You can set the Replica Replenishment Wait Interval setting to customize how long Longhorn should wait for potentially reusable replicas to become available.
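Like cordoning, uncordoning is a single kubectl command (again with <node-name> as a placeholder):

kubectl uncordon <node-name>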
Updating Kubernetes
Follow the official Kubernetes upgrade documentation.
- If Longhorn is installed as a Rancher catalog app, follow Rancher’s Kubernetes upgrade guide to upgrade Kubernetes.
Removing a Disk
To remove a disk:
- Disable the disk scheduling.
- Evict all the replicas on the disk.
- Delete the disk.
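These steps are normally performed from the Node tab of the Longhorn UI. As a rough command-line sketch, the first two steps can usually also be done by editing the Longhorn Node custom resource and, on the disk entry you want to remove, setting allowScheduling to false and evictionRequested to true (these field names are assumptions based on the Longhorn Node CR and may differ between Longhorn versions):

kubectl -n longhorn-system edit nodes.longhorn.io <node-name>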
Reusing the Node Name
These steps also apply if you’ve replaced a node using the same node name. Longhorn will recognize that the disks are different once the new node is up. You will need to remove the original disks first and add them back for the new node if it uses the same name as the previous node.
Removing a Node
To remove a node:
- Disable the disk scheduling.
- Evict all the replicas on the node.
- Detach all the volumes on the node. If the node has been drained, all the workloads should be migrated to another node already. If there are any other volumes remaining attached, detach them before continuing (see the sketch after this list).
- Remove the node from Longhorn using Delete in the Node tab. Or, remove the node from Kubernetes, using:
  kubectl delete node <node-name>
- Longhorn will automatically remove the node from the cluster.
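For the detach step above, one way to check whether any Longhorn volumes are still attached to the node is to list the Longhorn Volume custom resources (a sketch; in recent Longhorn versions the listing shows each volume's state and the node an attached volume is running on):

kubectl -n longhorn-system get volumes.longhorn.io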