Cluster Shutdown and Restart
This document describes how to gracefully shut down your cluster and how to restart it. You might need to shut down your cluster temporarily for maintenance.
Warning
Shutting down a cluster is a high-risk operation. Make sure you fully understand the operation and its consequences, and take an etcd backup before you proceed. In most cases, it is better to maintain your nodes one at a time than to restart the whole cluster.
Prerequisites
- Take an etcd backup prior to shutting down a cluster.
- Passwordless SSH login is set up between hosts.
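The SSH prerequisite can be verified ahead of the maintenance window. The following is a minimal sketch rather than part of the official procedure: `check_ssh` is a hypothetical helper name, and the host names you pass to it are your own. It relies on the OpenSSH client options `BatchMode` and `ConnectTimeout`.

```shell
# check_ssh HOST...: report which hosts accept non-interactive (key-based) login.
# Returns non-zero if at least one host fails.
check_ssh() {
  failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_ssh master1 node1 node2` (placeholder host names) prints one line per host. Any `FAIL` line means the shutdown loop would stall or fail on that host.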
Shutting Down Cluster
Tip
- You must back up your etcd data before you shut down the cluster so that the cluster can be restored if you encounter any issues when restarting it.
- The method in this tutorial shuts down a cluster gracefully, but data corruption is still possible.
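As an illustration of the backup mentioned above, the sketch below wraps `etcdctl snapshot save`. It is an assumption-laden example: `backup_etcd` is a hypothetical helper name, and the endpoint and certificate paths are placeholder defaults that depend on how etcd was installed in your environment. Run it on a host that can reach etcd and has `etcdctl` installed.

```shell
# Placeholder defaults (assumptions); override them to match your deployment.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/etcd/server.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/etcd/server.key}"

# backup_etcd FILE: save an etcd v3 snapshot to FILE.
backup_etcd() {
  snapshot="$1"
  ETCDCTL_API=3 etcdctl \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_CACERT" --cert="$ETCD_CERT" --key="$ETCD_KEY" \
    snapshot save "$snapshot" || return 1
  # Treat an empty or missing file as a failed backup.
  [ -s "$snapshot" ]
}
```

A call such as `backup_etcd /var/backups/etcd-snapshot.db` (the path is only an example) returns a non-zero status if the snapshot command fails or produces an empty file. Copy the snapshot off the cluster machines before shutting them down.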
Step 1: Get Node List
nodes=$(kubectl get nodes -o name)
Step 2: Shut Down All Nodes
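# kubectl prints each entry as "node/<name>"; strip the prefix to get the SSH host name.
for node in $nodes
do
  echo "==== Shut down ${node#node/} ===="
  # "shutdown -h 1" schedules the halt one minute from now.
  ssh "${node#node/}" sudo shutdown -h 1
done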
Then you can shut down other cluster dependencies, such as external storage.
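Before running the shutdown loop from Step 2 for real, it can help to preview exactly which commands it would issue. The following sketch is one way to do that. `shutdown_nodes` and the `DRY_RUN` variable are hypothetical names introduced here, and the sketch assumes, as the loop in Step 2 does, that each Kubernetes node name is also a valid SSH host name.

```shell
# shutdown_nodes: read node names from stdin, one per line, with or without
# the "node/" prefix printed by `kubectl get nodes -o name`.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
shutdown_nodes() {
  while read -r node; do
    [ -n "$node" ] || continue
    host="${node#node/}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $host sudo shutdown -h 1"
    else
      # -n stops ssh from consuming the rest of the node list on stdin.
      ssh -n "$host" sudo shutdown -h 1
    fi
  done
}
```

Running `kubectl get nodes -o name | shutdown_nodes` prints the planned commands. Setting `DRY_RUN=0` in the environment executes them instead.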
Restart Cluster Gracefully
After a graceful shutdown, you can restart the cluster by following the steps below.
Prerequisites
You have shut down your cluster gracefully.
Tip
A cluster is usually usable after a restart, but unexpected conditions can leave it unavailable. For example:
- Etcd data corruption during the shutdown.
- Node failures.
- Unexpected network errors.
Step 1: Check All Cluster Dependencies’ Status
Ensure all cluster dependencies are ready, such as external storage.
Step 2: Power on Cluster Machines
Power on all cluster machines, then wait for the cluster to be up and running, which may take about 10 minutes.
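Instead of checking by hand during those minutes, you could poll until every node reports `Ready`. The sketch below is one possible approach built on `kubectl wait`. `wait_for_nodes` is a hypothetical helper name, and the default limit of 600 seconds only mirrors the rough 10-minute estimate above.

```shell
# wait_for_nodes [LIMIT_SECONDS] [INTERVAL_SECONDS]
# Poll until all nodes are Ready or the time limit is reached.
wait_for_nodes() {
  limit="${1:-600}"
  interval="${2:-15}"
  start=$(date +%s)
  while [ $(( $(date +%s) - start )) -lt "$limit" ]; do
    # kubectl wait fails right away while the API server is unreachable,
    # so a failure here means "not yet" and the loop retries.
    if kubectl wait --for=condition=Ready nodes --all --timeout=10s >/dev/null 2>&1; then
      echo "All nodes are Ready"
      return 0
    fi
    sleep "$interval"
  done
  echo "Timed out after ${limit}s waiting for nodes" >&2
  return 1
}
```

Calling `wait_for_nodes` with no arguments uses the defaults. A non-zero exit status means the limit was reached, in which case the per-role checks in the following steps show which nodes are affected.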
Step 3: Check All Master Nodes’ Status
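Check the status of core components, such as etcd services, and make sure everything is ready. Note that kubeadm-based clusters running Kubernetes v1.24 or later label control plane nodes with node-role.kubernetes.io/control-plane instead of node-role.kubernetes.io/master, so adjust the selector if the command below returns no nodes.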
kubectl get nodes -l node-role.kubernetes.io/master
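Listing the nodes does not report on etcd directly. One possible way to ask the API server whether it can reach etcd is its `/readyz` endpoint, sketched below. `apiserver_sees_etcd` is a hypothetical helper name, the sketch assumes your user is allowed to read `/readyz`, and it assumes the verbose output contains a line reading `[+]etcd ok`, which may differ between Kubernetes versions.

```shell
# apiserver_sees_etcd: succeed only if the API server's verbose readiness
# report contains a passing etcd check.
apiserver_sees_etcd() {
  # grep -F: fixed string, -x: match the whole line, -q: exit status only.
  kubectl get --raw='/readyz?verbose' 2>/dev/null | grep -Fxq '[+]etcd ok'
}
```

For example, `apiserver_sees_etcd && echo "etcd reachable"` prints the message only when the check passes. If it fails, inspect the etcd service or pods on the master nodes directly.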
Step 4: Check All Worker Nodes’ Status
kubectl get nodes -l node-role.kubernetes.io/worker
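On a large cluster, it may be easier to list only the nodes that are not ready than to scan the full table. The following filter is a small sketch of that idea. `not_ready_nodes` is a hypothetical helper name, and it assumes the default `kubectl get nodes` column layout, in which the second column is STATUS.

```shell
# not_ready_nodes [KUBECTL_ARGS...]: print the names of nodes whose STATUS
# does not start with "Ready". Extra arguments (for example a label
# selector) are passed through to kubectl.
not_ready_nodes() {
  # STATUS is e.g. "Ready", "NotReady", or "Ready,SchedulingDisabled".
  kubectl get nodes --no-headers "$@" | awk '$2 !~ /^Ready/ {print $1}'
}
```

For example, `not_ready_nodes -l node-role.kubernetes.io/worker` restricts the check to worker nodes. Empty output means every selected node reports `Ready`.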
If your cluster fails to restart, try restoring the etcd cluster from the backup you took before the shutdown.