Manage Node-Group on GCP GKE

The following example replaces the cluster nodes with new nodes that have a larger storage size.

Storage Expansion

GKE supports attaching additional disks with local-ssd-count. However, each local SSD has a fixed size of 375 GB. We suggest expanding node storage by replacing the node pool instead.
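Before replacing the pool, it can help to confirm the boot disk size the existing nodes were created with. The following is a minimal sketch, assuming the gcloud CLI is already authenticated against the right project and that the cluster is regional. The function name `nodepool_disk_size_gb` and its argument order are hypothetical; `config.diskSizeGb` is the field that `gcloud container node-pools describe` reports for the pool's node configuration.

```shell
# Print the configured boot disk size (in GB) of a node pool.
# Hypothetical helper, not part of gcloud.
# Usage: nodepool_disk_size_gb <nodepool-name> <region> <cluster-name>
nodepool_disk_size_gb() {
  gcloud container node-pools describe "$1" \
    --region "$2" \
    --cluster "$3" \
    --format 'value(config.diskSizeGb)'
}
```

For example, `nodepool_disk_size_gb <old-nodepool-name> <gke-region> <gke-cluster-name>` prints the size to compare against the new disk size chosen below.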

  1. In Longhorn, set replica-replenishment-wait-interval to 0.

  2. Add a new node pool. Longhorn components are then deployed automatically on the nodes in this pool.

    GKE_NODEPOOL_NAME_NEW=<new-nodepool-name>
    GKE_REGION=<gke-region>
    GKE_CLUSTER_NAME=<gke-cluster-name>
    # Recent GKE versions may require UBUNTU_CONTAINERD here.
    GKE_IMAGE_TYPE=Ubuntu
    GKE_MACHINE_TYPE=<gcp-machine-type>
    GKE_DISK_SIZE_NEW=<new-disk-size-in-gb>
    GKE_NODE_NUM=<number-of-nodes>

    # Create the new node pool with the larger disk size.
    gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \
      --region ${GKE_REGION} \
      --cluster ${GKE_CLUSTER_NAME} \
      --image-type ${GKE_IMAGE_TYPE} \
      --machine-type ${GKE_MACHINE_TYPE} \
      --disk-size ${GKE_DISK_SIZE_NEW} \
      --num-nodes ${GKE_NODE_NUM}

    # Confirm that the new node pool is listed.
    gcloud container node-pools list \
      --region ${GKE_REGION} \
      --cluster ${GKE_CLUSTER_NAME}
  3. Use the Longhorn UI to disable disk scheduling and request eviction for the nodes in the old node pool.

  4. Cordon and drain the Kubernetes nodes in the old node pool.

    GKE_NODEPOOL_NAME_OLD=<old-nodepool-name>
    for n in $(kubectl get nodes | grep "${GKE_CLUSTER_NAME}-${GKE_NODEPOOL_NAME_OLD}-" | awk '{print $1}'); do
      kubectl cordon "$n" && \
      kubectl drain "$n" --ignore-daemonsets --delete-emptydir-data
    done
  5. Delete the old node pool.

    gcloud container node-pools delete ${GKE_NODEPOOL_NAME_OLD} \
      --region ${GKE_REGION} \
      --cluster ${GKE_CLUSTER_NAME}
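As an alternative to matching node names with grep in step 4, the nodes of a pool can be selected by label. The sketch below assumes the nodes carry the `cloud.google.com/gke-nodepool` label that GKE sets on nodes in a node pool; the function names `nodes_in_pool` and `drain_pool` are hypothetical helpers, not part of kubectl.

```shell
# List the names of the nodes that belong to a GKE node pool,
# selected by label instead of by a name prefix.
# Usage: nodes_in_pool <nodepool-name>
nodes_in_pool() {
  kubectl get nodes \
    -l "cloud.google.com/gke-nodepool=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Cordon and drain every node in the pool, stopping at the first failure.
# Usage: drain_pool <nodepool-name>
drain_pool() {
  for node in $(nodes_in_pool "$1"); do
    kubectl cordon "$node" || return 1
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  done
}
```

Running `nodes_in_pool <old-nodepool-name>` first shows which nodes would be affected; `drain_pool <old-nodepool-name>` then cordons and drains them one at a time. Selecting by label avoids accidental matches when one pool name is a prefix of another.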

