Upgrade from v1.3.1 to v1.3.2

General information

An Upgrade button appears on the Dashboard screen whenever a new Harvester version that you can upgrade to becomes available. For more information, see Start an upgrade.

For air-gapped environments, see Prepare an air-gapped upgrade.

Known issues


1. Two-node cluster upgrade stuck after the first node is pre-drained

Important:

Shut down all workload VMs before upgrading two-node clusters to prevent data loss.
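
If you prefer to do this from the CLI instead of the Harvester UI, the following is a minimal sketch that assumes kubectl access to the cluster and uses the KubeVirt VirtualMachine API that Harvester builds on; <namespace> and <vm-name> are placeholders.

  # List all virtual machines and their current state
  kubectl get vm -A

  # Stop a VM with virtctl, the KubeVirt CLI (may need to be installed separately)
  virtctl stop <vm-name> -n <namespace>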

The worker node can incorrectly transition to the NotReady state while RKE2 is being upgraded on the management node. As a result, the existing pods on the worker node are evicted and new pods cannot be scheduled on any node. This ultimately causes a cascading failure across the whole cluster and prevents the upgrade process from completing.

Check the cluster status when the following occur:

  • The upgrade process becomes stuck for some time.
  • You are unable to access the Harvester UI and receive an HTTP 503 error.
  1. Check the conditions and node statuses of the latest Upgrade custom resource.

    Proceed to the next step if the following conditions are met:

    • SystemServicesUpgraded is set to True, indicating that the system services upgrade is completed.
    • In nodeStatuses, the state of the management node is either Pre-drained or Waiting Reboot.
    • In nodeStatuses, the state of the worker node is Images preloaded.

    Example:

    # Find out the latest Upgrade custom resource
    $ kubectl -n harvester-system get upgrades.harvesterhci.io -l harvesterhci.io/latestUpgrade=true
    NAME                 AGE
    hvst-upgrade-szlg8   48m

    # Check the conditions and node statuses
    $ kubectl -n harvester-system get upgrades hvst-upgrade-szlg8 -o yaml
    apiVersion: harvesterhci.io/v1beta1
    kind: Upgrade
    metadata:
      ...
      labels:
        harvesterhci.io/latestUpgrade: "true"
        harvesterhci.io/upgradeState: UpgradingNodes
      name: hvst-upgrade-szlg8
      namespace: harvester-system
      ...
    spec:
      image: ""
      logEnabled: false
      version: v1.3.2-rc2
    status:
      conditions:
      - status: Unknown
        type: Completed
      - lastUpdateTime: "2024-09-02T11:57:04Z"
        message: Upgrade observability is administratively disabled
        reason: Disabled
        status: "False"
        type: LogReady
      - lastUpdateTime: "2024-09-02T11:58:01Z"
        status: "True"
        type: ImageReady
      - lastUpdateTime: "2024-09-02T12:02:31Z"
        status: "True"
        type: RepoReady
      - lastUpdateTime: "2024-09-02T12:18:44Z"
        status: "True"
        type: NodesPrepared
      - lastUpdateTime: "2024-09-02T12:31:25Z"
        status: "True"
        type: SystemServicesUpgraded
      - status: Unknown
        type: NodesUpgraded
      imageID: harvester-system/hvst-upgrade-szlg8
      nodeStatuses:
        harvester-c6phd:
          state: Pre-drained
        harvester-jkqhq:
          state: Images preloaded
      previousVersion: v1.3.1
      ...
  2. Check the node status.

    Proceed to the next step if the following conditions are met:

    • The status of the worker node is NotReady.
    • The status of the management node is Ready,SchedulingDisabled.

    Example:

    $ kubectl get nodes
    NAME              STATUS                     ROLES                       AGE    VERSION
    harvester-c6phd   Ready,SchedulingDisabled   control-plane,etcd,master   174m   v1.28.12+rke2r1
    harvester-jkqhq   NotReady                   <none>                      166m   v1.27.13+rke2r1
  3. Check the pods on the worker node.

    The issue exists in the cluster if the status of most pods is Terminating.

    Example:

    # Assume harvester-jkqhq is the worker node
    $ kubectl get pods -A --field-selector spec.nodeName=harvester-jkqhq
    NAMESPACE NAME READY STATUS RESTARTS AGE
    cattle-fleet-local-system fleet-agent-6779fb5dd9-dkpjz 1/1 Terminating 0 18m
    cattle-fleet-system fleet-agent-86db8d9954-qgcpq 1/1 Terminating 2 (18m ago) 61m
    cattle-fleet-system fleet-controller-696d4b8878-ddctd 1/1 Terminating 1 (19m ago) 29m
    cattle-fleet-system gitjob-694dd97686-s4z68 1/1 Terminating 1 (19m ago) 29m
    cattle-provisioning-capi-system capi-controller-manager-6f497d5574-wkrnf 1/1 Terminating 0 20m
    cattle-system cattle-cluster-agent-76db9cf9fc-5hhsx 1/1 Terminating 0 20m
    cattle-system cattle-cluster-agent-76db9cf9fc-dnr6m 1/1 Terminating 0 20m
    cattle-system harvester-cluster-repo-7458c7c69d-p982g 1/1 Terminating 0 27m
    cattle-system rancher-7d65df9bd4-77n7w 1/1 Terminating 0 31m
    cattle-system rancher-webhook-cfc66d5d7-fd6gm 1/1 Terminating 0 28m
    harvester-system harvester-85ff674986-wxkl4 1/1 Terminating 0 26m
    harvester-system harvester-load-balancer-54cd9754dc-cwtxg 1/1 Terminating 0 20m
    harvester-system harvester-load-balancer-webhook-c8699b786-x6clw 1/1 Terminating 0 20m
    harvester-system harvester-network-controller-manager-b69bf6b69-9f99x 1/1 Terminating 0 178m
    harvester-system harvester-network-controller-vs4jg 1/1 Running 0 178m
    harvester-system harvester-network-webhook-7b98f8cd98-gjl8b 1/1 Terminating 0 20m
    harvester-system harvester-node-disk-manager-tbh4b 1/1 Running 0 26m
    harvester-system harvester-node-manager-7pqcp 1/1 Running 0 178m
    harvester-system harvester-node-manager-webhook-9cfccc84c-68tgp 1/1 Running 0 20m
    harvester-system harvester-node-manager-webhook-9cfccc84c-6bbvg 1/1 Running 0 20m
    harvester-system harvester-webhook-565dc698b6-np89r 1/1 Terminating 0 26m
    harvester-system hvst-upgrade-szlg8-apply-manifests-4rmjw 0/1 Completed 0 33m
    harvester-system virt-api-6fb7d97b68-cbc5m 1/1 Terminating 0 20m
    harvester-system virt-api-6fb7d97b68-gqg5c 1/1 Terminating 0 23m
    harvester-system virt-controller-67d8b4c75c-5qz9x 1/1 Terminating 0 24m
    harvester-system virt-controller-67d8b4c75c-bdf8w 1/1 Terminating 2 (18m ago) 23m
    harvester-system virt-handler-xw98h 1/1 Running 0 24m
    harvester-system virt-operator-6c98db546-brgnx 1/1 Terminating 2 (18m ago) 26m
    kube-system harvester-snapshot-validation-webhook-b75f94bcb-95zlb 1/1 Terminating 0 20m
    kube-system harvester-snapshot-validation-webhook-b75f94bcb-xfrmf 1/1 Terminating 0 20m
    kube-system harvester-whereabouts-tdr5g 1/1 Running 1 (178m ago) 178m
    kube-system helm-install-rke2-ingress-nginx-4wt4j 0/1 Terminating 0 15m
    kube-system helm-install-rke2-metrics-server-jn58m 0/1 Terminating 0 15m
    kube-system kube-proxy-harvester-jkqhq 1/1 Running 0 178m
    kube-system rke2-canal-wfpch 2/2 Running 0 178m
    kube-system rke2-coredns-rke2-coredns-864fbd7785-t7k6t 1/1 Terminating 0 178m
    kube-system rke2-coredns-rke2-coredns-autoscaler-6c87968579-rg6g4 1/1 Terminating 0 20m
    kube-system rke2-ingress-nginx-controller-d4h25 1/1 Running 0 178m
    kube-system rke2-metrics-server-7f745dbddf-2mp5j 1/1 Terminating 0 20m
    kube-system rke2-multus-fsp94 1/1 Running 0 178m
    kube-system snapshot-controller-65d5f465d9-5b2sb 1/1 Terminating 0 20m
    kube-system snapshot-controller-65d5f465d9-c264r 1/1 Terminating 0 20m
    longhorn-system backing-image-manager-c16a-7c90 1/1 Terminating 0 54m
    longhorn-system csi-attacher-5fbd66cf8-674vc 1/1 Terminating 0 20m
    longhorn-system csi-attacher-5fbd66cf8-725mn 1/1 Terminating 0 20m
    longhorn-system csi-attacher-5fbd66cf8-85k5d 1/1 Terminating 0 20m
    longhorn-system csi-provisioner-5b6ff8f4d4-97wsf 1/1 Terminating 0 20m
    longhorn-system csi-provisioner-5b6ff8f4d4-cbpm9 1/1 Terminating 0 20m
    longhorn-system csi-provisioner-5b6ff8f4d4-q7z58 1/1 Terminating 0 19m
    longhorn-system csi-resizer-74c5555748-6rmbf 1/1 Terminating 0 20m
    longhorn-system csi-resizer-74c5555748-fw2cw 1/1 Terminating 0 20m
    longhorn-system csi-resizer-74c5555748-p4nph 1/1 Terminating 0 20m
    longhorn-system csi-snapshotter-6bc4bcf4c5-6858b 1/1 Terminating 0 20m
    longhorn-system csi-snapshotter-6bc4bcf4c5-cqkbw 1/1 Terminating 0 20m
    longhorn-system csi-snapshotter-6bc4bcf4c5-mkqtg 1/1 Terminating 0 20m
    longhorn-system engine-image-ei-b0369a5d-2t4k4 1/1 Running 0 178m
    longhorn-system instance-manager-a5bd20597b82bcf3ba9d314620b7e670 1/1 Terminating 0 178m
    longhorn-system longhorn-csi-plugin-x6bdg 3/3 Running 0 178m
    longhorn-system longhorn-driver-deployer-85cf4b4849-5lc52 1/1 Terminating 0 20m
    longhorn-system longhorn-loop-device-cleaner-hhvgv 1/1 Running 0 178m
    longhorn-system longhorn-manager-5h2zw 1/1 Running 0 178m
    longhorn-system longhorn-ui-6b677889f8-hrg8j 1/1 Terminating 0 20m
    longhorn-system longhorn-ui-6b677889f8-w5hng 1/1 Terminating 0 20m

To resolve the issue, you must restart the rke2-agent service on the worker node.

  # On the worker node
  sudo systemctl restart rke2-agent.service

The upgrade should resume after the rke2-agent service is fully restarted.
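
To confirm that the worker node has rejoined and the upgrade is progressing again, one quick check is to re-run the commands from the earlier steps on a management node, for example:

  # The worker node should report Ready again once the agent reconnects
  kubectl get nodes

  # The node states in the Upgrade resource should continue to change
  kubectl -n harvester-system get upgrades.harvesterhci.io -l harvesterhci.io/latestUpgrade=true -o yaml | grep -A 5 nodeStatuses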

Note:

This issue occurs because the agent load balancer on the worker node is unable to connect to the API server on the management node after the rke2-server service is restarted. Because the rke2-server service can be restarted multiple times when nodes are upgraded, the upgrade process is likely to become stuck again. You may need to restart the rke2-agent service multiple times.

To determine if the agent load balancer is functioning, run the following commands:

  # On the management node, check if the `rke2-server` service is running.
  sudo systemctl status rke2-server.service

  # On the worker node, check if the agent load balancer is functioning.
  sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig get nodes

If the kubectl command does not return a response, the kubelet is unable to access the API server via the agent load balancer. You must restart the rke2-agent service.
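
Because the issue can recur each time the rke2-server service restarts, it can also help to watch the agent logs on the worker node; repeated connection errors to the API server are the usual symptom. A standard systemd journal query is sufficient for this:

  # On the worker node, follow the rke2-agent logs
  sudo journalctl -u rke2-agent.service -f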

For more information, see Issue #6432.


2. Automatic image cleanup is not functioning

Because the published Harvester ISO contains an incomplete image list, automatic image cleanup cannot be performed during an upgrade from v1.3.1 to v1.3.2. This issue does not block the upgrade, and you can use this script to manually clean up container images after the upgrade is completed. For more information, see issue #6620.
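
If you cannot use the linked script and only need a rough, manual cleanup, the container runtime can list and prune unused images directly. The sketch below is illustrative only: the binary and config paths assume a default RKE2 layout on the node, and pruning removes every image that no running container references, so prefer the official script where possible.

  # On each node (paths assume a default RKE2 install)
  export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml

  # List the container images currently stored by containerd
  sudo -E /var/lib/rancher/rke2/bin/crictl images

  # Remove images that are not referenced by any running container
  sudo -E /var/lib/rancher/rke2/bin/crictl rmi --prune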


3. The upgrade process becomes stuck in the “Pre-draining” state

A virtual machine with a container disk cannot be migrated because of a limitation of the Live Migration feature. This causes the upgrade process to become stuck in the “Pre-draining” state.

Tip:

Manually stop the virtual machines to continue the upgrade process.
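
If you are unsure which virtual machines use a container disk, the sketch below shows one way to find them from the CLI; it assumes jq is available and relies on the standard KubeVirt VirtualMachineInstance API. You can then stop the listed VMs from the Harvester UI (or with virtctl, if installed).

  # List running VMIs that use a containerDisk volume (illustrative)
  kubectl get vmi -A -o json \
    | jq -r '.items[] | select(any(.spec.volumes[]?; has("containerDisk"))) | "\(.metadata.namespace)/\(.metadata.name)"'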

For more information, see Issue #7005.