Upgrading kubeadm clusters

This page explains how to upgrade a Kubernetes cluster created with kubeadm from version 1.16.x to version 1.17.x, and from version 1.17.x to 1.17.y (where y > x).

For information about upgrading clusters created with older versions of kubeadm, refer to the documentation pages for those versions.

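At a high level, the upgrade workflow is:

  1. Upgrade the primary control plane node.
  2. Upgrade the additional control plane nodes.
  3. Upgrade the worker nodes.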

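Before you begin

  • You need a Kubernetes cluster created with kubeadm and running version 1.16.0 or later.
  • Swap must be disabled.
  • The cluster should use a static control plane and etcd pods, or external etcd.
  • Make sure you read the release notes carefully.
  • Make sure to back up any important components, such as app-level state stored in a database. kubeadm upgrade does not touch your workloads, only components internal to Kubernetes, but backups are always a best practice.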

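Additional information

  • All containers are restarted after the upgrade, because the container spec hash value changes.
  • You can only upgrade from one minor version to the next minor version, or between patch versions of the same minor version. That is, you cannot skip minor versions when you upgrade. For example, you can upgrade from 1.y to 1.y+1, but not from 1.y to 1.y+2.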
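The no-skipping rule above can be checked mechanically before starting. The sketch below is only an illustration: the helper names `major_of`, `minor_of` and `can_upgrade` are hypothetical and are not part of kubeadm. It compares major and minor numbers only, so it does not detect a patch-level downgrade within the same minor version.

```shell
#!/bin/sh
# Hypothetical helpers (not part of kubeadm). Versions are expected as
# MAJOR.MINOR.PATCH with an optional leading "v".

major_of() {
  # Strip a leading "v", then print the first dot-separated field.
  echo "${1#v}" | cut -d. -f1
}

minor_of() {
  # Strip a leading "v", then print the second dot-separated field.
  echo "${1#v}" | cut -d. -f2
}

# can_upgrade FROM TO
# Succeeds only if TO has the same major version as FROM and its minor
# version is either the same (patch upgrade) or exactly one higher.
can_upgrade() (
  from=$1 to=$2
  [ "$(major_of "$from")" = "$(major_of "$to")" ] || exit 1
  step=$(( $(minor_of "$to") - $(minor_of "$from") ))
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
)

can_upgrade 1.16.3 1.17.0 && echo "1.16.3 -> 1.17.0: allowed"
can_upgrade 1.16.3 1.18.0 || echo "1.16.3 -> 1.18.0: skips a minor version"
```

The function body runs in a subshell, so its variables do not leak into the calling script.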

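Determine which version to upgrade to

  1. Find the latest stable 1.17 version:

    On apt-based distributions (for example Ubuntu or Debian):

        apt update
        apt-cache policy kubeadm
        # find the latest 1.17 version in the list
        # it should look like 1.17.x-00, where x is the latest patch

    On yum-based distributions (for example CentOS, RHEL or Fedora):

        yum list --showduplicates kubeadm --disableexcludes=kubernetes
        # find the latest 1.17 version in the list
        # it should look like 1.17.x-0, where x is the latest patch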
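Picking the newest patch out of a long version list by eye is error-prone, so a small filter can help. The following is a minimal sketch: `latest_patch` is a hypothetical helper, it assumes one package version per line on standard input, and it relies on `sort -V` (version sort, available in GNU coreutils).

```shell
#!/bin/sh
# latest_patch SERIES
# Hypothetical helper: reads package versions (one per line) on stdin and
# prints the highest one that belongs to the given MAJOR.MINOR series.
latest_patch() {
  # Keep only lines that start with "<SERIES>." and version-sort them.
  awk -v s="$1." 'index($0, s) == 1' | sort -V | tail -n 1
}

# Possible usage on an apt-based system (the third whitespace-separated
# column of `apt-cache madison` output is the package version):
#   apt-cache madison kubeadm | awk '{print $3}' | latest_patch 1.17
```

Version sort matters here because a plain lexical sort would rank 1.17.4 above 1.17.12.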

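Upgrade the first control plane node

  1. On your first control plane node, upgrade kubeadm:

    On apt-based distributions (for example Ubuntu or Debian):

        # replace x in 1.17.x-00 with the latest patch version
        apt-mark unhold kubeadm && \
        apt-get update && apt-get install -y kubeadm=1.17.x-00 && \
        apt-mark hold kubeadm

    On yum-based distributions (for example CentOS, RHEL or Fedora):

        # replace x in 1.17.x-0 with the latest patch version
        yum install -y kubeadm-1.17.x-0 --disableexcludes=kubernetes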
  2. Verify that kubeadm has the expected version:

        kubeadm version
  3. Drain the control plane node:

        # $CP_NODE holds the name of your control plane node
        kubectl drain $CP_NODE --ignore-daemonsets
  4. On the control plane node, run:

        sudo kubeadm upgrade plan

    You should see output similar to this:

        [preflight] Running pre-flight checks.
        [upgrade] Making sure the cluster is healthy:
        [upgrade/config] Making sure the configuration is correct:
        [upgrade/config] Reading configuration from the cluster...
        [upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
        [upgrade] Fetching available versions to upgrade to
        [upgrade/versions] Cluster version: v1.16.0
        [upgrade/versions] kubeadm version: v1.17.0

        Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
        COMPONENT   CURRENT       AVAILABLE
        Kubelet     1 x v1.16.0   v1.17.0

        Upgrade to the latest version in the v1.13 series:

        COMPONENT            CURRENT   AVAILABLE
        API Server           v1.16.0   v1.17.0
        Controller Manager   v1.16.0   v1.17.0
        Scheduler            v1.16.0   v1.17.0
        Kube Proxy           v1.16.0   v1.17.0
        CoreDNS              1.6.2     1.6.5
        Etcd                 3.3.15    3.4.3-0

        You can now apply the upgrade by executing the following command:

            kubeadm upgrade apply v1.17.0

        _____________________________________________________________________

    This command checks that your cluster can be upgraded, and fetches the versions you can upgrade to.

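  5. Choose a version to upgrade to, and run the appropriate command. For example:

        sudo kubeadm upgrade apply v1.17.x

    • Replace x with the patch version you picked for this upgrade.

    You should see output similar to this: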

        [preflight] Running pre-flight checks.
        [upgrade] Making sure the cluster is healthy:
        [upgrade/config] Making sure the configuration is correct:
        [upgrade/config] Reading configuration from the cluster...
        [upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
        [upgrade/version] You have chosen to change the cluster version to "v1.17.0"
        [upgrade/versions] Cluster version: v1.16.0
        [upgrade/versions] kubeadm version: v1.17.0
        [upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
        [upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
        [upgrade/prepull] Prepulling image for component etcd.
        [upgrade/prepull] Prepulling image for component kube-scheduler.
        [upgrade/prepull] Prepulling image for component kube-apiserver.
        [upgrade/prepull] Prepulling image for component kube-controller-manager.
        [apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
        [apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
        [apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
        [apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
        [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
        [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
        [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
        [apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
        [upgrade/prepull] Prepulled image for component etcd.
        [upgrade/prepull] Prepulled image for component kube-apiserver.
        [upgrade/prepull] Prepulled image for component kube-scheduler.
        [upgrade/prepull] Prepulled image for component kube-controller-manager.
        [upgrade/prepull] Successfully prepulled the images for all the control plane components
        [upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.17.0"...
        Static pod: kube-apiserver-myhost hash: 6436b0d8ee0136c9d9752971dda40400
        Static pod: kube-controller-manager-myhost hash: 8ee730c1a5607a87f35abb2183bf03f2
        Static pod: kube-scheduler-myhost hash: 4b52d75cab61380f07c0c5a69fb371d4
        [upgrade/etcd] Upgrading to TLS for etcd
        Static pod: etcd-myhost hash: 877025e7dd7adae8a04ee20ca4ecb239
        [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-03-14-20-52-44/etcd.yaml"
        [upgrade/staticpods] Waiting for the kubelet to restart the component
        [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
        Static pod: etcd-myhost hash: 877025e7dd7adae8a04ee20ca4ecb239
        Static pod: etcd-myhost hash: 877025e7dd7adae8a04ee20ca4ecb239
        Static pod: etcd-myhost hash: 64a28f011070816f4beb07a9c96d73b6
        [apiclient] Found 1 Pods for label selector component=etcd
        [upgrade/staticpods] Component "etcd" upgraded successfully!
        [upgrade/etcd] Waiting for etcd to become available
        [upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests043818770"
        [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-03-14-20-52-44/kube-apiserver.yaml"
        [upgrade/staticpods] Waiting for the kubelet to restart the component
        [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
        Static pod: kube-apiserver-myhost hash: 6436b0d8ee0136c9d9752971dda40400
        Static pod: kube-apiserver-myhost hash: 6436b0d8ee0136c9d9752971dda40400
        Static pod: kube-apiserver-myhost hash: 6436b0d8ee0136c9d9752971dda40400
        Static pod: kube-apiserver-myhost hash: b8a6533e241a8c6dab84d32bb708b8a1
        [apiclient] Found 1 Pods for label selector component=kube-apiserver
        [upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
        [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-03-14-20-52-44/kube-controller-manager.yaml"
        [upgrade/staticpods] Waiting for the kubelet to restart the component
        [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
        Static pod: kube-controller-manager-myhost hash: 8ee730c1a5607a87f35abb2183bf03f2
        Static pod: kube-controller-manager-myhost hash: 6f77d441d2488efd9fc2d9a9987ad30b
        [apiclient] Found 1 Pods for label selector component=kube-controller-manager
        [upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
        [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-03-14-20-52-44/kube-scheduler.yaml"
        [upgrade/staticpods] Waiting for the kubelet to restart the component
        [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
        Static pod: kube-scheduler-myhost hash: 4b52d75cab61380f07c0c5a69fb371d4
        Static pod: kube-scheduler-myhost hash: a24773c92bb69c3748fcce5e540b7574
        [apiclient] Found 1 Pods for label selector component=kube-scheduler
        [upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
        [upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
        [kubelet] Creating a ConfigMap "kubelet-config-1.17" in namespace kube-system with the configuration for the kubelets in the cluster
        [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
        [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
        [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
        [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
        [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
        [addons] Applied essential addon: CoreDNS
        [addons] Applied essential addon: kube-proxy
        [upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.17.0". Enjoy!
        [upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
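  6. Manually upgrade your CNI provider plugin.

    Your Container Network Interface (CNI) provider may have its own upgrade instructions to follow. Check the addons page to find your CNI provider and see whether additional upgrade steps are required.

    This step is not required on additional control plane nodes if the CNI provider runs as a DaemonSet.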

  7. Uncordon the control plane node:

        kubectl uncordon $CP_NODE
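  8. Upgrade the kubelet and kubectl on the control plane node:

    On apt-based distributions (for example Ubuntu or Debian):

        # replace x in 1.17.x-00 with the latest patch version
        apt-mark unhold kubelet kubectl && \
        apt-get update && apt-get install -y kubelet=1.17.x-00 kubectl=1.17.x-00 && \
        apt-mark hold kubelet kubectl

    On yum-based distributions (for example CentOS, RHEL or Fedora):

        # replace x in 1.17.x-0 with the latest patch version
        yum install -y kubelet-1.17.x-0 kubectl-1.17.x-0 --disableexcludes=kubernetes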
  9. Restart the kubelet:

        sudo systemctl restart kubelet
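The sample output of `kubeadm upgrade plan` shown in the steps above ends with a suggested `kubeadm upgrade apply` command. When scripting the procedure it can be handy to extract that suggestion instead of copying it by hand. This is a rough sketch: `suggested_apply` is a hypothetical helper, and it assumes the plan output keeps the wording shown in the sample.

```shell
#!/bin/sh
# suggested_apply
# Hypothetical helper: reads `kubeadm upgrade plan` output on stdin and
# prints the first "kubeadm upgrade apply vX.Y.Z" suggestion it contains.
suggested_apply() {
  # The version pattern keeps the quoted mention of 'kubeadm upgrade apply'
  # (without a version) in the plan output from matching.
  grep -Eo 'kubeadm upgrade apply v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Possible usage on a control plane node:
#   sudo kubeadm upgrade plan | suggested_apply
```

Review the printed command before running it; the plan may list more than one possible target version.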

Upgrade additional control plane nodes

  1. Follow the same steps as for the first control plane node, but use:

        sudo kubeadm upgrade node

    instead of:

        sudo kubeadm upgrade apply

    Running sudo kubeadm upgrade plan is also not needed.

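Upgrade worker nodes

The upgrade procedure on worker nodes should be executed one node at a time, or a few nodes at a time, without compromising the minimum capacity required for running your workloads.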
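One way to respect that capacity limit is to split the worker list into fixed-size batches and finish one batch before starting the next. The sketch below shows only the batching; `batch_nodes` is a hypothetical helper and the batch size is something you would choose from your own capacity requirements.

```shell
#!/bin/sh
# batch_nodes SIZE
# Hypothetical helper: reads node names (one per line) on stdin and prints
# them in batches of at most SIZE names, one batch per output line.
batch_nodes() {
  # Without an explicit command, xargs echoes its arguments.
  xargs -n "$1"
}

# Possible usage, given a file workers.txt with one node name per line:
#   batch_nodes 2 < workers.txt | while read -r batch; do
#     echo "next batch: $batch"   # upgrade these nodes, then continue
#   done
```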

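Upgrade kubeadm

  1. Upgrade kubeadm on all worker nodes:

    On apt-based distributions (for example Ubuntu or Debian):

        # replace x in 1.17.x-00 with the latest patch version
        apt-mark unhold kubeadm && \
        apt-get update && apt-get install -y kubeadm=1.17.x-00 && \
        apt-mark hold kubeadm

    On yum-based distributions (for example CentOS, RHEL or Fedora):

        # replace x in 1.17.x-0 with the latest patch version
        yum install -y kubeadm-1.17.x-0 --disableexcludes=kubernetes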
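The two package managers pin versions with different syntax: apt uses `name=version-00`, while yum uses `name-version-0`. The sketch below builds the right argument for either one, which can reduce copy-paste mistakes in scripts. `pkg_spec` is a hypothetical helper, and the patch version in the usage comment is only an example value.

```shell
#!/bin/sh
# pkg_spec MANAGER NAME VERSION
# Hypothetical helper: prints the version-pinned package argument in the
# form used by the install commands on this page.
pkg_spec() {
  manager=$1 name=$2 version=$3
  case "$manager" in
    apt) echo "${name}=${version}-00" ;;
    yum) echo "${name}-${version}-0" ;;
    *)   echo "unsupported package manager: $manager" >&2; return 1 ;;
  esac
}

# Possible usage (1.17.4 is an example patch version):
#   apt-get install -y "$(pkg_spec apt kubeadm 1.17.4)"
#   yum install -y "$(pkg_spec yum kubeadm 1.17.4)" --disableexcludes=kubernetes
```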

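Drain the node

  1. Prepare the node for maintenance by marking it unschedulable and evicting the workloads. Run:

        # $NODE holds the name of the node you are draining
        kubectl drain $NODE --ignore-daemonsets

    You should see output similar to this: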

        node/ip-172-31-85-18 cordoned
        WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-dj7d7, kube-system/weave-net-z65qx
        node/ip-172-31-85-18 drained

Upgrade the kubelet configuration

  1. Upgrade the kubelet configuration:

        sudo kubeadm upgrade node

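Upgrade kubelet and kubectl

  1. Upgrade the Kubernetes packages using the package manager for your Linux distribution:

    On apt-based distributions (for example Ubuntu or Debian):

        # replace x in 1.17.x-00 with the latest patch version
        apt-mark unhold kubelet kubectl && \
        apt-get update && apt-get install -y kubelet=1.17.x-00 kubectl=1.17.x-00 && \
        apt-mark hold kubelet kubectl

    On yum-based distributions (for example CentOS, RHEL or Fedora):

        # replace x in 1.17.x-0 with the latest patch version
        yum install -y kubelet-1.17.x-0 kubectl-1.17.x-0 --disableexcludes=kubernetes

  2. Restart the kubelet:

        sudo systemctl restart kubelet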

Uncordon the node

  1. Bring the node back online by marking it schedulable:

        kubectl uncordon $NODE
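The per-node steps above follow a drain, upgrade, uncordon pattern. A small wrapper can make the bracketing explicit. The sketch below is illustrative only: `run` and `with_node_drained` are hypothetical helpers, and `DRY_RUN` defaults to 1 here so the commands are printed rather than executed.

```shell
#!/bin/sh
# run CMD...
# Hypothetical helper: prints the command when DRY_RUN=1 (the default in
# this sketch), otherwise executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# with_node_drained NODE CMD...
# Drains NODE, runs CMD, then uncordons NODE. If CMD fails, the node is
# left cordoned on purpose so that it can be inspected first.
with_node_drained() {
  node=$1; shift
  run kubectl drain "$node" --ignore-daemonsets &&
  run "$@" &&
  run kubectl uncordon "$node"
}

# Possible usage, where upgrade-node.sh is a placeholder for your own
# script that performs the on-node package and kubelet steps:
#   DRY_RUN=0 with_node_drained "$NODE" ./upgrade-node.sh "$NODE"
```

Leaving a failed node cordoned is a design choice in this sketch; uncordoning unconditionally would put a possibly half-upgraded node back into scheduling.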

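Verify the status of the cluster

After the kubelet is upgraded on all nodes, verify that all nodes are available again by running the following command from anywhere kubectl can access the cluster:

    kubectl get nodes

The STATUS column should show Ready for all your nodes, and the version number should be updated.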
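On larger clusters, scanning the node table by eye gets tedious. The check can be approximated with a filter over the `kubectl get nodes --no-headers` output. This is a sketch under assumptions: `nodes_not_upgraded` is a hypothetical helper, and it assumes the default column layout in which STATUS is the second column and VERSION is the last.

```shell
#!/bin/sh
# nodes_not_upgraded VERSION_PREFIX
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints each node whose STATUS is not exactly "Ready" or whose VERSION
# does not start with VERSION_PREFIX. Exits non-zero if any node is printed.
nodes_not_upgraded() {
  awk -v want="$1" '
    $2 != "Ready" || index($NF, want) != 1 { print $1; bad = 1 }
    END { exit bad }
  '
}

# Possible usage:
#   kubectl get nodes --no-headers | nodes_not_upgraded v1.17. \
#     && echo "all nodes are Ready and upgraded"
```

A node that is still cordoned reports a status such as `Ready,SchedulingDisabled`, so this filter also flags nodes that were not uncordoned.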

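Recovering from a failure state

If kubeadm upgrade fails and does not roll back, for example because of an unexpected shutdown during execution, you can run kubeadm upgrade again. This command is idempotent and eventually makes sure that the actual state is the desired state you declare. To recover from a bad state, you can also run kubeadm upgrade apply --force without changing the version that your cluster is running.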
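Because the command is idempotent, re-running it after a transient failure is safe in principle. A bounded retry loop is one way to script that. The sketch below is generic: `retry` and the `RETRY_DELAY` variable are hypothetical names, and whether automatic retries are appropriate depends on why the upgrade failed in the first place.

```shell
#!/bin/sh
# retry MAX CMD...
# Hypothetical helper: runs CMD up to MAX times, stopping at the first
# success. Waits RETRY_DELAY seconds (default 10) between attempts.
retry() {
  max=$1; shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-10}"
  done
}

# Possible usage (replace x with your chosen patch version; note that
# `kubeadm upgrade apply` asks for confirmation on each run):
#   retry 3 sudo kubeadm upgrade apply v1.17.x
```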

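How it works

kubeadm upgrade apply does the following:

  • Checks that your cluster is in an upgradeable state:
    • The API server is reachable
    • All nodes are in the Ready state
    • The control plane is healthy
  • Enforces the version skew policies.
  • Makes sure the control plane images are available or can be pulled to the machine.
  • Upgrades the control plane components, or rolls back if any of them fails to come up.
  • Applies the new kube-dns and kube-proxy manifests and makes sure that all necessary RBAC rules are created.
  • Creates new certificate and key files for the API server and backs up the old files if they are about to expire within 180 days.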
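To see ahead of time whether the 180-day condition in the last item applies, the API server certificate can be inspected with `openssl x509 -checkend`. This is a sketch, not kubeadm's own logic: `days_to_seconds` and `cert_expires_within` are hypothetical helpers, and the path in the usage comment assumes the default kubeadm certificate location.

```shell
#!/bin/sh
# days_to_seconds DAYS
# Converts a number of days to seconds (86400 seconds per day).
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# cert_expires_within FILE DAYS
# Hypothetical helper: succeeds if the certificate in FILE expires within
# DAYS days. Returns 2 if FILE cannot be read.
cert_expires_within() {
  [ -r "$1" ] || return 2
  # `openssl x509 -checkend N` exits 0 when the certificate is still valid
  # N seconds from now, so its result is inverted here.
  ! openssl x509 -checkend "$(days_to_seconds "$2")" -noout -in "$1" >/dev/null
}

# Possible usage (assumes the default kubeadm certificate path):
#   cert_expires_within /etc/kubernetes/pki/apiserver.crt 180 \
#     && echo "API server certificate expires within 180 days"
```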

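kubeadm upgrade node does the following on additional control plane nodes:

  • Fetches the kubeadm ClusterConfiguration from the cluster.
  • Optionally backs up the kube-apiserver certificate.
  • Upgrades the static Pod manifests for the control plane components.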