Migrate Replicated Control Plane To Use Cloud Controller Manager
FEATURE STATE: Kubernetes v1.24 [stable]
The cloud-controller-manager is a Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider’s API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.
By decoupling the interoperability logic between Kubernetes and the underlying cloud infrastructure, the cloud-controller-manager component enables cloud providers to release features at a different pace compared to the main Kubernetes project.
Background
As part of the cloud provider extraction effort, all cloud specific controllers must be moved out of the kube-controller-manager
. All existing clusters that run cloud controllers in the kube-controller-manager
must migrate to instead run the controllers in a cloud provider specific cloud-controller-manager
.
Leader Migration provides a mechanism in which HA clusters can safely migrate “cloud specific” controllers between the kube-controller-manager
and the cloud-controller-manager
via a shared resource lock between the two components while upgrading the replicated control plane. For a single-node control plane, or if unavailability of controller managers can be tolerated during the upgrade, Leader Migration is not needed and this guide can be ignored.
Leader Migration can be enabled by setting --enable-leader-migration
on kube-controller-manager
or cloud-controller-manager
. Leader Migration only applies during the upgrade and can be safely disabled or left enabled after the upgrade is complete.
This guide walks you through the manual process of upgrading the control plane from kube-controller-manager
with built-in cloud provider to running both kube-controller-manager
and cloud-controller-manager
. If you use a tool to deploy and manage the cluster, please refer to the documentation of the tool and the cloud provider for specific instructions of the migration.
Before you begin
It is assumed that the control plane is running Kubernetes version N and to be upgraded to version N + 1. Although it is possible to migrate within the same version, ideally the migration should be performed as part of an upgrade so that changes of configuration can be aligned to each release. The exact versions of N and N + 1 depend on each cloud provider. For example, if a cloud provider builds a cloud-controller-manager
to work with Kubernetes 1.24, then N can be 1.23 and N + 1 can be 1.24.
The control plane nodes should run kube-controller-manager
with Leader Election enabled, which is the default. As of version N, an in-tree cloud provider must be set with --cloud-provider
flag and cloud-controller-manager
should not yet be deployed.
The out-of-tree cloud provider must have built a cloud-controller-manager
with Leader Migration implementation. If the cloud provider imports k8s.io/cloud-provider
and k8s.io/controller-manager
of version v0.21.0 or later, Leader Migration will be available. However, for version before v0.22.0, Leader Migration is alpha and requires feature gate ControllerManagerLeaderMigration
to be enabled in cloud-controller-manager
.
This guide assumes that kubelet of each control plane node starts kube-controller-manager
and cloud-controller-manager
as static pods defined by their manifests. If the components run in a different setting, please adjust the steps accordingly.
For authorization, this guide assumes that the cluster uses RBAC. If another authorization mode grants permissions to kube-controller-manager
and cloud-controller-manager
components, please grant the needed access in a way that matches the mode.
Grant access to Migration Lease
The default permissions of the controller manager allow only accesses to their main Lease. In order for the migration to work, accesses to another Lease are required.
You can grant kube-controller-manager
full access to the leases API by modifying the system::leader-locking-kube-controller-manager
role. This task guide assumes that the name of the migration lease is cloud-provider-extraction-migration
.
kubectl patch -n kube-system role 'system::leader-locking-kube-controller-manager' -p '{"rules": [ {"apiGroups":[ "coordination.k8s.io"], "resources": ["leases"], "resourceNames": ["cloud-provider-extraction-migration"], "verbs": ["create", "list", "get", "update"] } ]}' --type=merge
Do the same to the system::leader-locking-cloud-controller-manager
role.
kubectl patch -n kube-system role 'system::leader-locking-cloud-controller-manager' -p '{"rules": [ {"apiGroups":[ "coordination.k8s.io"], "resources": ["leases"], "resourceNames": ["cloud-provider-extraction-migration"], "verbs": ["create", "list", "get", "update"] } ]}' --type=merge
Initial Leader Migration configuration
Leader Migration optionally takes a configuration file representing the state of controller-to-manager assignment. At this moment, with in-tree cloud provider, kube-controller-manager
runs route
, service
, and cloud-node-lifecycle
. The following example configuration shows the assignment.
Leader Migration can be enabled without a configuration. Please see Default Configuration for details.
kind: LeaderMigrationConfiguration
apiVersion: controllermanager.config.k8s.io/v1
leaderName: cloud-provider-extraction-migration
controllerLeaders:
- name: route
component: kube-controller-manager
- name: service
component: kube-controller-manager
- name: cloud-node-lifecycle
component: kube-controller-manager
Alternatively, because the controllers can run under either controller managers, setting component
to *
for both sides makes the configuration file consistent between both parties of the migration.
# wildcard version
kind: LeaderMigrationConfiguration
apiVersion: controllermanager.config.k8s.io/v1
leaderName: cloud-provider-extraction-migration
controllerLeaders:
- name: route
component: *
- name: service
component: *
- name: cloud-node-lifecycle
component: *
On each control plane node, save the content to /etc/leadermigration.conf
, and update the manifest of kube-controller-manager
so that the file is mounted inside the container at the same location. Also, update the same manifest to add the following arguments:
--enable-leader-migration
to enable Leader Migration on the controller manager--leader-migration-config=/etc/leadermigration.conf
to set configuration file
Restart kube-controller-manager
on each node. At this moment, kube-controller-manager
has leader migration enabled and is ready for the migration.
Deploy Cloud Controller Manager
In version N + 1, the desired state of controller-to-manager assignment can be represented by a new configuration file, shown as follows. Please note component
field of each controllerLeaders
changing from kube-controller-manager
to cloud-controller-manager
. Alternatively, use the wildcard version mentioned above, which has the same effect.
kind: LeaderMigrationConfiguration
apiVersion: controllermanager.config.k8s.io/v1
leaderName: cloud-provider-extraction-migration
controllerLeaders:
- name: route
component: cloud-controller-manager
- name: service
component: cloud-controller-manager
- name: cloud-node-lifecycle
component: cloud-controller-manager
When creating control plane nodes of version N + 1, the content should be deployed to /etc/leadermigration.conf
. The manifest of cloud-controller-manager
should be updated to mount the configuration file in the same manner as kube-controller-manager
of version N. Similarly, add --enable-leader-migration
and --leader-migration-config=/etc/leadermigration.conf
to the arguments of cloud-controller-manager
.
Create a new control plane node of version N + 1 with the updated cloud-controller-manager
manifest, and with the --cloud-provider
flag set to external
for kube-controller-manager
. kube-controller-manager
of version N + 1 MUST NOT have Leader Migration enabled because, with an external cloud provider, it does not run the migrated controllers anymore, and thus it is not involved in the migration.
Please refer to Cloud Controller Manager Administration for more detail on how to deploy cloud-controller-manager
.
Upgrade Control Plane
The control plane now contains nodes of both version N and N + 1. The nodes of version N run kube-controller-manager
only, and these of version N + 1 run both kube-controller-manager
and cloud-controller-manager
. The migrated controllers, as specified in the configuration, are running under either kube-controller-manager
of version N or cloud-controller-manager
of version N + 1 depending on which controller manager holds the migration lease. No controller will ever be running under both controller managers at any time.
In a rolling manner, create a new control plane node of version N + 1 and bring down one of version N + 1 until the control plane contains only nodes of version N + 1. If a rollback from version N + 1 to N is required, add nodes of version N with Leader Migration enabled for kube-controller-manager
back to the control plane, replacing one of version N + 1 each time until there are only nodes of version N.
(Optional) Disable Leader Migration
Now that the control plane has been upgraded to run both kube-controller-manager
and cloud-controller-manager
of version N + 1, Leader Migration has finished its job and can be safely disabled to save one Lease resource. It is safe to re-enable Leader Migration for the rollback in the future.
In a rolling manager, update manifest of cloud-controller-manager
to unset both --enable-leader-migration
and --leader-migration-config=
flag, also remove the mount of /etc/leadermigration.conf
, and finally remove /etc/leadermigration.conf
. To re-enable Leader Migration, recreate the configuration file and add its mount and the flags that enable Leader Migration back to cloud-controller-manager
.
Default Configuration
Starting Kubernetes 1.22, Leader Migration provides a default configuration suitable for the default controller-to-manager assignment. The default configuration can be enabled by setting --enable-leader-migration
but without --leader-migration-config=
.
For kube-controller-manager
and cloud-controller-manager
, if there are no flags that enable any in-tree cloud provider or change ownership of controllers, the default configuration can be used to avoid manual creation of the configuration file.
Special case: migrating the Node IPAM controller
If your cloud provider provides an implementation of Node IPAM controller, you should switch to the implementation in cloud-controller-manager
. Disable Node IPAM controller in kube-controller-manager
of version N + 1 by adding --controllers=*,-nodeipam
to its flags. Then add nodeipam
to the list of migrated controllers.
# wildcard version, with nodeipam
kind: LeaderMigrationConfiguration
apiVersion: controllermanager.config.k8s.io/v1
leaderName: cloud-provider-extraction-migration
controllerLeaders:
- name: route
component: *
- name: service
component: *
- name: cloud-node-lifecycle
component: *
- name: nodeipam
- component: *
What’s next
- Read the Controller Manager Leader Migration enhancement proposal.