Control plane load balancing

For clusters that don’t have an externally managed load balancer for the k0s control plane, k0s offers another way to make the control plane highly available: control plane load balancing (CPLB).

CPLB has two independent features that are normally used together: VRRP Instances, which automatically assign predefined virtual IP addresses to the control plane nodes using VRRP, and VirtualServers, which load balance traffic across the control plane nodes.

CPLB is intended for external traffic. It is fully compatible with node-local load balancing (NLLB), which means CPLB can handle external traffic while NLLB handles internal traffic at the same time.

Technical functionality

The k0s control plane load balancer provides k0s with virtual IPs and TCP load balancing on each controller node. This allows the control plane to be highly available using VRRP (Virtual Router Redundancy Protocol) and IPVS (IP Virtual Server), as long as the network infrastructure allows multicast and gratuitous ARP (GARP).
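To illustrate what the IPVS side does, here is a simplified Python model of a virtual service that uses the round-robin (`rr`) scheduler with persistence, the combination that appears in the `ipvsadm` output at the end of this page (`rr persistent 360`). It is a sketch of the observable behaviour only, not the kernel’s implementation: the class `PersistentRoundRobin` and the client addresses are hypothetical, and refreshing the persistence entry on every connection is a simplification.

```python
import itertools


class PersistentRoundRobin:
    """Simplified model of an IPVS 'rr' virtual service with persistence.

    Assumption: only the observable behaviour is modelled. New (or expired)
    clients are assigned real servers in round-robin order; a returning client
    sticks to its previous real server while its persistence entry is fresh.
    """

    def __init__(self, real_servers, persistence_timeout=360.0):
        self._cycle = itertools.cycle(real_servers)
        self._timeout = persistence_timeout
        # client IP -> (assigned real server, time of the client's last connection)
        self._affinity = {}

    def pick(self, client_ip, now):
        entry = self._affinity.get(client_ip)
        if entry is not None and now - entry[1] < self._timeout:
            server = entry[0]  # persistence: reuse the same real server
        else:
            server = next(self._cycle)  # round-robin for new or expired clients
        # Simplification: every connection refreshes the persistence entry.
        self._affinity[client_ip] = (server, now)
        return server


if __name__ == "__main__":
    lb = PersistentRoundRobin(["192.168.122.185:6443", "192.168.122.87:6443"])
    print(lb.pick("10.0.0.1", now=0))    # 192.168.122.185:6443 (first in rotation)
    print(lb.pick("10.0.0.2", now=1))    # 192.168.122.87:6443 (next in rotation)
    print(lb.pick("10.0.0.1", now=100))  # 192.168.122.185:6443 (persistent)
```

Persistence matters for the API server because it keeps a client’s successive connections on the same controller for the timeout window instead of spreading them across all controllers.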

Keepalived is the only load balancer that is supported so far. Currently there are no plans to support other alternatives.

VRRP Instances

VRRP, or Virtual Router Redundancy Protocol, is a protocol that allows several routers to utilize the same virtual IP address. A VRRP instance refers to a specific configuration of this protocol.
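The following Python sketch models how the routers of one VRRP instance agree on a master, following the election rules of RFC 5798: the highest priority wins, and a tie goes to the router with the numerically higher primary IP address. The priorities below are hypothetical, because this page does not describe how k0s assigns them, and the model ignores advertisement timers, preemption, and start order.

```python
from ipaddress import IPv4Address


def elect_master(routers):
    """Return the name of the router that wins a VRRP master election.

    `routers` maps a router name to (priority, primary IPv4 address).
    As in RFC 5798, the highest priority wins and a tie goes to the router
    with the numerically higher primary address. Timers and preemption are
    not modelled.
    """
    if not routers:
        return None
    return max(
        routers,
        key=lambda name: (routers[name][0], IPv4Address(routers[name][1])),
    )


# Hypothetical priorities; the addresses are the controllers' eth0 addresses.
controllers = {
    "controller-0": (150, "192.168.122.37"),
    "controller-1": (100, "192.168.122.185"),
    "controller-2": (100, "192.168.122.87"),
}
print(elect_master(controllers))  # controller-0 (highest priority)
del controllers["controller-0"]   # simulate controller-0 going down
print(elect_master(controllers))  # controller-1 (tie broken by the higher address)
```

Addresses are compared as `IPv4Address` objects rather than strings: as strings, `"192.168.122.87"` would sort above `"192.168.122.185"`.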

Each VRRP instance must have a unique virtualRouterID, at least one IP address, one unique password, and one network interface. The password is sent in plain text across your network; its only purpose is to prevent accidental conflicts between VRRP instances.

Except for the network interface, all the fields of a VRRP instance must have the same value on all the control plane nodes.
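Because every field except the network interface must match on all control plane nodes, a pre-flight check can compare each node’s settings for a VRRP instance. The helper `inconsistent_fields` below is hypothetical, not part of k0s; it takes plain dicts keyed by the field names used on this page, `interface` is an assumed key name for the network interface, and the `virtualRouterID` values are made up for illustration.

```python
def inconsistent_fields(instances_by_node):
    """Return the VRRP instance fields whose values differ between nodes.

    `instances_by_node` maps a node name to that node's settings for one VRRP
    instance, as a plain dict. The 'interface' key (an assumed name for the
    network interface) is skipped: it is the only field allowed to differ.
    """
    settings = list(instances_by_node.values())
    keys = set().union(*settings) - {"interface"}
    differing = []
    for key in sorted(keys):
        values = [s.get(key) for s in settings]
        if any(v != values[0] for v in values[1:]):
            differing.append(key)
    return differing


nodes = {
    "controller-0": {"virtualRouterID": 51, "virtualIPs": ["192.168.122.200/24"],
                     "authPass": "Example", "interface": "eth0"},
    "controller-1": {"virtualRouterID": 51, "virtualIPs": ["192.168.122.200/24"],
                     "authPass": "Example", "interface": "ens3"},
    "controller-2": {"virtualRouterID": 52, "virtualIPs": ["192.168.122.200/24"],
                     "authPass": "Example", "interface": "eth0"},
}
print(inconsistent_fields(nodes))  # ['virtualRouterID']
```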

Usually, users will define multiple VRRP instances when they need k0s to be highly available on multiple network interfaces.

Enabling in a cluster

In order to use control plane load balancing, the cluster needs to comply with the following:

  • k0s isn’t running as a single node, i.e. it isn’t started using the --single flag.
  • The cluster should have multiple controller nodes. Technically CPLB also works with a single controller node, but it is only useful in conjunction with a highly available control plane.
  • Each VRRP instance in the same broadcast domain must have a unique virtualRouterID and authPass. These provide no security against malicious attacks; they are safety features that prevent accidental conflicts between VRRP instances in the same network segment.
  • If VirtualServers are used, the cluster configuration mustn’t specify a non-empty spec.api.externalAddress. If only VRRPInstances are specified, a non-empty spec.api.externalAddress may be specified.
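The requirements above can be expressed as a small pre-flight check. The function `check_cplb` and its parameters are hypothetical, not a k0s API, and the example values are made up. It reports a single controller as a warning rather than an error, since CPLB technically works there, and it expects only the VRRP instances that share one broadcast domain.

```python
from collections import Counter


def check_cplb(single_node, controllers, vrrp_instances, virtual_servers,
               external_address=""):
    """Return (errors, warnings) for the CPLB requirements listed above.

    `vrrp_instances` must contain only the instances that share one broadcast
    domain, as plain dicts using the field names from this page.
    """
    errors, warnings = [], []
    if single_node:
        errors.append("k0s must not be started with --single")
    if controllers < 2:
        # CPLB technically works here, but adds no availability.
        warnings.append("CPLB is only useful with multiple controller nodes")
    for name in ("virtualRouterID", "authPass"):
        counts = Counter(inst.get(name) for inst in vrrp_instances)
        duplicates = sorted(str(value) for value, n in counts.items() if n > 1)
        if duplicates:
            errors.append(f"duplicate {name}: {', '.join(duplicates)}")
    if virtual_servers and external_address:
        errors.append("spec.api.externalAddress must be empty when virtualServers are used")
    return errors, warnings


errors, warnings = check_cplb(
    single_node=False,
    controllers=3,
    vrrp_instances=[{"virtualRouterID": 51, "authPass": "passA"},
                    {"virtualRouterID": 51, "authPass": "passB"}],
    virtual_servers=[{"ipAddress": "192.168.122.200"}],
)
print(errors)    # ['duplicate virtualRouterID: 51']
print(warnings)  # []
```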

Add the following to the cluster configuration (k0s.yaml):

```yaml
spec:
  network:
    controlPlaneLoadBalancing:
      enabled: true
      type: Keepalived
      keepalived:
        vrrpInstances:
        - virtualIPs: ["<External address IP>/<external address IP netmask>"]
          authPass: <password>
        virtualServers:
        - ipAddress: "<External ip address>"
```

Or alternatively, if using k0sctl, add the following to the k0sctl configuration (k0sctl.yaml):

```yaml
spec:
  k0s:
    config:
      spec:
        network:
          controlPlaneLoadBalancing:
            enabled: true
            type: Keepalived
            keepalived:
              vrrpInstances:
              - virtualIPs: ["<External address IP>/<external address IP netmask>"]
                authPass: <password>
              virtualServers:
              - ipAddress: "<External ip address>"
```

Because this feature configures the API server, CPLB does not support dynamic configuration: to apply changes, you need to restart the k0s controllers.

Full example using k0sctl

The following example shows a full k0sctl configuration file featuring three controllers and three workers with control plane load balancing enabled.

```yaml
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - role: controller
    ssh:
      address: controller-0.k0s.lab
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: controller
    ssh:
      address: controller-1.k0s.lab
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: controller
    ssh:
      address: controller-2.k0s.lab
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: worker
    ssh:
      address: worker-0.k0s.lab
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: worker
    ssh:
      address: worker-1.k0s.lab
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: worker
    ssh:
      address: worker-2.k0s.lab
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  k0s:
    version: v1.31.1+k0s.0
    config:
      spec:
        api:
          sans:
          - 192.168.122.200
        network:
          controlPlaneLoadBalancing:
            enabled: true
            type: Keepalived
            keepalived:
              vrrpInstances:
              - virtualIPs: ["192.168.122.200/24"]
                authPass: Example
              virtualServers:
              - ipAddress: "192.168.122.200"
```

Save the above configuration into a file called k0sctl.yaml and apply it in order to bootstrap the cluster:

```console
$ k0sctl apply
k0sctl Copyright 2023, k0sctl authors.
Anonymized telemetry of usage will be sent to the authors.
By continuing to use k0sctl you agree to these terms:
https://k0sproject.io/licenses/eula
level=info msg="==> Running phase: Connect to hosts"
level=info msg="[ssh] worker-2.k0s.lab:22: connected"
level=info msg="[ssh] controller-2.k0s.lab:22: connected"
level=info msg="[ssh] worker-1.k0s.lab:22: connected"
level=info msg="[ssh] worker-0.k0s.lab:22: connected"
level=info msg="[ssh] controller-0.k0s.lab:22: connected"
level=info msg="[ssh] controller-1.k0s.lab:22: connected"
level=info msg="==> Running phase: Detect host operating systems"
level=info msg="[ssh] worker-2.k0s.lab:22: is running Fedora Linux 38 (Cloud Edition)"
level=info msg="[ssh] controller-2.k0s.lab:22: is running Fedora Linux 38 (Cloud Edition)"
level=info msg="[ssh] controller-0.k0s.lab:22: is running Fedora Linux 38 (Cloud Edition)"
level=info msg="[ssh] controller-1.k0s.lab:22: is running Fedora Linux 38 (Cloud Edition)"
level=info msg="[ssh] worker-0.k0s.lab:22: is running Fedora Linux 38 (Cloud Edition)"
level=info msg="[ssh] worker-1.k0s.lab:22: is running Fedora Linux 38 (Cloud Edition)"
level=info msg="==> Running phase: Acquire exclusive host lock"
level=info msg="==> Running phase: Prepare hosts"
level=info msg="==> Running phase: Gather host facts"
level=info msg="[ssh] worker-2.k0s.lab:22: using worker-2.k0s.lab as hostname"
level=info msg="[ssh] controller-0.k0s.lab:22: using controller-0.k0s.lab as hostname"
level=info msg="[ssh] controller-2.k0s.lab:22: using controller-2.k0s.lab as hostname"
level=info msg="[ssh] controller-1.k0s.lab:22: using controller-1.k0s.lab as hostname"
level=info msg="[ssh] worker-1.k0s.lab:22: using worker-1.k0s.lab as hostname"
level=info msg="[ssh] worker-0.k0s.lab:22: using worker-0.k0s.lab as hostname"
level=info msg="[ssh] worker-2.k0s.lab:22: discovered eth0 as private interface"
level=info msg="[ssh] controller-0.k0s.lab:22: discovered eth0 as private interface"
level=info msg="[ssh] controller-2.k0s.lab:22: discovered eth0 as private interface"
level=info msg="[ssh] controller-1.k0s.lab:22: discovered eth0 as private interface"
level=info msg="[ssh] worker-1.k0s.lab:22: discovered eth0 as private interface"
level=info msg="[ssh] worker-0.k0s.lab:22: discovered eth0 as private interface"
level=info msg="[ssh] worker-2.k0s.lab:22: discovered 192.168.122.210 as private address"
level=info msg="[ssh] controller-0.k0s.lab:22: discovered 192.168.122.37 as private address"
level=info msg="[ssh] controller-2.k0s.lab:22: discovered 192.168.122.87 as private address"
level=info msg="[ssh] controller-1.k0s.lab:22: discovered 192.168.122.185 as private address"
level=info msg="[ssh] worker-1.k0s.lab:22: discovered 192.168.122.81 as private address"
level=info msg="[ssh] worker-0.k0s.lab:22: discovered 192.168.122.219 as private address"
level=info msg="==> Running phase: Validate hosts"
level=info msg="==> Running phase: Validate facts"
level=info msg="==> Running phase: Download k0s binaries to local host"
level=info msg="==> Running phase: Upload k0s binaries to hosts"
level=info msg="[ssh] controller-0.k0s.lab:22: uploading k0s binary from /opt/k0s"
level=info msg="[ssh] controller-2.k0s.lab:22: uploading k0s binary from /opt/k0s"
level=info msg="[ssh] worker-0.k0s.lab:22: uploading k0s binary from /opt/k0s"
level=info msg="[ssh] controller-1.k0s.lab:22: uploading k0s binary from /opt/k0s"
level=info msg="[ssh] worker-1.k0s.lab:22: uploading k0s binary from /opt/k0s"
level=info msg="[ssh] worker-2.k0s.lab:22: uploading k0s binary from /opt/k0s"
level=info msg="==> Running phase: Install k0s binaries on hosts"
level=info msg="[ssh] controller-0.k0s.lab:22: validating configuration"
level=info msg="[ssh] controller-1.k0s.lab:22: validating configuration"
level=info msg="[ssh] controller-2.k0s.lab:22: validating configuration"
level=info msg="==> Running phase: Configure k0s"
level=info msg="[ssh] controller-0.k0s.lab:22: installing new configuration"
level=info msg="[ssh] controller-2.k0s.lab:22: installing new configuration"
level=info msg="[ssh] controller-1.k0s.lab:22: installing new configuration"
level=info msg="==> Running phase: Initialize the k0s cluster"
level=info msg="[ssh] controller-0.k0s.lab:22: installing k0s controller"
level=info msg="[ssh] controller-0.k0s.lab:22: waiting for the k0s service to start"
level=info msg="[ssh] controller-0.k0s.lab:22: waiting for kubernetes api to respond"
level=info msg="==> Running phase: Install controllers"
level=info msg="[ssh] controller-2.k0s.lab:22: validating api connection to https://192.168.122.200:6443"
level=info msg="[ssh] controller-1.k0s.lab:22: validating api connection to https://192.168.122.200:6443"
level=info msg="[ssh] controller-0.k0s.lab:22: generating token"
level=info msg="[ssh] controller-1.k0s.lab:22: writing join token"
level=info msg="[ssh] controller-1.k0s.lab:22: installing k0s controller"
level=info msg="[ssh] controller-1.k0s.lab:22: starting service"
level=info msg="[ssh] controller-1.k0s.lab:22: waiting for the k0s service to start"
level=info msg="[ssh] controller-1.k0s.lab:22: waiting for kubernetes api to respond"
level=info msg="[ssh] controller-0.k0s.lab:22: generating token"
level=info msg="[ssh] controller-2.k0s.lab:22: writing join token"
level=info msg="[ssh] controller-2.k0s.lab:22: installing k0s controller"
level=info msg="[ssh] controller-2.k0s.lab:22: starting service"
level=info msg="[ssh] controller-2.k0s.lab:22: waiting for the k0s service to start"
level=info msg="[ssh] controller-2.k0s.lab:22: waiting for kubernetes api to respond"
level=info msg="==> Running phase: Install workers"
level=info msg="[ssh] worker-2.k0s.lab:22: validating api connection to https://192.168.122.200:6443"
level=info msg="[ssh] worker-1.k0s.lab:22: validating api connection to https://192.168.122.200:6443"
level=info msg="[ssh] worker-0.k0s.lab:22: validating api connection to https://192.168.122.200:6443"
level=info msg="[ssh] controller-0.k0s.lab:22: generating a join token for worker 1"
level=info msg="[ssh] controller-0.k0s.lab:22: generating a join token for worker 2"
level=info msg="[ssh] controller-0.k0s.lab:22: generating a join token for worker 3"
level=info msg="[ssh] worker-2.k0s.lab:22: writing join token"
level=info msg="[ssh] worker-0.k0s.lab:22: writing join token"
level=info msg="[ssh] worker-1.k0s.lab:22: writing join token"
level=info msg="[ssh] worker-2.k0s.lab:22: installing k0s worker"
level=info msg="[ssh] worker-1.k0s.lab:22: installing k0s worker"
level=info msg="[ssh] worker-0.k0s.lab:22: installing k0s worker"
level=info msg="[ssh] worker-2.k0s.lab:22: starting service"
level=info msg="[ssh] worker-1.k0s.lab:22: starting service"
level=info msg="[ssh] worker-0.k0s.lab:22: starting service"
level=info msg="[ssh] worker-2.k0s.lab:22: waiting for node to become ready"
level=info msg="[ssh] worker-0.k0s.lab:22: waiting for node to become ready"
level=info msg="[ssh] worker-1.k0s.lab:22: waiting for node to become ready"
level=info msg="==> Running phase: Release exclusive host lock"
level=info msg="==> Running phase: Disconnect from hosts"
level=info msg="==> Finished in 2m20s"
level=info msg="k0s cluster version v1.31.1+k0s.0 is now installed"
level=info msg="Tip: To access the cluster you can now fetch the admin kubeconfig using:"
level=info msg=" k0sctl kubeconfig"
```

The cluster should be available by now. Set up the kubeconfig file in order to interact with it:

```shell
k0sctl kubeconfig > k0s-kubeconfig
export KUBECONFIG=$(pwd)/k0s-kubeconfig
```

All three worker nodes are ready:

```console
$ kubectl get nodes
NAME               STATUS   ROLES    AGE     VERSION
worker-0.k0s.lab   Ready    <none>   8m51s   v1.31.1+k0s
worker-1.k0s.lab   Ready    <none>   8m51s   v1.31.1+k0s
worker-2.k0s.lab   Ready    <none>   8m51s   v1.31.1+k0s
```

Each controller node has a dummy interface (dummyvip0) that carries the VIP with a /32 netmask, but only one node, the current VRRP master, also has the VIP on its real NIC (eth0):

```console
$ for i in controller-{0..2} ; do echo $i ; ssh $i -- ip -4 --oneline addr show | grep -e eth0 -e dummyvip0; done
controller-0
2: eth0 inet 192.168.122.37/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0\ valid_lft 2381sec preferred_lft 2381sec
2: eth0 inet 192.168.122.200/24 scope global secondary eth0\ valid_lft forever preferred_lft forever
3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0\ valid_lft forever preferred_lft forever
controller-1
2: eth0 inet 192.168.122.185/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0\ valid_lft 2390sec preferred_lft 2390sec
3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0\ valid_lft forever preferred_lft forever
controller-2
2: eth0 inet 192.168.122.87/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0\ valid_lft 2399sec preferred_lft 2399sec
3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0\ valid_lft forever preferred_lft forever
```
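To check from a script which controller currently owns the VIP, you can parse the `ip -4 --oneline addr show` output collected by the loop above. The helper `vip_holders` is hypothetical; it reports every node that has the VIP on an interface other than the dummy one (`dummyvip0`, as in the output above), and the sample input is abridged from that output.

```python
def vip_holders(outputs, vip, dummy_interface="dummyvip0"):
    """Return the nodes that carry `vip` on an interface other than the dummy one.

    `outputs` maps a node name to the text printed by `ip -4 --oneline addr show`
    on that node. Each line looks like "<index>: <interface> inet <address>/<prefix> ...".
    """
    holders = []
    for node, text in outputs.items():
        for line in text.splitlines():
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "inet":
                interface, address = fields[1], fields[3].split("/")[0]
                if address == vip and interface != dummy_interface:
                    holders.append(node)
                    break
    return holders


outputs = {
    "controller-0": "2: eth0 inet 192.168.122.37/24 brd 192.168.122.255 scope global eth0\n"
                    "2: eth0 inet 192.168.122.200/24 scope global secondary eth0\n"
                    "3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0",
    "controller-1": "2: eth0 inet 192.168.122.185/24 brd 192.168.122.255 scope global eth0\n"
                    "3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0",
}
print(vip_holders(outputs, "192.168.122.200"))  # ['controller-0']
```

Exactly one holder is the healthy case. An empty list, or more than one node, points at a problem; for example, if VRRP advertisements do not reach the other controllers, each of them may promote itself to master.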

The cluster uses control plane load balancing and can tolerate the outage of one controller node. Shut down the first controller to simulate a failure:

```console
$ ssh controller-0 'sudo poweroff'
Connection to 192.168.122.37 closed by remote host.
```
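The one-controller tolerance comes from etcd quorum rather than from CPLB itself, assuming the default k0s storage backend where every controller runs an etcd member: CPLB moves the API endpoint to a surviving controller, but etcd keeps working only while a majority of its members is up. The arithmetic, as a small Python sketch (the function names are illustrative):

```python
def quorum(members):
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority remains available."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

This is why three controllers are the smallest setup that survives a controller outage: two controllers tolerate no more failures than one.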

Because control plane load balancing provides high availability, the VIP moves to a different node:

```console
$ for i in controller-{0..2} ; do echo $i ; ssh $i -- ip -4 --oneline addr show | grep -e eth0 -e dummyvip0; done
controller-1
2: eth0 inet 192.168.122.185/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0\ valid_lft 2173sec preferred_lft 2173sec
2: eth0 inet 192.168.122.200/24 scope global secondary eth0\ valid_lft forever preferred_lft forever
3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0\ valid_lft forever preferred_lft forever
controller-2
2: eth0 inet 192.168.122.87/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0\ valid_lft 2182sec preferred_lft 2182sec
3: dummyvip0 inet 192.168.122.200/32 scope global dummyvip0\ valid_lft forever preferred_lft forever
$ ssh controller-1 -- sudo ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.122.200:6443 rr persistent 360
  -> 192.168.122.185:6443         Route   1      0          0
  -> 192.168.122.87:6443          Route   1      0          0
  -> 192.168.122.122:6443         Route   1      0          0
```

The cluster keeps working normally:

```console
$ kubectl get nodes
NAME               STATUS   ROLES    AGE     VERSION
worker-0.k0s.lab   Ready    <none>   8m51s   v1.31.1+k0s
worker-1.k0s.lab   Ready    <none>   8m51s   v1.31.1+k0s
worker-2.k0s.lab   Ready    <none>   8m51s   v1.31.1+k0s
```
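To verify the real servers behind the VIP from a script, a hypothetical parser for the IPVS table format shown above (the listing printed by `ipvsadm -L -n`) could look like this. It is a sketch that only recognises TCP and UDP virtual services and ignores the scheduler flags and connection counters; the sample is abridged from the output above.

```python
def parse_ipvsadm_list(text):
    """Map each virtual service to its real servers, from `ipvsadm -L -n` style output."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] in ("TCP", "UDP"):
            current = fields[1]  # virtual service, e.g. "192.168.122.200:6443"
            services[current] = []
        elif fields[0] == "->" and current is not None:
            # Real server lines follow their virtual service; the column
            # header line comes before any service and is therefore skipped.
            services[current].append(fields[1])
    return services


SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.122.200:6443 rr persistent 360
  -> 192.168.122.185:6443         Route   1      0          0
  -> 192.168.122.87:6443          Route   1      0          0
"""
print(parse_ipvsadm_list(SAMPLE))
# {'192.168.122.200:6443': ['192.168.122.185:6443', '192.168.122.87:6443']}
```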