Setting Up a Highly Available Kubernetes Cluster

In the previous section we walked through setting up a Kubernetes cluster, and we described it as a "quasi-production" grade cluster.

The reason: it does not support high availability.

Think about it: what happens if the Master node goes down?

Since there is only one master node, the whole cluster simply stops working.

In this section, we will use Keepalived to build a highly available cluster.

We need 4 machines (physical or virtual both work). Suppose their IPs are:

  • h1: 192.168.1.12

  • h2: 192.168.1.10

  • h3: 192.168.1.9

  • h4: 192.168.1.16
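
For convenience, you may also want the four hostnames to resolve on every machine. A minimal sketch, assuming you manage name resolution through /etc/hosts (adapt to your own environment):

    # /etc/hosts entries on all four machines (hypothetical; use your own scheme)
    192.168.1.12 h1
    192.168.1.10 h2
    192.168.1.9  h3
    192.168.1.16 h4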

We also need a free, non-conflicting VIP (Virtual IP). When a failover occurs, Keepalived moves the VIP from the primary Master to the backup Master.

Note that if you use cloud hosts, you cannot use an arbitrary VIP for network-security reasons; you need to apply separately for an HAVIP (high-availability VIP). See the HAVIP application pages of Tencent Cloud and Alibaba Cloud.

Here we assume you already have a usable VIP at 192.168.1.8.
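
Before settling on a VIP, it is worth checking that nothing on the network already answers on it. A quick hedged check (assumes ICMP is not filtered in your environment):

    # No replies means the address appears to be free
    ping -c 2 -W 1 192.168.1.8 || echo "192.168.1.8 looks free"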

1 Deploying Keepalived

We will use h1 and h2 as the primary and backup machines for the Master role.

Install keepalived on both of them:

    yum install -y keepalived

The configuration files for the two machines are as follows:

h1:

    ! Configuration File for keepalived
    global_defs {
        router_id LVS_DEVEL
    }
    vrrp_script check_apiserver {
        script "</dev/tcp/127.0.0.1/6443"
        interval 1
        weight -2
    }
    vrrp_instance VI-kube-master {
        state MASTER                  # role of this node
        interface eth0                # network interface name
        virtual_router_id 68
        priority 100
        dont_track_primary
        advert_int 3
        authentication {
            auth_type PASS
            auth_pass mypass
        }
        unicast_src_ip 192.168.1.12   # IP of this host
        unicast_peer {
            192.168.1.10              # IP of the peer host
        }
        virtual_ipaddress {
            192.168.1.8               # the VIP (HAVIP)
        }
        track_script {
            check_apiserver
        }
    }

h2:

    ! Configuration File for keepalived
    global_defs {
        router_id LVS_DEVEL
    }
    vrrp_script check_apiserver {
        script "</dev/tcp/127.0.0.1/6443"
        interval 1
        weight -2
    }
    vrrp_instance VI-kube-master {
        state BACKUP                  # role of this node
        interface eth0                # network interface name
        virtual_router_id 68
        priority 99
        dont_track_primary
        advert_int 3
        unicast_src_ip 192.168.1.10   # IP of this host
        authentication {
            auth_type PASS
            auth_pass mypass
        }
        unicast_peer {
            192.168.1.12              # IP of the peer host
        }
        virtual_ipaddress {
            192.168.1.8               # the VIP (HAVIP)
        }
        track_script {
            check_apiserver
        }
    }

Explanation:

  • h1 is the primary, so its state is MASTER; h2 is the backup, with state BACKUP.

  • h1 and h2 find each other via unicast: each lists the other's IP in unicast_peer.

  • Both set the same VIP in virtual_ipaddress.

  • Health checking uses the check_apiserver script, which tests whether TCP port 6443 is open; 6443 is the Kubernetes API Server port. When the check fails, weight -2 lowers the node's effective priority, which is what triggers a failover. You can run the same test by hand, as shown below.
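
The script line is a bash redirection trick: opening /dev/tcp/127.0.0.1/6443 only succeeds if something is listening on that port. You can try the same expression by hand (a small sketch; /dev/tcp is a bash feature, so it must run under bash):

    # Exit code 0 = port 6443 open (API server reachable), non-zero = closed
    timeout 1 bash -c '</dev/tcp/127.0.0.1/6443' && echo "apiserver port open" || echo "apiserver port closed"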

Once the configuration is in place, remember to enable and start the keepalived service on both machines.

    systemctl enable keepalived
    service keepalived start
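
Once keepalived is running, you can check which node currently holds the VIP; it shows up as an extra address on the MASTER's interface (assuming eth0, as in the configs above):

    # Run on h1 and h2; only the current MASTER should list 192.168.1.8
    ip addr show eth0 | grep 192.168.1.8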

2 Preparing the Kubernetes Environment

The preparation here is exactly the same as in the previous section, so we won't repeat it.

Refer to steps 2-4 in "Setting Up a Kubernetes Cluster".

Note that all 4 machines need this preparation.

3 Bootstrapping the First Master

We start on h1, with the following command:

    kubeadm init --kubernetes-version v1.22.1 --control-plane-endpoint=192.168.1.8:6443 --apiserver-advertise-address=192.168.1.8 --pod-network-cidr=10.6.0.0/16 --upload-certs

Explanation:

  • control-plane-endpoint and apiserver-advertise-address are both set to the VIP, so traffic sent to the VIP lands on h1 or h2, whichever currently holds the MASTER state.

  • upload-certs uploads the control-plane certificates so that additional masters can fetch them when they join; this is required for a high-availability cluster.

If it succeeds, the output looks like this:

    Your Kubernetes control-plane has initialized successfully!

    To start using your cluster, you need to run the following as a regular user:

      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config

    Alternatively, if you are the root user, you can run:

      export KUBECONFIG=/etc/kubernetes/admin.conf

    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/

    You can now join any number of the control-plane node running the following command on each as root:

      kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
        --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d \
        --control-plane --certificate-key 23474fd4262f1bf8849c5cea160fd3309621f79460266c43dfca1d7cc390f1af

    Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
    As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
    "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

    Then you can join any number of worker nodes by running the following on each as root:

    kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
      --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d

There are two join commands in the output above: the longer one (with --control-plane) is for joining additional masters, and the shorter one is for joining workers.
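
Both the token and the certificate key are short-lived (the output above notes the uploaded certs are deleted after two hours, and bootstrap tokens expire after 24 hours by default). If they have expired by the time you join a node, regenerate them on an existing master:

    # Prints a fresh worker join command (with a new token)
    kubeadm token create --print-join-command
    # Re-uploads the control-plane certs and prints a new --certificate-key
    kubeadm init phase upload-certs --upload-certs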

We join h2 and h3 as masters as well: with a stacked etcd cluster, at least two of the three members must stay alive to keep quorum, so three masters let the control plane survive the loss of any one. On h2 and h3, run:

    kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
        --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d \
        --control-plane --certificate-key 23474fd4262f1bf8849c5cea160fd3309621f79460266c43dfca1d7cc390f1af
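
If you want to use kubectl on h2 and h3 as well, repeat the kubeconfig setup from the init output on each of them:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config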

4 Joining the Worker Node

Join h4 as a worker:

    kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
        --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d

5 Installing the Network Plugin

Go back to h1, h2, or h3 (any of them will do, since all three are masters) and run:

    wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    # edit the manifest so its Network field (10.244.0.0/16 by default) matches --pod-network-cidr=10.6.0.0/16
    kubectl apply -f ./kube-flannel.yml
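
Afterwards you can check that flannel started one pod per node. A hedged check (the kube-system namespace and app=flannel label match the coreos master-branch manifest of that era; adjust if your copy differs):

    kubectl get pods -n kube-system -l app=flannel -o wide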

6 Testing High Availability

Power off h1:

    poweroff

Then look at the keepalived log on h2, where you can watch the failover (the jump in effective priority from 97 to 99 reflects check_apiserver's weight -2: the priority is 99 - 2 while the check fails and returns to 99 once it passes):

    Sep 18 07:59:28 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Changing effective priority from 97 to 99
    Sep 18 08:03:22 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Transition to MASTER STATE
    Sep 18 08:03:25 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Entering MASTER STATE
    Sep 18 08:03:25 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) setting protocol VIPs.
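
You can also confirm that the API server is still reachable through the VIP, now answered by h2. On kubeadm clusters the /healthz endpoint is typically readable anonymously, so a plain curl works (hedged; with anonymous auth disabled this returns 403 instead):

    # Should print "ok" while a healthy master holds the VIP
    curl -k https://192.168.1.8:6443/healthz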

Checking the cluster state on h2 right away shows everything still normal:

    kubectl get nodes
    NAME   STATUS   ROLES                  AGE     VERSION
    h1     Ready    control-plane,master   6m16s   v1.22.2
    h2     Ready    control-plane,master   5m51s   v1.22.2
    h3     Ready    control-plane,master   4m52s   v1.22.2
    h4     Ready    <none>                 3m38s   v1.22.2

Wait a little longer, and h1 is reported as down:

    kubectl get nodes
    NAME   STATUS     ROLES                  AGE     VERSION
    h1     NotReady   control-plane,master   6m16s   v1.22.2
    h2     Ready      control-plane,master   5m51s   v1.22.2
    h3     Ready      control-plane,master   4m52s   v1.22.2
    h4     Ready      <none>                 3m38s   v1.22.2

With that, the Master is highly available!

7 Testing Recovery

Now boot h1 back up; after a short wait, everything is normal again:

    kubectl get nodes
    NAME   STATUS   ROLES                  AGE     VERSION
    h1     Ready    control-plane,master   8m14s   v1.22.2
    h2     Ready    control-plane,master   7m49s   v1.22.2
    h3     Ready    control-plane,master   6m50s   v1.22.2
    h4     Ready    <none>                 5m36s   v1.22.2

At this point you should be familiar with the steps for setting up a highly available Kubernetes cluster.

Here is a question to think about: h1, h2, and h3 are all Masters, but we only configured Keepalived on h1 and h2.

  • If h3 goes down, will the cluster keep working?

  • If h3 goes down and then h2 goes down as well, will the cluster still work?