F. Deploying the docker component

docker runs and manages containers; kubelet interacts with it through the Container Runtime Interface (CRI).

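Note:

  1. Unless stated otherwise, every operation in this document is performed on the zhangjun-k8s01 node, which then distributes files and runs commands remotely;
  2. flannel must be installed first; see the appendix E.部署flannel网络.md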

Install the dependency packages

See 07-0.部署worker节点.md

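Download and distribute the docker binaries

Download the latest release package from the docker download page (18.09.6 at the time of writing):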

    cd /opt/k8s/work
    wget https://download.docker.com/linux/static/stable/x86_64/docker-18.09.6.tgz
    tar -xvf docker-18.09.6.tgz
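
Before distributing anything, it can help to confirm that the archive really unpacked into a docker directory holding the expected binaries. The following is a minimal sketch, not part of the original procedure; check_docker_binaries is a hypothetical helper name, and it only looks for the binaries this document refers to (dockerd, docker, docker-proxy).

```shell
# Hypothetical helper: verify that an extracted docker directory
# contains the binaries this document relies on.
# Usage: check_docker_binaries <dir>
check_docker_binaries() {
  local dir="$1" missing=0 name
  # dockerd and docker-proxy are named in this document; docker is the CLI.
  for name in dockerd docker docker-proxy; do
    if [ ! -f "${dir}/${name}" ]; then
      echo "missing: ${dir}/${name}" >&2
      missing=1
    fi
  done
  # 0 when everything was found, 1 otherwise.
  return "${missing}"
}
```

Run from /opt/k8s/work, a call such as check_docker_binaries docker returns non-zero and names the missing file if the extraction was incomplete.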

Distribute the binaries to all worker nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in "${NODE_IPS[@]}"
    do
        echo ">>> ${node_ip}"
        scp docker/* root@${node_ip}:/opt/k8s/bin/
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
    done

Create and distribute the systemd unit file

    cd /opt/k8s/work
    cat > docker.service <<"EOF"
    [Unit]
    Description=Docker Application Container Engine
    Documentation=http://docs.docker.io

    [Service]
    WorkingDirectory=##DOCKER_DIR##
    Environment="PATH=/opt/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin"
    EnvironmentFile=-/run/flannel/docker
    ExecStart=/opt/k8s/bin/dockerd $DOCKER_NETWORK_OPTIONS
    ExecReload=/bin/kill -s HUP $MAINPID
    Restart=on-failure
    RestartSec=5
    LimitNOFILE=infinity
    LimitNPROC=infinity
    LimitCORE=infinity
    Delegate=yes
    KillMode=process

    [Install]
    WantedBy=multi-user.target
    EOF

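  • EOF is wrapped in double quotes so that bash does not expand variables in the here-document, such as $DOCKER_NETWORK_OPTIONS (systemd substitutes these variables when it starts the service);
  • dockerd invokes other docker binaries at run time, such as docker-proxy, so the directory containing the docker binaries must be added to the PATH environment variable;
  • when flanneld starts, it writes its network configuration to /run/flannel/docker; before dockerd starts, the DOCKER_NETWORK_OPTIONS environment variable is read from that file and used to set the subnet of the docker0 bridge;
  • if several EnvironmentFile options are specified, /run/flannel/docker must come last (so that docker0 uses the bip parameter generated by flanneld);
  • docker must run as the root user;
  • starting with version 1.13, docker may set the default policy of the iptables FORWARD chain to DROP, which makes pinging Pod IPs on other Nodes fail. If this happens, set the policy to ACCEPT manually: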

        $ sudo iptables -P FORWARD ACCEPT

    Also add the following command to /etc/rc.local, so that the default policy of the FORWARD chain does not revert to DROP when the node reboots:

        /sbin/iptables -P FORWARD ACCEPT
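
The first note above, about the quoted EOF, is easy to check in isolation. The sketch below writes the same ExecStart line through a quoted and an unquoted here-document; the value assigned to DOCKER_NETWORK_OPTIONS is a stand-in chosen for illustration, not something flanneld would produce.

```shell
# Stand-in value so the difference is visible; in the real unit file
# systemd supplies this variable from /run/flannel/docker.
DOCKER_NETWORK_OPTIONS="expanded-by-bash"

# Quoted delimiter: the body is taken literally.
quoted=$(cat <<"EOF"
ExecStart=/opt/k8s/bin/dockerd $DOCKER_NETWORK_OPTIONS
EOF
)

# Unquoted delimiter: the shell expands variables while writing the body.
unquoted=$(cat <<EOF
ExecStart=/opt/k8s/bin/dockerd $DOCKER_NETWORK_OPTIONS
EOF
)

printf '%s\n' "quoted:   ${quoted}" "unquoted: ${unquoted}"
```

Only the quoted form leaves $DOCKER_NETWORK_OPTIONS in the file for systemd to substitute, which is what docker.service needs.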

Distribute the systemd unit file to all worker machines:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    sed -i -e "s|##DOCKER_DIR##|${DOCKER_DIR}|" docker.service
    for node_ip in "${NODE_IPS[@]}"
    do
        echo ">>> ${node_ip}"
        scp docker.service root@${node_ip}:/etc/systemd/system/
    done
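
If DOCKER_DIR happens to be empty, the sed command above still succeeds and leaves WorkingDirectory= blank. The sketch below is one way to guard the substitution, assuming a hypothetical helper named render_unit; it refuses an empty directory and fails if any ##NAME## placeholder survives. It also assumes the directory path contains no | character, since that is the sed delimiter.

```shell
# Hypothetical helper: replace the ##DOCKER_DIR## placeholder and
# fail if the value is empty or a placeholder is left behind.
# Usage: render_unit <unit-file> <docker-dir>
render_unit() {
  local file="$1" dir="$2"
  if [ -z "${dir}" ]; then
    echo "docker dir is empty; was environment.sh sourced?" >&2
    return 1
  fi
  # Write to a temporary file first so a failed run leaves the input untouched.
  sed -e "s|##DOCKER_DIR##|${dir}|g" "${file}" > "${file}.tmp" || return 1
  if grep -q '##[A-Z_]*##' "${file}.tmp"; then
    echo "unreplaced placeholder left in ${file}" >&2
    rm -f "${file}.tmp"
    return 1
  fi
  mv "${file}.tmp" "${file}"
}
```

With environment.sh sourced, render_unit docker.service "${DOCKER_DIR}" would stand in for the plain sed call.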

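Configure and distribute the docker configuration file

Use registry mirrors hosted in China to speed up image pulls, and raise the number of concurrent downloads (dockerd must be restarted for the changes to take effect):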

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > docker-daemon.json <<EOF
    {
        "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"],
        "insecure-registries": ["docker02:35000"],
        "max-concurrent-downloads": 20,
        "live-restore": true,
        "max-concurrent-uploads": 10,
        "debug": true,
        "data-root": "${DOCKER_DIR}/data",
        "exec-root": "${DOCKER_DIR}/exec",
        "log-opts": {
            "max-size": "100m",
            "max-file": "5"
        }
    }
    EOF
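
Unlike docker.service, this here-document uses an unquoted EOF, so the shell substitutes ${DOCKER_DIR} while writing the file. If the variable were unset, data-root would silently become /data. A small guard, sketched here under the assumption of a hypothetical helper named require_var, can stop the script before the file is generated.

```shell
# Hypothetical helper: fail early when a variable needed by an
# unquoted here-document is unset or empty.
# Usage: require_var <NAME>
require_var() {
  local name="$1" value
  # Look up the variable whose name is stored in $name.
  eval "value=\${${name}:-}"
  if [ -z "${value}" ]; then
    echo "${name} is not set; source /opt/k8s/bin/environment.sh first" >&2
    return 1
  fi
}
```

Placing require_var DOCKER_DIR || exit 1 just before the cat command would be one way to use it.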

Distribute the docker configuration file to all worker nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in "${NODE_IPS[@]}"
    do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /etc/docker/ ${DOCKER_DIR}/{data,exec}"
        scp docker-daemon.json root@${node_ip}:/etc/docker/daemon.json
    done

Start the docker service

    source /opt/k8s/bin/environment.sh
    for node_ip in "${NODE_IPS[@]}"
    do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
    done
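
The loops in this section move on to the next node even when a command fails, so a failure on one node is easy to miss in the output. The sketch below shows one possible wrapper that records which nodes failed; for_each_node and start_docker are hypothetical names introduced for illustration, and the node IPs are passed as arguments rather than read from NODE_IPS directly.

```shell
# Hypothetical wrapper: run a callback once per node and
# remember which nodes failed.
# Usage: for_each_node <callback> <node_ip>...
for_each_node() {
  local callback="$1" failed="" node_ip
  shift
  for node_ip in "$@"; do
    echo ">>> ${node_ip}"
    if ! "${callback}" "${node_ip}"; then
      failed="${failed} ${node_ip}"
    fi
  done
  if [ -n "${failed}" ]; then
    echo "failed on:${failed}" >&2
    return 1
  fi
}

# Example callback mirroring the start command used in this section.
start_docker() {
  ssh "root@$1" "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
}
```

After sourcing environment.sh, the call would look like for_each_node start_docker "${NODE_IPS[@]}"; it returns non-zero and lists the affected IPs if any node failed.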

Check the service status

    source /opt/k8s/bin/environment.sh
    for node_ip in "${NODE_IPS[@]}"
    do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status docker|grep Active"
    done

Make sure the status is active (running); otherwise inspect the logs to find the cause:

    journalctl -u docker
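
Reading the Active line by eye works for a few nodes; for a scripted check, the same line can be tested mechanically. This is a minimal sketch using a hypothetical helper named is_active_running that reads systemctl status output from standard input.

```shell
# Hypothetical helper: succeed only when the "Active:" line of
# systemctl status output (read from stdin) reports active (running).
is_active_running() {
  # In a basic regular expression the parentheses are literal characters.
  grep -q 'Active: active (running)'
}
```

Inside the loop above it could be used as ssh root@${node_ip} "systemctl status docker" | is_active_running || echo "docker is not running on ${node_ip}". systemctl is-active docker is another option that reports the state directly.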

Check the docker0 bridge

    source /opt/k8s/bin/environment.sh
    for node_ip in "${NODE_IPS[@]}"
    do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0"
    done

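Confirm that on every worker node the IPs of the docker0 bridge and the flannel.1 interface are in the same subnet (in the output below, 172.30.80.0/32 lies inside the 172.30.80.0/21 network that 172.30.80.1/21 belongs to):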

    >>> 172.27.137.240
    3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
        link/ether ce:9c:a9:08:50:03 brd ff:ff:ff:ff:ff:ff
        inet 172.30.80.0/32 scope global flannel.1
           valid_lft forever preferred_lft forever
    4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:5c:c1:77:03 brd ff:ff:ff:ff:ff:ff
        inet 172.30.80.1/21 brd 172.30.87.255 scope global docker0
           valid_lft forever preferred_lft forever
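
The subnet comparison can also be done with shell arithmetic instead of by eye. The sketch below masks both addresses with the prefix length and compares the results; ip_to_int and ip_in_cidr are hypothetical helper names, and only IPv4 is handled.

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  # With IFS set to ".", the unquoted expansion splits into four octets.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when <ip> lies in the network that <addr>/<prefix> belongs to.
# Usage: ip_in_cidr <ip> <addr>/<prefix>
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  # Build the netmask for the given prefix length.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

With the addresses from the output above, ip_in_cidr 172.30.80.0 172.30.80.1/21 succeeds, while an address such as 172.30.88.0 falls outside the /21 and makes the check fail.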

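Note: if the services were installed in the wrong order or the machine environment is complex, and docker was installed before flanneld, the docker0 bridge and the flannel.1 interface on a worker node may not end up in the same subnet. In that case stop the docker service, delete the docker0 interface manually, and start docker again: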

    systemctl stop docker
    ip link delete docker0
    systemctl start docker

View docker status information

    $ ps -elfH|grep docker
    4 S root 116590 1 0 80 0 - 131420 futex_ 11:22 ? 00:00:01 /opt/k8s/bin/dockerd --bip=172.30.80.1/21 --ip-masq=false --mtu=1450
    4 S root 116668 116590 1 80 0 - 161643 futex_ 11:22 ? 00:00:03 containerd --config /data/k8s/docker/exec/containerd/containerd.toml --log-level debug

    $ docker info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 0
    Server Version: 18.09.6
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
    runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
    init version: fec3683
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.14.110-0.el7.4pd.x86_64
    Operating System: CentOS Linux 7 (Core)
    OSType: linux
    Architecture: x86_64
    CPUs: 8
    Total Memory: 15.64GiB
    Name: zhangjun-k8s01
    ID: VJYK:3T6T:EPHU:65SM:3OZD:DMNE:MT5J:O22I:TCG2:F3JR:MZ76:B3EF
    Docker Root Dir: /data/k8s/docker/data
    Debug Mode (client): false
    Debug Mode (server): true
     File Descriptors: 22
     Goroutines: 43
     System Time: 2019-05-26T11:26:21.2494815+08:00
     EventsListeners: 0
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     docker02:35000
     127.0.0.0/8
    Registry Mirrors:
     https://docker.mirrors.ustc.edu.cn/
     https://hub-mirror.c.163.com/
    Live Restore Enabled: true
    Product License: Community Engine

    WARNING: No swap limit support
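
A few of these fields are worth checking on every node, for example that Docker Root Dir points at the configured data-root. The sketch below extracts one top-level field from docker info output read on standard input; docker_info_field is a hypothetical helper name, and it only matches keys that start at the beginning of a line.

```shell
# Hypothetical helper: print the value of one top-level "Key: value"
# field from docker info output on stdin; fail if the key is absent.
# Usage: docker info | docker_info_field "Server Version"
docker_info_field() {
  awk -v key="$1" '
    # Match only lines that begin with "<key>: ".
    index($0, key ": ") == 1 {
      print substr($0, length(key) + 3)
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For the output above, docker info | docker_info_field "Docker Root Dir" would print /data/k8s/docker/data. docker info also accepts a --format Go template, which avoids text parsing when the field name is known.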

Update the kubelet configuration and restart the service (on every node)

Edit the kubelet systemd unit file (/etc/systemd/system/kubelet.service) and delete the following 4 lines:

    --network-plugin=cni \\
    --cni-conf-dir=/etc/cni/net.d \\
    --container-runtime=remote \\
    --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\

Then reload the systemd configuration and restart the kubelet service:

    systemctl daemon-reload
    systemctl restart kubelet
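
Editing the unit file by hand on every node is error-prone, so the deletion can be scripted. The sketch below prints a copy of a unit file with the four flags removed; strip_containerd_flags is a hypothetical helper name. It assumes each flag sits on its own line, and that none of them is the last line of the ExecStart command (otherwise the preceding line would keep a dangling backslash).

```shell
# Hypothetical helper: print a kubelet unit file with the four
# CNI/containerd flags listed in this section removed.
# Usage: strip_containerd_flags <unit-file> > <new-unit-file>
strip_containerd_flags() {
  sed -e '/--network-plugin=cni/d' \
      -e '/--cni-conf-dir=/d' \
      -e '/--container-runtime=remote/d' \
      -e '/--container-runtime-endpoint=/d' \
      "$1"
}
```

Writing the result to a new file and comparing it with the original using diff before replacing kubelet.service keeps the change reviewable.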