wx tkestack Private Deployment Best Practices

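Background

As the number of private deployment projects grows, so does the demand for simple, fast deployment and delivery. With open-source collaboration now mainstream, it is worth thinking hard about how one-click private deployment can lean on open-source work: use the 80% that already exists well, and do the remaining 20% well yourself. This article walks through the practices we arrived at for one-click private deployment built on tkestack.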

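Solution Selection

Although open-source collaboration is popular, the basic principle still holds: the best solution is the one that fits. We weighed development difficulty, development efficiency, long-term code maintenance, component maintenance, how hard the tooling is for delivery engineers to use, and how hard it is for them to modify on site. The candidate solutions were:

  • kubeadm+ansible: **Pros:** full control over the kubernetes version, network plugin, and docker version; **Cons:** high maintenance cost, especially for kubernetes upgrades, which makes it hard to focus on one-click deployment at the business level;
  • tkestack+ansible: **Pros:** kubernetes integration and maintenance are handled for us; we only need to use tkestack well, which means understanding its basic principles well enough to handle emergencies, and can focus on integrating one-click deployment at the business level. ansible initializes machines in batches, deployment is quick and convenient, the bar for delivery engineers is low, and changes can be made on site at any time; **Cons:** the kubernetes version, network plugin, and docker version are not under our control;
  • tkestack+operator: **Pros:** one-click private deployment becomes a product and a platform; **Cons:** operators are expensive to develop, customer environments are varied and complex, and problems cannot be fixed on site;

Taking the above together: maintaining kubernetes ourselves costs too much; tkestack+ansible can be extended through hooks, which allows quick integration and lets us focus on integrating business components; ansible is easy to pick up, which lowers the learning cost for delivery engineers, and it can be adapted to the on-site environment at any time. We therefore chose the tkestack+ansible model.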

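Requirements

Functional requirements:

| Feature | Description |
| --- | --- |
| Host initialization | Initialize hosts before installation, e.g. add domain entries to hosts, install stress-testing tools, set up an offline yum repository |
| Host check | Check whether the host's performance and disk size meet requirements; check the OS version and kernel version; run performance stress tests |
| tkestack deployment | Deploy kubernetes and tkestack |
| Business dependency deployment | Deploy the components the business depends on, such as redis and mysql, and operations components such as elk and prometheus |
| Business deployment | Deploy the business services |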
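
The host check row can be illustrated with a small sketch. This is not the article's actual check script; the thresholds `MIN_KERNEL` and `MIN_DISK_GB` and the function names are hypothetical, and `sort -V` assumes GNU coreutils.

```bash
#!/bin/bash
# Hypothetical thresholds for illustration; adjust per project.
MIN_KERNEL="3.10"
MIN_DISK_GB=100

# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# kernel_ok VERSION: check a kernel release string such as "3.10.0-1160.el7.x86_64".
kernel_ok() {
    local base="${1%%-*}"   # strip the distribution suffix after the first "-"
    version_ge "$base" "$MIN_KERNEL"
}

# disk_ok PATH: check that the filesystem holding PATH is at least MIN_DISK_GB in total size.
disk_ok() {
    local size_kb
    size_kb=$(df -Pk "$1" | awk 'NR==2 {print $2}')
    [ "$size_kb" -ge $((MIN_DISK_GB * 1024 * 1024)) ]
}
```

A check run could then look like `kernel_ok "$(uname -r)" && disk_ok /data || echo "host does not meet requirements"`, where `/data` stands in for whatever data disk mount point the project uses.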

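Non-functional requirements:

| Requirement | Description |
| --- | --- |
| Decoupling | Some sites already have a kubernetes/tke platform; there only the business dependency components and the business need to be deployed, so this part must be decoupled from tkestack |
| Extensibility | Different projects use different dependency components, so new components must be quick to integrate |
| Idempotency | Deployment and uninstallation can be run repeatedly |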
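
To make the idempotency requirement concrete, here is a minimal sketch of an add/remove pair for hosts entries that can be run any number of times with the same result. The function names are hypothetical and not taken from the article's scripts.

```bash
#!/bin/bash
# has_host_entry FILE DOMAIN: succeeds if DOMAIN is already mapped in FILE.
has_host_entry() {
    awk -v d="$2" '{ for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
                   END { exit !found }' "$1" 2>/dev/null
}

# ensure_host_entry FILE IP DOMAIN: append the mapping only when it is missing.
ensure_host_entry() {
    if ! has_host_entry "$1" "$3"; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# remove_host_entry FILE DOMAIN: drop every line that maps DOMAIN; a no-op when absent.
remove_host_entry() {
    [ -f "$1" ] || return 0
    awk -v d="$2" '{ keep = 1; for (i = 2; i <= NF; i++) if ($i == d) keep = 0 }
                   keep { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Running `ensure_host_entry /etc/hosts 10.0.0.10 registry.example.com` twice leaves a single line, and running the removal twice does not fail; the IP and domain here are placeholders.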

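Implementation

1. Getting to know tkestack

The architecture diagram in the tkestack git repository shows that tkestack has three roles: installer, Global, and cluster. The installer installs the tkestack Global cluster and currently offers a command-line mode and a graphical mode. The cluster role is a business cluster managed by the Global cluster. For now, deploying a single Global cluster and using it as the business cluster meets our needs; cluster-role clusters exist only so that customers can use tkestack's multi-cluster management.

The tkestack installer supports user-defined extensions through hooks. The hook scripts are:

  • pre-install: custom initialization before the cluster is deployed
  • post-cluster-ready: initialization after the seed cluster is ready and before tkestack is deployed
  • post-install: runs after tkestack has been deployed; used to deploy custom extensions

The default tkestack deployment flow is as follows:

By design, the installer node is discarded once installation finishes, so tkestack deploys a new image registry inside the global cluster for later business use and re-pushes the tkestack platform images to that in-cluster registry. Containerized custom extension components must therefore be deployed from the post-install script.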
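
As an illustration of how a post-install hook could gate its work behind switches, here is a minimal dry-run sketch. The switch variables, step names, and the `site.yml` playbook are assumptions made for this example; only `ansible-playbook -i ... --tags ...` is a real command form, and the sketch prints it instead of running it.

```bash
#!/bin/bash
# Hypothetical switches; in practice they could come from the ansible inventory.
DEPLOY_BASE_COMPONENTS="${DEPLOY_BASE_COMPONENTS:-true}"
DEPLOY_BUSINESS="${DEPLOY_BUSINESS:-true}"

# enabled_steps: print the steps whose switch is on, in deployment order.
enabled_steps() {
    [ "$DEPLOY_BASE_COMPONENTS" = "true" ] && echo "base-components"
    [ "$DEPLOY_BUSINESS" = "true" ] && echo "business"
    return 0
}

# run_hook: dry run that prints the command each enabled step would execute.
run_hook() {
    local step
    for step in $(enabled_steps); do
        # site.yml and the tag names are placeholders for the real playbook.
        echo "ansible-playbook -i hosts site.yml --tags ${step}"
    done
}
```

Because each step is selected by a switch and mapped to a tag, a site that already has kubernetes can turn individual steps off without touching the hook itself.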

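2. Customizing the deployment configuration

The tkestack user guide on git describes two deployment modes: configuration through a web page, and a command-line mode. In use we found a handy feature: the configuration file records the current installation step, so after a failed installation you can fix the cause and simply restart tke-installer, and installation resumes from that step. We use this feature so that a configuration produced by the web page can also be deployed from the command line. Concretely, we first generate a configuration file through the web page and turn it into a template; at deployment time the ansible template module renders it. The templates extracted so far are:

  • tke-ha-lb.json.j2: corresponds to the "use existing" option on the web page, i.e. a load balancer IP address provides high availability for the tkestack cluster
  • tke-ha-keepalived.json.j2: corresponds to the "provided by TKE" option on the web page; a VIP floated by keepalived provides high availability
  • tke-sigle.json.j2: corresponds to the "not set" option on the web page, i.e. the single-master scenario

The following uses tke-ha-lb.json.j2 as an example: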
    {
      "config": {
        "ServerName": "tke-installer",
        "ListenAddr": ":8080",
        "NoUI": false,
        "Config": "conf/tke.json",
        "Force": false,
        "SyncProjectsWithNamespaces": false,
        "Replicas": {{ tke_replicas }}
      },
      "para": {
        "cluster": {
          "kind": "Cluster",
          "apiVersion": "platform.tkestack.io/v1",
          "metadata": {
            "name": "global"
          },
          "spec": {
            "finalizers": [
              "cluster"
            ],
            "tenantID": "default",
            "displayName": "TKE",
            "type": "Baremetal",
            "version": "{{ k8s_version }}",
            "networkDevice": "{{ net_interface }}",
            "clusterCIDR": "{{ cluster_cidr }}",
            "dnsDomain": "cluster.local",
            "features": {
              "ipvs": {{ ipvs }},
              "enableMasterSchedule": true,
              "ha": {
                "thirdParty": {
                  "vip": "{{ tke_vip }}",
                  "vport": {{ tke_vport }}
                }
              }
            },
            "properties": {
              "maxClusterServiceNum": {{ max_cluster_service_num }},
              "maxNodePodNum": {{ max_node_pod_num }}
            },
            "machines": [
              {
                "ip": "{{ groups['masters'][0] }}",
                "port": {{ ansible_port }},
                "username": "{{ ansible_ssh_user }}",
                "password": "{{ ansible_ssh_pass_base64 }}"
              },
              {
                "ip": "{{ groups['masters'][1] }}",
                "port": {{ ansible_port }},
                "username": "{{ ansible_ssh_user }}",
                "password": "{{ ansible_ssh_pass_base64 }}"
              },
              {
                "ip": "{{ groups['masters'][2] }}",
                "port": {{ ansible_port }},
                "username": "{{ ansible_ssh_user }}",
                "password": "{{ ansible_ssh_pass_base64 }}"
              }
            ],
            "dockerExtraArgs": {
              "data-root": "{{ docker_data_root }}"
            },
            "kubeletExtraArgs": {
              "root-dir": "{{ kubelet_root_dir }}"
            },
            "apiServerExtraArgs": {
              "runtime-config": "apps/v1beta1=true,apps/v1beta2=true,extensions/v1beta1/daemonsets=true,extensions/v1beta1/deployments=true,extensions/v1beta1/replicasets=true,extensions/v1beta1/networkpolicies=true,extensions/v1beta1/podsecuritypolicies=true"
            }
          }
        },
        "Config": {
          "basic": {
            "username": "{{ tke_admin_user }}",
            "password": "{{ tke_pwd_base64 }}"
          },
          "auth": {
            "tke": {
              "tenantID": "default",
              "username": "{{ tke_admin_user }}",
              "password": "{{ tke_pwd_base64 }}"
            }
          },
          "registry": {
            "tke": {
              "domain": "{{ tke_registry_domain }}",
              "namespace": "library",
              "username": "{{ tke_admin_user }}",
              "password": "{{ tke_pwd_base64 }}"
            }
          },
          "business": {},
          "monitor": {
            "influxDB": {
              "local": {}
            }
          },
          "ha": {
            "thirdParty": {
              "vip": "{{ tke_vip }}",
              "vport": {{ tke_vport }}
            }
          },
          "gateway": {
            "domain": "{{ tke_console_domain }}",
            "cert": {
              "selfSigned": {}
            }
          }
        }
      },
      "cluster": {
        "kind": "Cluster",
        "apiVersion": "platform.tkestack.io/v1",
        "metadata": {
          "name": "global"
        },
        "spec": {
          "finalizers": [
            "cluster"
          ],
          "tenantID": "default",
          "displayName": "TKE",
          "type": "Baremetal",
          "version": "{{ k8s_version }}",
          "networkDevice": "{{ net_interface }}",
          "clusterCIDR": "{{ cluster_cidr }}",
          "dnsDomain": "cluster.local",
          "features": {
            "ipvs": {{ ipvs }},
            "enableMasterSchedule": true,
            "ha": {
              "thirdParty": {
                "vip": "{{ tke_vip }}",
                "vport": {{ tke_vport }}
              }
            }
          },
          "properties": {
            "maxClusterServiceNum": {{ max_cluster_service_num }},
            "maxNodePodNum": {{ max_node_pod_num }}
          },
          "machines": [
            {
              "ip": "{{ groups['masters'][0] }}",
              "port": {{ ansible_port }},
              "username": "{{ ansible_ssh_user }}",
              "password": "{{ ansible_ssh_pass_base64 }}"
            },
            {
              "ip": "{{ groups['masters'][1] }}",
              "port": {{ ansible_port }},
              "username": "{{ ansible_ssh_user }}",
              "password": "{{ ansible_ssh_pass_base64 }}"
            },
            {
              "ip": "{{ groups['masters'][2] }}",
              "port": {{ ansible_port }},
              "username": "{{ ansible_ssh_user }}",
              "password": "{{ ansible_ssh_pass_base64 }}"
            }
          ],
          "dockerExtraArgs": {
            "data-root": "{{ docker_data_root }}"
          },
          "kubeletExtraArgs": {
            "root-dir": "{{ kubelet_root_dir }}"
          },
          "apiServerExtraArgs": {
            "runtime-config": "apps/v1beta1=true,apps/v1beta2=true,extensions/v1beta1/daemonsets=true,extensions/v1beta1/deployments=true,extensions/v1beta1/replicasets=true,extensions/v1beta1/networkpolicies=true,extensions/v1beta1/podsecuritypolicies=true"
          }
        }
      },
      "step": 0
    }

The step field tells tke-installer where to resume after a restart; setting it to 0 means the installation starts from the beginning.
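
Two small helpers can make working with the rendered file easier. This is a sketch under assumptions: the variable names ending in `_base64` suggest the template expects base64-encoded passwords, and `current_step` assumes the rendered file keeps one key per line as in the template. Both function names are hypothetical.

```bash
#!/bin/bash
# encode_password PLAINTEXT: base64-encode a password for the *_base64 template variables.
# printf is used instead of echo so that no trailing newline gets encoded.
encode_password() {
    printf '%s' "$1" | base64
}

# current_step FILE: print the value of the "step" field of a rendered tke.json.
# Assumes "step" sits on its own line, as in the template.
current_step() {
    sed -n 's/^[[:space:]]*"step":[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n1
}
```

For example, `current_step /opt/tke-installer/data/tke.json` shows where a restarted tke-installer would resume, using the data directory that appears in the installation script in this article.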

To install in this way, we use the following installation script:

    #!/bin/bash
    # Author: yhchen
    set -e

    BASE_DIR=$(cd "$(dirname "$0")" && pwd)
    cd "$BASE_DIR"

    # Parent directory of offline-pot
    OFFLINE_POT_PDIR=$(echo "${BASE_DIR}" | awk -Foffline-pot '{print $1}')
    INSTALL_DIR=/opt/tke-installer
    DATA_DIR=${INSTALL_DIR}/data
    HOOKS=${OFFLINE_POT_PDIR}offline-pot
    IMAGES_DIR="${OFFLINE_POT_PDIR}offline-pot-images"
    TGZ_DIR="${OFFLINE_POT_PDIR}offline-pot-tgz"
    REPORTS_DIR="${OFFLINE_POT_PDIR}perfor-reports"
    version=v1.2.4

    # First-time setup: install tke-installer when this node has no image for ${version}.
    init_tke_installer(){
      if [ "$(docker images | grep tke-installer | grep "${version}" | wc -l)" -eq 0 ]; then
        # Remove any old tke-installer container and images first.
        if [ "$(docker ps -a | grep tke-installer | wc -l)" -gt 0 ]; then
          docker rm -f tke-installer
        fi
        if [ "$(docker images | grep tke-installer | wc -l)" -gt 0 ]; then
          docker rmi -f $(docker images | grep tke-installer | awk '{print $3}')
        fi
        cd "${OFFLINE_POT_PDIR}tkestack"
        # Clean up the temporary directory left by a previous run.
        if [ -d "${OFFLINE_POT_PDIR}tkestack/tke-installer-x86_64-${version}.run.tmp" ]; then
          rm -rf "${OFFLINE_POT_PDIR}tkestack/tke-installer-x86_64-${version}.run.tmp"
        fi
        sha256sum --check --status "tke-installer-x86_64-${version}.run.sha256" && \
          chmod +x "tke-installer-x86_64-${version}.run" && "./tke-installer-x86_64-${version}.run"
      fi
    }

    # Start tke-installer with custom mounts, so that the extended hooks scripts and the
    # one-click deployment scripts they call are available inside the container.
    reinstall_tke_installer(){
      # The original test created the directory only if it already existed; mkdir -p is idempotent.
      mkdir -p "${REPORTS_DIR}"
      if [ "$(docker ps -a | grep tke-installer | wc -l)" -eq 1 ]; then
        docker rm -f tke-installer
        rm -rf /opt/tke-installer/data
      fi
      docker run --restart=always --name tke-installer -d --privileged --net=host -v/etc/hosts:/app/hosts \
        -v/etc/docker:/etc/docker -v/var/run/docker.sock:/var/run/docker.sock -v"$DATA_DIR":/app/data \
        -v"$INSTALL_DIR"/conf:/app/conf -v"$HOOKS":/app/hooks -v"$IMAGES_DIR":"${IMAGES_DIR}" -v"${TGZ_DIR}":"${TGZ_DIR}" \
        -v"${REPORTS_DIR}":"${REPORTS_DIR}" tkestack/tke-installer:$version
      if [ -f "hosts" ]; then
        # Set the dpl_dir variable in the hosts file
        sed -i 's#^dpl_dir=.*#dpl_dir=\"'"${HOOKS}"'\"#g' hosts
        installer_ip=$(grep -A 1 '\[installer\]' hosts | grep -v installer)
        echo "please exec install-offline-pot.sh or access http://${installer_ip}:8080 to install offline-pot"
      fi
    }

    main(){
      init_tke_installer
      reinstall_tke_installer
    }
    main
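
The script reads the installer IP with `grep -A 1`, which depends on the IP being exactly the line after the group header. As an alternative sketch (the function name is hypothetical, and an INI-style ansible inventory is assumed), the first host of a group can be read with awk:

```bash
#!/bin/bash
# first_host_in_group FILE GROUP: print the first host listed under [GROUP]
# in an INI-style ansible inventory, skipping blank lines and comments.
first_host_in_group() {
    awk -v g="[$2]" '
        $0 == g  { in_g = 1; next }   # entering the requested group
        /^\[/    { in_g = 0 }         # any other section header ends it
        in_g && NF && $1 !~ /^#/ { print $1; exit }
    ' "$1"
}
```

With this, `installer_ip=$(first_host_in_group hosts installer)` also works when the host line carries inline variables such as `ansible_port=22`.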

The script that finally starts the tkestack deployment is as follows:

    #!/bin/bash
    # Author: yhchen
    set -e

    BASE_DIR=$(cd "$(dirname "$0")" && pwd)
    cd "$BASE_DIR"
    CALL_FUN="defaut"

    help(){
      echo "show usage:"
      echo "init_and_check: init hosts, install tke-installer and run host checks"
      echo "dpl_offline_pot: init tke config and deploy offline-pot"
      echo "init_keepalived: temporary; will be removed once tkestack fixes the keepalived issue"
      echo "only_install_tkestack: to install only tkestack, pass only_install_tkestack to -f"
      echo "defaut (default): runs dpl_offline_pot and init_keepalived"
      echo "all_func: runs init_and_check, dpl_offline_pot, init_keepalived"
      exit 0
    }

    while getopts ":f:h:" opt
    do
      case $opt in
        f)
          CALL_FUN="${OPTARG}"
          ;;
        h)
          hosts="${OPTARG}"
          ;;
        ?)
          echo "unknown args! only -f [call function] and -h [ansible hosts group] are supported!"
          exit 0;;
      esac
    done

    INSTALL_DATA_DIR=/opt/tke-installer/data

    init_and_check(){
      sh ./init-and-check.sh
    }

    # init tke config and deploy offline-pot
    dpl_offline_pot(){
      echo "###### deploy offline-pot start ######"
      if [ "$(docker ps | grep tke-installer | wc -l)" -eq 1 ]; then
        # deploy tkestack, base components and business
        sh ./offline-pot-cmd.sh -s init-tke-config.sh -f init
        docker restart tke-installer
        if [ -f "hosts" ]; then
          installer_ip=$(grep -A 1 '\[installer\]' hosts | grep -v installer)
          echo "please exec tail -f ${INSTALL_DATA_DIR}/tke.log or access http://${installer_ip}:8080 check install progress..."
        fi
      elif [ ! -d "../tkestack" ]; then
        # deploy base components and business on an existing kubernetes platform
        sh ./post-install
      else
        echo "if first install, please exec init-and-check.sh script, else exec reinstall-offline-pot.sh script" && exit 0
      fi
      echo "###### deploy offline-pot end ######"
    }

    # temporary; will be removed once tkestack fixes the keepalived issue
    init_keepalived(){
      echo "###### init keepalived start ######"
      if [ -f "${INSTALL_DATA_DIR}/tke.json" ]; then
        if [ "$(grep -i '"ha"' "${INSTALL_DATA_DIR}/tke.json" | wc -l)" -gt 0 ]; then
          # Redirect stdout first, then stderr, so both end up in the log file.
          nohup sh ./init_keepalived.sh > "${INSTALL_DATA_DIR}/dpl-keepalived.log" 2>&1 &
        fi
      fi
      echo "###### init keepalived end ######"
    }

    # only install tkestack
    only_install_tkestack(){
      echo "###### install tkestack start ######"
      # change the replica count of tke components
      if [ -f "hosts" ]; then
        sed -i 's/tke_replicas="1"/tke_replicas="2"/g' hosts
      fi
      # hosts init
      if [ "$(docker ps | grep tke-installer | wc -l)" -eq 1 ]; then
        sh ./offline-pot-cmd.sh -s host-init.sh -f sshd_init
        sh ./offline-pot-cmd.sh -s host-init.sh -f selinux_init
        sh ./offline-pot-cmd.sh -s host-init.sh -f remove_devnet_proxy
        sh ./offline-pot-cmd.sh -s host-init.sh -f add_domains
        sh ./offline-pot-cmd.sh -s host-init.sh -f data_disk_init
        sh ./offline-pot-cmd.sh -s host-init.sh -f check_iptables
      else
        echo "please exec install-tke-installer.sh to start tke-installer" && exit 0
      fi
      # start installing tkestack
      dpl_offline_pot
      init_keepalived
      echo "###### install tkestack end ######"
    }

    defaut(){
      # change the replica count of tke components
      if [ -f "hosts" ]; then
        sed -i 's/tke_replicas="2"/tke_replicas="1"/g' hosts
      fi
      # only deploy tkestack
      if [ -d '../tkestack' ] && [ ! -d "../offline-pot-images" ] && [ ! -d "../offline-pot-tgz" ]; then
        only_install_tkestack
        # only_install_tkestack already ran dpl_offline_pot and init_keepalived;
        # return here so tke-installer is not restarted a second time.
        return 0
      fi
      dpl_offline_pot
      # keepalived config is initialized only when tkestack is deployed
      if [ -d '../tkestack' ]; then
        init_keepalived
      fi
    }

    all_func(){
      # change the replica count of tke components
      if [ -f "hosts" ]; then
        sed -i 's/tke_replicas="2"/tke_replicas="1"/g' hosts
      fi
      init_and_check
      defaut
    }

    main(){
      "$CALL_FUN" || help
    }
    main

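This script mainly decides whether tkestack needs to be deployed at all, and whether tkestack is deployed on its own. If tkestack is to be deployed, it generates the configuration file tkestack needs and then runs docker restart tke-installer, which triggers the deployment of tkestack, the business dependency components, and the business services.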
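
The decision described above can be condensed into one function. This is a simplified sketch of the directory checks in the script, not a replacement for it; the function name and the three mode labels are made up for illustration.

```bash
#!/bin/bash
# detect_mode PARENT_DIR: classify the deployment from the directories in the package.
#   business-only  no tkestack directory: deploy onto an existing kubernetes platform
#   tkestack-only  tkestack present, but no business images or tgz packages
#   full           tkestack plus business dependency components and business services
detect_mode() {
    local parent="$1"
    if [ ! -d "${parent}/tkestack" ]; then
        echo "business-only"
    elif [ ! -d "${parent}/offline-pot-images" ] && [ ! -d "${parent}/offline-pot-tgz" ]; then
        echo "tkestack-only"
    else
        echo "full"
    fi
}
```

Called as `detect_mode ..` from the offline-pot directory, it mirrors the `../tkestack`, `../offline-pot-images`, and `../offline-pot-tgz` tests in the script.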

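  • Add worker nodes
  • Add custom parameters to make the cluster more stable and robust. The main custom parameters added are:

    1. dockerExtraArgs data-root: points the docker directory at the data disk, so that a small system disk does not quickly push node disk usage to the node-pressure threshold and leave the node in the NotReady state
    2. kubeletExtraArgs root-dir: a custom kubelet parameter that serves the same purpose as docker's data-root
    3. apiServerExtraArgs runtime-config (a kube-apiserver parameter), set to apps/v1beta1=true,apps/v1beta2=true,extensions/v1beta1/daemonsets=true,extensions/v1beta1/deployments=true,extensions/v1beta1/replicasets=true,extensions/v1beta1/networkpolicies=true,extensions/v1beta1/podsecuritypolicies=true: improves API version compatibility for workloads such as deployment and statefulset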

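Idempotency is currently achieved through ansible set_fact, ansible when conditions, and extra checks in shell commands; extensibility and decoupling are achieved through switches + hooks + ansible tags.
The final one-click private deployment flow is as follows:

3. Business applications

Business services are organized as helmfile releases, so that different customers get different business components. The release name must take the form ${center name}-${customer short name}. Each release maps to the helm chart values of its business components. When the business helm package is built, the components defined in the release file ${center name}-${customer short name}.yaml are used as a filter, so only the components that customer needs are packaged and deployed.

One-click private deployment uses the helmfile tool to deploy, as shown in the figure above.

The update process is currently outside the scope of one-click private deployment, which only deploys an agent to the master1 node to act as the CI/CD update agent; the flow is shown in the figure above.

Make good use of tkestack's features (use the 80% well) and combine them with your own business scenario to build a one-click private deployment that meets your needs (do the 20% well).

Shortcomings

  • Currently all images are packed into the installation package as tar attachments, which makes the package rather large. In addition, when the in-cluster image registry is deployed, the images have to be pushed again from the registry on the installer node to the cluster registry, which takes a long time. A suggested improvement: push the images to an offline image registry at packaging time and pack that registry's persistence directory, which makes better use of image characteristics to shrink the package; at deployment time, copy the registry's persisted data into the target directory and mount it to speed up deployment.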
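
The suggested improvement could look roughly like the sketch below, which packs a registry's persistence directory at packaging time and restores it at deployment time. The function names and paths are hypothetical, and the archive must be written outside the directory being packed.

```bash
#!/bin/bash
# pack_registry_data DATA_DIR ARCHIVE: archive the registry persistence directory.
# ARCHIVE must not live inside DATA_DIR.
pack_registry_data() {
    tar -czf "$2" -C "$1" .
}

# unpack_registry_data ARCHIVE TARGET_DIR: restore the persisted data for mounting.
unpack_registry_data() {
    mkdir -p "$2"
    tar -xzf "$1" -C "$2"
}
```

After unpacking, the target directory would be mounted into the registry container; for the official `registry:2` image the data path inside the container is `/var/lib/registry`. Whether the tkestack in-cluster registry can consume the same layout is an assumption that would need to be verified.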