Raven

This document describes how to use Raven to enhance edge-to-edge and edge-to-cloud network connectivity in an edge cluster.

Assume you already have an edge Kubernetes cluster whose nodes are spread across different physical regions, as shown in the figure, and that Yurt Manager and Raven Agent are already deployed in the cluster. If they are not, you can follow the installation tutorial raven_deploy.
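A quick way to confirm both components are running before you continue (a hedged check; it assumes both were installed into the kube-system namespace with their default names, which matches the Helm release shown later in this guide):

```bash
# Both components should show Running Pods.
kubectl get pods -n kube-system | grep -E 'yurt-manager|raven-agent'
```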

1. Label nodes to distinguish network domains

As shown below, assume your edge cluster has six nodes spread across three different physical (network) regions, where the master node also acts as a cloud node.

```bash
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
izbp15inok0kbfkg3in52rz Ready Edge-HZ-1 27h v1.22.11 172.16.2.103 <none> CentOS Linux 7 (Core) 3.10.0-1160.81.1.el7.x86_64 docker://19.3.15
izbp15inok0kbfkg3in52sz Ready Edge-HZ-2 26h v1.22.11 172.16.2.104 <none> CentOS Linux 7 (Core) 3.10.0-1160.81.1.el7.x86_64 docker://19.3.15
izm5eb24dmjfimuaybpnqzz Ready Edge-QD-1 29h v1.22.11 172.16.1.89 <none> CentOS Linux 7 (Core) 3.10.0-1160.80.1.el7.x86_64 docker://19.3.15
izm5eb24dmjfimuaybpnr0z Ready Edge-QD-2 29h v1.22.11 172.16.1.90 <none> CentOS Linux 7 (Core) 3.10.0-1160.80.1.el7.x86_64 docker://19.3.15
izwz9dohcv74iegqecp4axz Ready control-plane,master 5d21h v1.22.11 192.168.0.195 <none> CentOS Linux 7 (Core) 3.10.0-1160.80.1.el7.x86_64 docker://20.10.2
izwz9ey0js5z7mornclpd6z Ready cloud 3h3m v1.22.11 192.168.0.196 <none> CentOS Linux 7 (Core) 3.10.0-1160.80.1.el7.x86_64 docker://20.10.2
```

We manage the nodes in each physical (network) region with a separate Gateway CR. Labeling a node indicates which Gateway manages it.

With the following command, we label the nodes located in Hangzhou with gw-hangzhou, indicating that these nodes are managed by the gw-hangzhou Gateway CR.

```bash
$ kubectl label nodes izbp15inok0kbfkg3in52rz izbp15inok0kbfkg3in52sz raven.openyurt.io/gateway=gw-hangzhou
node/izbp15inok0kbfkg3in52rz not labeled
node/izbp15inok0kbfkg3in52sz not labeled
```

Similarly, we label the cloud node and the master node with gw-cloud, and the nodes located in Qingdao with gw-qingdao.

```bash
$ kubectl label nodes izwz9dohcv74iegqecp4axz izwz9ey0js5z7mornclpd6z raven.openyurt.io/gateway=gw-cloud
node/izwz9dohcv74iegqecp4axz labeled
node/izwz9ey0js5z7mornclpd6z labeled
```

```bash
$ kubectl label nodes izm5eb24dmjfimuaybpnqzz izm5eb24dmjfimuaybpnr0z raven.openyurt.io/gateway=gw-qingdao
node/izm5eb24dmjfimuaybpnqzz labeled
node/izm5eb24dmjfimuaybpnr0z labeled
```
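To double-check which Gateway each node has been assigned to, you can list the label as an extra column (a convenience check, not part of the original steps):

```bash
# -L prints the value of the given label for every node.
kubectl get nodes -L raven.openyurt.io/gateway
```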

Run the following command to check that the corresponding Raven Agent Pods are running.

```bash
$ kubectl get pod -n kube-system | grep raven-agent-ds
raven-agent-ds-4b587 1/1 Running 0 25h
raven-agent-ds-dmh66 1/1 Running 0 25h
raven-agent-ds-gb5qj 1/1 Running 0 25h
raven-agent-ds-gzpfh 1/1 Running 0 170m
raven-agent-ds-ksxq6 1/1 Running 0 25h
raven-agent-ds-qhjtb 1/1 Running 0 25h
```
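There should be one raven-agent Pod per node (six Pods for the six nodes above). Adding -o wide shows which node each Pod landed on (a quick sanity check, not part of the original steps):

```bash
kubectl get pod -n kube-system -o wide | grep raven-agent-ds
```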

2. How to Use

2.1 Gateways

• Create the Gateway CRs
```bash
$ cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-hangzhou
spec:
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: izbp15inok0kbfkg3in52rz
      underNAT: true
      port: 10262
      type: proxy
    - nodeName: izbp15inok0kbfkg3in52rz
      underNAT: true
      port: 4500
      type: tunnel
---
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  proxyConfig:
    Replicas: 1
    proxyHTTPPort: 10255,9445
    proxyHTTPSPort: 10250,9100
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: izwz9dohcv74iegqecp4axz
      underNAT: false
      port: 10262
      type: proxy
      publicIP: 120.79.xxx.xxx
    - nodeName: izwz9dohcv74iegqecp4axz
      underNAT: false
      port: 4500
      type: tunnel
      publicIP: 120.79.xxx.xxx
---
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-qingdao
spec:
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: izm5eb24dmjfimuaybpnqzz
      underNAT: true
      port: 10262
      type: proxy
    - nodeName: izm5eb24dmjfimuaybpnr0z
      underNAT: true
      port: 4500
      type: tunnel
EOF
```
• Parameter reference:
  1. spec.exposeType: how the gateway is exposed on the public network. LoadBalancer exposes it through a load balancer, PublicIP through a public IP, and an empty value means no exposure; cloud gateways are usually exposed, edge gateways usually are not.
  2. spec.endpoints: a set of candidate gateway nodes; the control plane selects some of them as active gateway nodes based on node status.
  3. spec.endpoints.nodeName: name of the gateway node.
  4. spec.endpoints.type: type of the gateway endpoint: proxy for proxy mode, tunnel for tunnel mode.
  5. spec.endpoints.port: port exposed by the gateway endpoint: usually TCP 10262 for proxy mode and UDP 4500 for tunnel mode.
  6. spec.endpoints.publicIP: public IP address of the gateway node.
  7. spec.endpoints.underNAT: whether the node reaches the public network through NAT; usually false for cloud nodes and true for edge nodes.
  8. spec.proxyConfig.Replicas: number of gateway replicas that support proxy mode; must not exceed the number of nodes in endpoints.
  9. spec.proxyConfig.proxyHTTPPort: insecure ports proxied for cloud-edge proxy-mode communication, e.g. port 10255 that kubelet listens on.
  10. spec.proxyConfig.proxyHTTPSPort: secure ports proxied for cloud-edge proxy-mode communication, e.g. port 10250 that kubelet listens on.
  11. spec.tunnelConfig.Replicas: number of gateway replicas that support tunnel mode; currently only one replica is supported.
  12. status.activeEndpoints: the gateway nodes elected from the candidates in spec.endpoints; the Raven Agent on each active endpoint is the running instance responsible for building tunnels and managing routes.
  13. status.nodes: the nodes whose traffic is handled by this Gateway.
• Check the status of each Gateway CR:
  1. Make sure gateway nodes have been elected in the Gateway status; the election is performed by the GatewayPickup controller of the Yurt-Manager component.
  2. Confirm the public IP address and the exposed ports are correct.
  3. Confirm the enabled modes match your expectations.

Tunnel mode is enabled by setting enable-l3-tunnel: "true", and proxy mode by setting enable-l7-proxy: "true", in the raven-cfg ConfigMap shown below.

```bash
$ kubectl get cm raven-cfg -n kube-system -o yaml
apiVersion: v1
data:
  enable-l3-tunnel: "true"
  enable-l7-proxy: "true"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: raven-agent
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2023-11-24T06:44:54Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: raven-cfg
  namespace: kube-system
```
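If you change these flags, you edit the ConfigMap itself and then restart the Raven Agent DaemonSet so every agent picks up the new values (a hedged sketch: it assumes the DaemonSet is named raven-agent-ds, matching the Pod names above, and that a restart is needed for the change to take effect):

```bash
kubectl edit cm raven-cfg -n kube-system
# Assumed DaemonSet name, derived from the raven-agent-ds-xxxxx Pod names.
kubectl rollout restart daemonset raven-agent-ds -n kube-system
```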
```bash
$ kubectl get gateways
NAME AGE
gw-cloud 22h
gw-hangzhou 22h
gw-qingdao 22h

$ kubectl get gateway gw-cloud -o yaml
apiVersion: raven.openyurt.io/v1alpha1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  proxyConfig:
    Replicas: 1
    proxyHTTPPort: 10255,9445
    proxyHTTPSPort: 10250,9100
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: izwz9dohcv74iegqecp4axz
      underNAT: false
      port: 10262
      type: proxy
      publicIP: 120.79.xxx.xxx
    - nodeName: izwz9dohcv74iegqecp4axz
      underNAT: false
      port: 4500
      type: tunnel
      publicIP: 120.79.xxx.xxx
status:
  activeEndpoints:
    - config:
        enable-l7-proxy: "true"
      nodeName: izwz9dohcv74iegqecp4axz
      port: 10262
      publicIP: 47.xxx.xxx.xxx
      type: proxy
    - config:
        enable-l3-tunnel: "true"
      nodeName: izwz9dohcv74iegqecp4axz
      port: 4500
      publicIP: 47.xxx.xxx.xxx
      type: tunnel
  nodes:
    - nodeName: izwz9dohcv74iegqecp4axz
      privateIP: 192.168.0.195
      subnets:
        - 10.224.0.128/26
    - nodeName: izwz9ey0js5z7mornclpd6z
      privateIP: 192.168.0.196
      subnets:
        - 10.224.0.0/26

$ kubectl get gateway gw-hangzhou -o yaml
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-hangzhou
spec:
  proxyConfig:
    Replicas: 1
  tunnelConfig:
    Replicas: 1
  endpoints:
    - nodeName: izbp15inok0kbfkg3in52rz
      underNAT: true
      port: 10262
      type: proxy
    - nodeName: izbp15inok0kbfkg3in52rz
      underNAT: true
      port: 4500
      type: tunnel
status:
  activeEndpoints:
    - config:
        enable-l7-proxy: "true"
      nodeName: izbp15inok0kbfkg3in52rz
      port: 10262
      publicIP: 120.79.xxx.xxx
      type: proxy
    - config:
        enable-l3-tunnel: "true"
      nodeName: izbp15inok0kbfkg3in52rz
      port: 4500
      publicIP: 120.79.xxx.xxx
      type: tunnel
  nodes:
    - nodeName: izbp15inok0kbfkg3in52rz
      privateIP: 172.16.2.103
      subnets:
        - 10.224.1.128/26
    - nodeName: izbp15inok0kbfkg3in52sz
      privateIP: 172.16.2.104
      subnets:
        - 10.224.1.0/26
```
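If you only want to know which node was elected for each mode, a jsonpath query is a compact alternative to reading the full YAML (a convenience sketch, not part of the original walkthrough):

```bash
# Prints "<type> <nodeName>" for each active endpoint of gw-hangzhou.
kubectl get gateway gw-hangzhou \
  -o jsonpath='{range .status.activeEndpoints[*]}{.type}{" "}{.nodeName}{"\n"}{end}'
```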

2.2 Test Pod connectivity across network domains (tunnel mode)

• Create test Pods
```bash
$ cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busy-box
spec:
  replicas: 4
  selector:
    matchLabels:
      app: busy-box
  template:
    metadata:
      labels:
        app: busy-box
    spec:
      containers:
      - name: busy-box
        image: busybox
        command:
        - /bin/sh
        - -c
        - sleep 3000
      nodeSelector:
        openyurt.io/is-edge-worker: "true"
EOF
```
• Confirm the test Pods are running
```bash
$ kubectl get pod -o wide
busy-box-6f46f8585b-48zb9 1/1 Running 0 76s 10.244.19.3 izbp15inok0kbfkg3in52sz <none> <none>
busy-box-6f46f8585b-9nm64 1/1 Running 0 76s 10.244.16.161 izm5eb24dmjfimuaybpnqzz <none> <none>
busy-box-6f46f8585b-kv4dw 1/1 Running 0 76s 10.244.17.19 izm5eb24dmjfimuaybpnr0z <none> <none>
busy-box-6f46f8585b-t5v9d 1/1 Running 0 76s 10.244.18.4 izbp15inok0kbfkg3in52rz <none> <none>
```
• Test Pod connectivity across network domains
```bash
$ kubectl exec -it busy-box-6f46f8585b-48zb9 -- sh
/ # ping 10.244.17.19 -c 4
PING 10.244.17.19 (10.244.17.19): 56 data bytes
64 bytes from 10.244.17.19: seq=0 ttl=59 time=78.048 ms
64 bytes from 10.244.17.19: seq=1 ttl=59 time=77.424 ms
64 bytes from 10.244.17.19: seq=2 ttl=59 time=77.490 ms
64 bytes from 10.244.17.19: seq=3 ttl=59 time=77.472 ms
--- 10.244.17.19 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 77.424/77.608/78.048 ms
```
• Test node connectivity across network domains: log in to the non-gateway node Edge-HZ-2 and ping the non-gateway node Edge-QD-2
```bash
# On node Edge-HZ-2 (non-gateway node):
ping 172.16.1.90 -c 4
PING 172.16.1.90 (172.16.1.90) 56(84) bytes of data.
64 bytes from 172.16.1.90: icmp_seq=1 ttl=61 time=77.5 ms
64 bytes from 172.16.1.90: icmp_seq=2 ttl=61 time=77.3 ms
64 bytes from 172.16.1.90: icmp_seq=3 ttl=61 time=78.5 ms
64 bytes from 172.16.1.90: icmp_seq=4 ttl=61 time=77.3 ms
--- 172.16.1.90 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 77.314/77.682/78.531/0.533 ms
```

```bash
# Packet capture on node Edge-HZ-1 (gateway node):
tcpdump -i raven0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on raven0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:12.132496 IP 172.16.2.104 > 172.16.1.90: ICMP echo request, id 2, seq 1, length 64
16:13:13.133606 IP 172.16.2.104 > 172.16.1.90: ICMP echo request, id 2, seq 2, length 64
16:13:14.134172 IP 172.16.2.104 > 172.16.1.90: ICMP echo request, id 2, seq 3, length 64
16:13:15.135570 IP 172.16.2.104 > 172.16.1.90: ICMP echo request, id 2, seq 4, length 64
```

```bash
# Packet capture on node Edge-QD-1 (gateway node):
tcpdump -i raven0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on raven0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:12.174023 IP 172.16.1.90 > 172.16.2.104: ICMP echo reply, id 2, seq 1, length 64
16:13:13.175096 IP 172.16.1.90 > 172.16.2.104: ICMP echo reply, id 2, seq 2, length 64
16:13:14.176813 IP 172.16.1.90 > 172.16.2.104: ICMP echo reply, id 2, seq 3, length 64
16:13:15.177024 IP 172.16.1.90 > 172.16.2.104: ICMP echo reply, id 2, seq 4, length 64
```

```bash
# Packet capture on node Edge-QD-2 (non-gateway node):
tcpdump -i raven0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on raven0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:12.173087 IP iZm5eb24dmjfimuaybpnr0Z > 172.16.2.104: ICMP echo reply, id 2, seq 1, length 64
16:13:13.174148 IP iZm5eb24dmjfimuaybpnr0Z > 172.16.2.104: ICMP echo reply, id 2, seq 2, length 64
16:13:14.175884 IP iZm5eb24dmjfimuaybpnr0Z > 172.16.2.104: ICMP echo reply, id 2, seq 3, length 64
16:13:15.176090 IP iZm5eb24dmjfimuaybpnr0Z > 172.16.2.104: ICMP echo reply, id 2, seq 4, length 64
```
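If the pings do not go through, it can help to confirm on each node that the raven0 device exists and to look at how traffic to the remote subnets is routed. The commands below are plain iproute2 tooling, nothing Raven-specific; the interface name is taken from the captures above:

```bash
# Show the raven0 device and its addresses.
ip addr show raven0
# Show policy-routing rules and the main routing table to see where
# traffic for the remote node/Pod subnets is sent.
ip rule list
ip route show
```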

2.3 Layer-7 request proxying across the cloud-edge host network (proxy mode)

In edge scenarios, edge devices often sit in closed private networks, so their private IP addresses frequently conflict with one another. Tunnel mode cannot support host communication when IPs conflict, so proxy mode must be enabled to support cross-domain HTTP/HTTPS requests. Proxy mode is enabled by setting enable-l7-proxy: "true".

Note: if you only need layer-7 request proxying and your edge nodes are standalone nodes with public network access, a single cloud-side Gateway CR is enough; each edge node will establish a reverse connection to the cloud Gateway on its own. For a group of edge nodes that share one network domain, you can create a Gateway CR for them and pick candidate nodes to act as the proxy gateway.
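A minimal sketch of such a cloud-only, proxy-only Gateway, adapted from the gw-cloud example above (the node name, ports, and public IP are the same placeholders used earlier; treat the whole manifest as an assumption to adjust for your cluster):

```bash
cat <<EOF | kubectl apply -f -
apiVersion: raven.openyurt.io/v1beta1
kind: Gateway
metadata:
  name: gw-cloud
spec:
  exposeType: PublicIP
  proxyConfig:
    Replicas: 1
    proxyHTTPPort: 10255,9445
    proxyHTTPSPort: 10250,9100
  endpoints:
    # Only a proxy endpoint is declared; no tunnel endpoint is needed
    # while tunnel mode stays disabled.
    - nodeName: izwz9dohcv74iegqecp4axz
      underNAT: false
      port: 10262
      type: proxy
      publicIP: 120.79.xxx.xxx
EOF
```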

```bash
$ kubectl get cm raven-cfg -n kube-system -o yaml
apiVersion: v1
data:
  enable-l3-tunnel: "true"
  enable-l7-proxy: "true"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: raven-agent
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2023-11-24T06:44:54Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: raven-cfg
  namespace: kube-system
```
• With proxy mode enabled, cloud-to-edge requests such as kubectl exec reach Pods on edge nodes through the layer-7 proxy:

```bash
$ kubectl exec -it busy-box-6f46f8585b-48zb9 -- sh
/ # echo hello word
hello word
```
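kubectl logs against the same edge Pod travels the same cloud-to-edge path, so it is another quick way to confirm the proxy works (a hedged check; the busybox container only runs sleep, so an empty but successful response, rather than a timeout, is what you are looking for):

```bash
kubectl logs busy-box-6f46f8585b-48zb9 --tail=20
```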

Other features:

By default, Raven uses IPsec as its VPN backend; WireGuard is also available as an alternative. You can switch to the WireGuard backend with the following steps:

• Raven needs the WireGuard kernel module to be loadable on the gateway nodes in the cluster. Starting with Linux 5.6, the kernel ships WireGuard in-tree; Linux distributions with older kernels need to install WireGuard separately, which on most distributions can be done with the system package manager. See Installing WireGuard for details.
• The gateway nodes need an open UDP port for communication; by default, WireGuard uses UDP port 51820. Then run the following commands:

```bash
cd raven
git checkout v0.4.0
VPN_DRIVER=wireguard make deploy
```
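Before switching, you can verify on each gateway node that the WireGuard module is actually available (plain Linux commands, not Raven-specific):

```bash
# Loads the module if it is built but not yet loaded; fails on kernels without WireGuard.
sudo modprobe wireguard
# Shows the module once it is loaded.
lsmod | grep wireguard
```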

How to troubleshoot VPN issues:

• If the IPsec tunnel (implemented with libreswan) is used as the backend, you can exec into the raven-agent container and inspect its state with ipsec status / ipsec look (or /usr/libexec/ipsec status / /usr/libexec/ipsec look), and use the ipsec tooling to diagnose related problems, as shown in the sketch below.
• If the WireGuard tunnel is used as the VPN backend, you can exec into the raven-agent container, install wireguard-tools, and follow that tool's documentation to troubleshoot.
• Raven uses the stock open-source IPsec and WireGuard tooling without any customization, so you can turn to the upstream communities and related technical blogs to resolve day-to-day issues.
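As a concrete starting point, the status command from the first bullet can be run through kubectl (a sketch: the Pod name is one of the raven-agent Pods listed earlier, and you would pick the one running on the gateway node you want to inspect):

```bash
kubectl exec -n kube-system raven-agent-ds-4b587 -- ipsec status
```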