Use Prometheus to monitor the Karmada control plane

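Prometheus is a Cloud Native Computing Foundation (CNCF) project for monitoring systems and services. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.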

This document walks through an example of monitoring the Karmada control plane with Prometheus.

Start up Karmada clusters

Clone the Karmada repository and run the following script from the Karmada directory.

    hack/local-up-karmada.sh
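The rest of this guide addresses the cluster through two kubeconfig contexts, karmada-host and karmada-apiserver. The following is a minimal sketch for confirming that both exist before continuing. The function names are hypothetical, and the sketch assumes kubectl is already pointed at the kubeconfig written by the script (check the script output for its location).

```shell
# Return success if the named kubeconfig context exists.
has_context() {
  kubectl config get-contexts -o name | grep -qxF -- "$1"
}

# Report which of the two contexts used in this guide are present.
check_karmada_contexts() {
  for ctx in karmada-host karmada-apiserver; do
    if has_context "$ctx"; then
      echo "found: $ctx"
    else
      echo "missing: $ctx"
    fi
  done
}

# Usage:
#   check_karmada_contexts
```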

Start up Prometheus

  1. Create the RBAC resources in both the karmada-host context and the karmada-apiserver context

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitor
      labels:
        name: monitor
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups:
      - extensions
      resources:
      - ingresses
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    - apiGroups:
      - 'cluster.karmada.io'
      resources:
      - '*'
      verbs:
      - '*'
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: monitor
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: monitor
  2. Create a Secret for the ServiceAccount. This is required on Kubernetes v1.24+, where creating a ServiceAccount no longer generates a Secret automatically.

    apiVersion: v1
    kind: Secret
    type: kubernetes.io/service-account-token
    metadata:
      name: prometheus
      namespace: monitor
      annotations:
        kubernetes.io/service-account.name: "prometheus"
  3. Get the token for accessing the Karmada apiserver

    kubectl get secret prometheus -o=jsonpath='{.data.token}' -n monitor --context "karmada-apiserver" | base64 -d
  4. Create the Prometheus resource objects in the karmada-host context. Replace <karmada-token> (in 2 places) with the token obtained in step 3.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: monitor
    data:
      prometheus.yml: |-
        global:
          scrape_interval: 15s
          evaluation_interval: 15s
        scrape_configs:
        - job_name: 'karmada-scheduler'
          kubernetes_sd_configs:
          - role: pod
          scheme: http
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
          # Keep only pods labeled app=karmada-scheduler in karmada-system.
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
            action: keep
            regex: karmada-system;karmada-scheduler
          # Append the metrics port to the discovered address.
          - target_label: __address__
            source_labels: [__address__]
            regex: '(.*)'
            replacement: '${1}:10351'
            action: replace
        - job_name: 'karmada-controller-manager'
          kubernetes_sd_configs:
          - role: pod
          scheme: http
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
            action: keep
            regex: karmada-system;karmada-controller-manager
          - target_label: __address__
            source_labels: [__address__]
            regex: '(.*)'
            replacement: '${1}:8080'
            action: replace
        - job_name: 'kubernetes-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443
        - job_name: 'karmada-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            insecure_skip_verify: true
          bearer_token: <karmada-token> # replace with the real Karmada token
          relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: karmada-apiserver
          - target_label: __address__
            replacement: karmada-apiserver.karmada-system.svc:5443
        - job_name: 'karmada-aggregated-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            insecure_skip_verify: true
          bearer_token: <karmada-token> # replace with the real Karmada token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoints_name]
            action: keep
            regex: karmada-system;karmada-aggregated-apiserver;karmada-aggregated-apiserver
          - target_label: __address__
            replacement: karmada-aggregated-apiserver.karmada-system.svc:443
        - job_name: 'kubernetes-cadvisor'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          # Scrape cadvisor through the apiserver node proxy.
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          metric_relabel_configs:
          - action: replace
            source_labels: [id]
            regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
            target_label: rkt_container_name
            replacement: '${2}-${1}'
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'
          - source_labels: [pod]
            separator: ;
            regex: (.+)
            target_label: pod_name
            replacement: $1
            action: replace
    ---
    apiVersion: v1
    kind: "Service"
    metadata:
      name: prometheus
      namespace: monitor
      labels:
        name: prometheus
    spec:
      ports:
      - name: prometheus
        protocol: TCP
        port: 9090
        targetPort: 9090
        nodePort: 31801
      selector:
        app: prometheus
      type: NodePort
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        name: prometheus
      name: prometheus
      namespace: monitor
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          serviceAccountName: prometheus
          containers:
          - name: prometheus
            image: prom/prometheus:latest
            command:
            - "/bin/prometheus"
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prom-data"
            - "--storage.tsdb.retention.time=180d"
            ports:
            - containerPort: 9090
              protocol: TCP
            volumeMounts:
            - mountPath: "/etc/prometheus"
              name: prometheus-config
            - mountPath: "/prom-data"
              name: prom-data
          initContainers:
          # Make the hostPath data directory writable for Prometheus.
          - name: prometheus-data-permission-fix
            image: busybox
            command: ["/bin/chmod","-R","777", "/data"]
            volumeMounts:
            - name: prom-data
              mountPath: /data
          volumes:
          - name: prometheus-config
            configMap:
              name: prometheus-config
          - name: prom-data
            hostPath:
              path: /var/lib/prom-data
              type: DirectoryOrCreate
  5. Visit the Prometheus monitoring page of the control plane using any node IP of the control plane and the port number (31801 by default)
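Steps 1 to 5 above can be sketched as a small set of shell helpers. This is a minimal sketch, not part of the Karmada tooling: the function names and the manifest file names (rbac.yaml, secret.yaml, prometheus.yaml) are hypothetical, and it assumes the manifests above were saved under those names. The guide does not state which context the Secret from step 2 belongs to; the example applies it to karmada-apiserver because step 3 reads it from there.

```shell
# Step 1: apply one manifest to both contexts.
apply_to_contexts() {
  for ctx in karmada-host karmada-apiserver; do
    kubectl apply -f "$1" --context "$ctx" || return 1
  done
}

# Step 3: read the ServiceAccount token from the karmada-apiserver context.
get_karmada_token() {
  kubectl get secret prometheus -n monitor --context karmada-apiserver \
    -o=jsonpath='{.data.token}' | base64 -d
}

# Step 4: print the manifest in file $1 with every <karmada-token>
# placeholder replaced by $2. ServiceAccount tokens are JWTs, which contain
# only letters, digits, '-', '_' and '.', so '|' is a safe sed delimiter.
render_manifest() {
  sed "s|<karmada-token>|$2|g" "$1"
}

# Step 5: run an instant query against the Prometheus HTTP API exposed on
# NodePort 31801. $1 is a node IP, $2 is a PromQL expression.
prom_query() {
  curl -sG --data-urlencode "query=$2" "http://$1:31801/api/v1/query"
}

# Example run (hypothetical file names; <node-ip> must be filled in):
#   apply_to_contexts rbac.yaml
#   kubectl apply -f secret.yaml --context karmada-apiserver
#   token="$(get_karmada_token)"
#   render_manifest prometheus.yaml "$token" | kubectl apply -f - --context karmada-host
#   prom_query <node-ip> up
```

Querying `up` is a quick way to see which scrape targets Prometheus can reach: each target reports 1 when its last scrape succeeded and 0 otherwise.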

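Visualize metrics using Grafana

For a better experience when viewing metrics, we can also use Grafana on top of Prometheus, together with the dashboards provided by the community.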

  1. Install Grafana with Helm

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    cat <<EOF | helm upgrade --install grafana grafana/grafana --kube-context "karmada-host" -n monitor -f -
    persistence:
      enabled: true
      storageClassName: local-storage
    service:
      enabled: true
      type: NodePort
      nodePort: 31802
      targetPort: 3000
      port: 80
    EOF
  2. Get the login password for the Grafana web UI

    kubectl get secret --namespace monitor grafana -o jsonpath="{.data.admin-password}" --context "karmada-host" | base64 --decode ; echo
  3. Visit the Grafana web UI of the control plane using any node IP of the control plane and the port number (31802 by default)
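Before dashboards can show anything, Grafana needs Prometheus registered as a data source. This can be done in the web UI; the following is a minimal sketch of doing it through the Grafana HTTP API instead. The function name is hypothetical, the user name admin is assumed to be the chart default, and the in-cluster URL http://prometheus.monitor.svc:9090 is derived from the Service name, namespace, and port defined earlier in this guide.

```shell
# Register the in-cluster Prometheus Service as a Grafana data source.
# $1 is the Grafana base URL, for example http://<node-ip>:31802
# $2 is the admin password obtained in step 2
add_prometheus_datasource() {
  grafana_url="$1"
  admin_password="$2"
  curl -s -u "admin:${admin_password}" \
    -H 'Content-Type: application/json' \
    -X POST "${grafana_url}/api/datasources" \
    -d '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"http://prometheus.monitor.svc:9090"}'
}

# Usage:
#   add_prometheus_datasource "http://<node-ip>:31802" "<admin-password>"
```

With `access` set to `proxy`, the Grafana server (not the browser) contacts Prometheus, which is why the cluster-internal Service address is sufficient here.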

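Note

In Kubernetes v1.24+, metrics from cadvisor may be missing the image, name, and container labels, which can make it impossible to observe metrics for Karmada components such as karmada-apiserver and karmada-controller-manager. See the related link.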
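Both this note and the Secret step earlier depend on whether the cluster runs Kubernetes v1.24 or later. The following is a minimal sketch for testing a version string of the form reported in the gitVersion field of `kubectl version -o json`. The function name is hypothetical, and extracting the field from the JSON output is left to the reader.

```shell
# Return success if the given version string (e.g. v1.24.3) is v1.24 or later.
is_v124_or_later() {
  ver="${1#v}"                 # drop the leading "v"
  major="${ver%%.*}"           # text before the first dot
  rest="${ver#*.}"             # text after the first dot
  minor="${rest%%.*}"          # text before the next dot
  minor="${minor%%[!0-9]*}"    # strip suffixes such as "+" or "-eks"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 24 ]; }
}

# Usage:
#   if is_v124_or_later "v1.24.3"; then
#     echo "create the ServiceAccount Secret manually"
#   fi
```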

References