A Monitoring System Based on VictoriaMetrics + Grafana

Monitoring and metrics are key pillars of observability.

In this section, we will build our own monitoring system with VictoriaMetrics.

When it comes to monitoring tools, you may think of the veterans Zabbix and Nagios, or you may have heard of the rising star Prometheus.

Prometheus is an open-source monitoring system. Thanks to its open ecosystem and cloud-native design, it has gradually become the de facto standard for microservice architectures.

However, storage scalability was not part of Prometheus's original design, so once metrics ingestion climbs to millions of samples per second, noticeable performance bottlenecks appear.

VictoriaMetrics is an open-source monitoring project that has risen rapidly in recent years. It was designed for horizontal scaling from the start and is compatible with the Prometheus protocols, so it can keep up with ever-growing metrics demands.
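To see that compatibility in action: the Prometheus querying HTTP API works against VictoriaMetrics unchanged. A minimal sketch, assuming an instance is reachable at localhost:8428 (for example via the port-forward we set up later in this section):

    # PromQL queries go through the same /api/v1/query endpoint as in Prometheus
    curl 'http://localhost:8428/api/v1/query?query=up'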

Grafana is an open-source visualization and analytics tool; its rich dashboards make metrics much easier to understand at a glance.

Below, we will assemble the monitoring system on top of VictoriaMetrics + Grafana.

Installing VictoriaMetrics

In what follows, we will deploy the single-node version of VictoriaMetrics. Thanks to VM's excellent performance, the single-node version is more than enough for the monitoring needs of small and medium-sized businesses; if your workload requires it, you can deploy the cluster version instead.

First, add the Helm repository:

    helm repo add vm https://victoriametrics.github.io/helm-charts/
    helm repo update

Then list the charts in the repository:

    helm search repo vm/
    NAME                           CHART VERSION  APP VERSION  DESCRIPTION
    vm/victoria-metrics-agent      0.7.34         v1.69.0      Victoria Metrics Agent - collects metrics from ...
    vm/victoria-metrics-alert      0.4.14         v1.69.0      Victoria Metrics Alert - executes a list of giv...
    vm/victoria-metrics-auth       0.2.33         1.69.0       Victoria Metrics Auth - is a simple auth proxy ...
    vm/victoria-metrics-cluster    0.9.12         1.69.0       Victoria Metrics Cluster version - high-perform...
    vm/victoria-metrics-k8s-stack  0.5.9          1.69.0       Kubernetes monitoring on VictoriaMetrics stack....
    vm/victoria-metrics-operator   0.4.2          0.20.3       Victoria Metrics Operator
    vm/victoria-metrics-single     0.8.12         1.69.0       Victoria Metrics Single version - high-performa...

Dump all configurable options to a file:

    helm show values vm/victoria-metrics-single > values.yaml

Edit it to the following settings:

    server:
      persistentVolume:
        enabled: false
        accessModes:
          - ReadWriteOnce
        annotations: {}
        storageClass: ""
        existingClaim: ""
        matchLabels: {}
        mountPath: /storage
        subPath: ""
        size: 16Gi
      scrape:
        enabled: true
        configMap: ""
        config:
          global:
            scrape_interval: 15s
          scrape_configs:
            - job_name: victoriametrics
              static_configs:
                - targets: [ "localhost:8428" ]
            - job_name: "kubernetes-apiservers"
              kubernetes_sd_configs:
                - role: endpoints
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              relabel_configs:
                - source_labels:
                    [
                      __meta_kubernetes_namespace,
                      __meta_kubernetes_service_name,
                      __meta_kubernetes_endpoint_port_name,
                    ]
                  action: keep
                  regex: default;kubernetes;https
            - job_name: "kubernetes-nodes"
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: node
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [ __meta_kubernetes_node_name ]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/$1/proxy/metrics
            - job_name: "kubernetes-nodes-cadvisor"
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: node
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [ __meta_kubernetes_node_name ]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
              metric_relabel_configs:
                - action: replace
                  source_labels: [pod]
                  regex: '(.+)'
                  target_label: pod_name
                  replacement: '${1}'
                - action: replace
                  source_labels: [container]
                  regex: '(.+)'
                  target_label: container_name
                  replacement: '${1}'
                - action: replace
                  target_label: name
                  replacement: k8s_stub
                - action: replace
                  source_labels: [id]
                  regex: '^/system\.slice/(.+)\.service$'
                  target_label: systemd_service_name
                  replacement: '${1}'

A few notes on the configuration above:

  • We disabled the PersistentVolume, so the chart falls back to a local emptyDir by default. In production, you should configure a storage plugin with dynamic provisioning instead.

  • We scrape metrics from the Kubernetes cluster itself and apply some label rewriting along the way.

  • If you are familiar with Prometheus, you will notice that the configuration above is essentially Prometheus-compatible.
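The install command below targets the vm namespace. If it does not exist yet in your cluster (we assume a fresh cluster here), create it first:

    kubectl create namespace vm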

Install vmsingle:

    helm install vmsingle vm/victoria-metrics-single -f ./values.yaml -n vm
    W1117 14:46:54.020279 26203 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    W1117 14:46:54.066766 26203 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    NAME: vmsingle
    LAST DEPLOYED: Wed Nov 17 14:46:53 2021
    NAMESPACE: vm
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    The VictoriaMetrics write api can be accessed via port 8428 on the following DNS name from within your cluster:
    vmsingle-victoria-metrics-single-server.vm.svc.cluster.local

    Metrics Ingestion:
      Get the Victoria Metrics service URL by running these commands in the same shell:
        export POD_NAME=$(kubectl get pods --namespace vm -l "app=server" -o jsonpath="{.items[0].metadata.name}")
        kubectl --namespace vm port-forward $POD_NAME 8428

      Write url inside the kubernetes cluster:
        http://vmsingle-victoria-metrics-single-server.vm.svc.cluster.local:8428/api/v1/write

    Read Data:
      The following url can be used as the datasource url in Grafana::
        http://vmsingle-victoria-metrics-single-server.vm.svc.cluster.local:8428

The Read Data URL above will be needed later, so copy it somewhere safe. Once the deployment succeeds, check that the Pod is running:

    kubectl get pods -n vm
    vmsingle-victoria-metrics-single-server-0   1/1   Running   0   59s
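As an optional sanity check, you can probe the server's health endpoint. A minimal sketch, reusing the port-forward commands from the chart NOTES above:

    # forward port 8428 of the vmsingle Pod to localhost (runs in the background)
    export POD_NAME=$(kubectl get pods --namespace vm -l "app=server" -o jsonpath="{.items[0].metadata.name}")
    kubectl --namespace vm port-forward $POD_NAME 8428 &
    # VictoriaMetrics answers OK on its health endpoint once it is ready
    curl http://127.0.0.1:8428/health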

Installing Grafana

First, once again, add the Helm repository:

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update

Next, customize the parameters and install:

    cat <<EOF | helm install my-grafana grafana/grafana -f -
    datasources:
      datasources.yaml:
        apiVersion: 1
        datasources:
          - name: victoriametrics
            type: prometheus
            orgId: 1
            url: http://vmsingle-victoria-metrics-single-server.vm.svc.cluster.local:8428
            access: proxy
            isDefault: true
            updateIntervalSeconds: 10
            editable: true
    dashboardProviders:
      dashboardproviders.yaml:
        apiVersion: 1
        providers:
          - name: 'default'
            orgId: 1
            folder: ''
            type: file
            disableDeletion: true
            editable: true
            options:
              path: /var/lib/grafana/dashboards/default
    dashboards:
      default:
        victoriametrics:
          gnetId: 10229
          revision: 21
          datasource: victoriametrics
        kubernetes:
          gnetId: 14205
          revision: 1
          datasource: victoriametrics
    EOF

In the configuration above, we added a default datasource pointing at the VM Read Data URL we saved earlier.
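Before logging in, it does not hurt to confirm that the Grafana Pod is ready; a quick check using the instance label the chart applies:

    kubectl get pods --namespace default -l "app.kubernetes.io/instance=my-grafana"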

Next, retrieve the Grafana admin password:

    kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
    SOnFX4CdrlyG5JACyBedk9mJk7btMz8cXjk7ZiOZ

Then forward the port to your local machine:

    export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
    kubectl --namespace default port-forward $POD_NAME 3000

Visit http://127.0.0.1:3000 and log in as admin with the password retrieved above.

If everything went well, you will see that data from the Kubernetes cluster is already flowing in:

(Screenshot: the provisioned Kubernetes dashboard in Grafana, already populated with cluster metrics.)

At this point, we have a basic monitoring system in place. There is still room to do better:

  • Add a PersistentVolume so the data is truly durable.

  • Deploy the distributed (cluster) version.

  • Expose custom metrics from your microservices and scrape them into VM's storage (see the sketch after this list).
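For that last point, here is a minimal sketch of an extra scrape job you could append to scrape_configs in values.yaml. It assumes your microservice Pods opt in via the conventional prometheus.io/scrape annotations; adapt the relabeling to your own conventions:

    - job_name: "kubernetes-pods"
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # keep only Pods annotated with prometheus.io/scrape: "true"
        - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
          action: keep
          regex: true
        # let a Pod override its metrics path via the prometheus.io/path annotation
        - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
          action: replace
          target_label: __metrics_path__
          regex: (.+)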