09-3.部署 metrics-server 插件

注意:

  1. 如果没有特殊指明,本文档的所有操作均在 zhangjun-k8s01 节点上执行
  2. kuberntes 自带插件的 manifests yaml 文件使用 gcr.io 的 docker registry,国内被墙,需要手动替换为其它 registry 地址(本文档未替换);
  3. 可以从微软中国提供的 gcr.io 免费代理下载被墙的镜像;

metrics-server 通过 kube-apiserver 发现所有节点,然后调用 kubelet APIs(通过 https 接口)获得各节点(Node)和 Pod 的 CPU、Memory 等资源使用情况。

从 Kubernetes 1.12 开始,kubernetes 的安装脚本移除了 Heapster,从 1.13 开始完全移除了对 Heapster 的支持,Heapster 不再被维护。

替代方案如下:

  1. 用于支持自动扩缩容的 CPU/memory HPA metrics:metrics-server;
  2. 通用的监控方案:使用第三方可以获取 Prometheus 格式监控指标的监控系统,如 Prometheus Operator;
  3. 事件传输:使用第三方工具来传输、归档 kubernetes events;

Kubernetes Dashboard 还不支持 metrics-server(PR:#3504),如果使用 metrics-server 替代 Heapster,将无法在 dashboard 中以图形展示 Pod 的内存和 CPU 情况,需要通过 Prometheus、Grafana 等监控方案来弥补。

注意:如果没有特殊指明,本文档的所有操作均在 zhangjun-k8s01 节点上执行

监控架构

monitoring_architecture.png

安装 metrics-server

从 github clone 源码:

  1. $ cd /opt/k8s/work/
  2. $ git clone https://github.com/kubernetes-incubator/metrics-server.git
  3. $ cd metrics-server/deploy/1.8+/
  4. $ ls
  5. aggregated-metrics-reader.yaml auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml

修改 metrics-server-deployment.yaml 文件,为 metrics-server 添加三个命令行参数:

  1. $ diff metrics-server-deployment.yaml.orig metrics-server-deployment.yaml
  2. 32a33,36
  3. > args:
  4. > - --metric-resolution=30s
  5. > - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
  • —metric-resolution=30s:从 kubelet 采集数据的周期;
  • —kubelet-preferred-address-types:优先使用 InternalIP 来访问 kubelet,这样可以避免节点名称没有 DNS 解析记录时,通过节点名称调用节点 kubelet API 失败的情况(未配置时默认的情况);

部署 metrics-server:

  1. $ cd /opt/k8s/work/metrics-server/deploy/1.8+/
  2. $ kubectl create -f .

查看运行情况

  1. $ kubectl -n kube-system get pods -l k8s-app=metrics-server
  2. NAME READY STATUS RESTARTS AGE
  3. metrics-server-7cffff65bc-hkfr7 1/1 Running 0 56s
  4. $ kubectl get svc -n kube-system metrics-server
  5. NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
  6. metrics-server ClusterIP 10.254.37.139 <none> 443/TCP 94s

metrics-server 的命令行参数

  1. $ docker run -it --rm k8s.gcr.io/metrics-server-amd64:v0.3.3 --help
  2. Launch metrics-server
  3. Usage:
  4. [flags]
  5. Flags:
  6. --alsologtostderr log to standard error as well as files
  7. --authentication-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to create tokenaccessreviews.authentication.k8s.io.
  8. --authentication-skip-lookup If false, the authentication-kubeconfig will be used to lookup missing authentication configuration from the clust
  9. er.
  10. --authentication-token-webhook-cache-ttl duration The duration to cache responses from the webhook token authenticator. (default 10s)
  11. --authentication-tolerate-lookup-failure If true, failures to look up missing authentication configuration from the cluster are not considered fatal. Note that this can result in authentication that treats all requests as anonymous.
  12. --authorization-always-allow-paths strings A list of HTTP paths to skip during authorization, i.e. these are authorized without contacting the 'core' kubernetes server.
  13. --authorization-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to create subjectaccessreviews.authorization.k8s.io.
  14. --authorization-webhook-cache-authorized-ttl duration The duration to cache 'authorized' responses from the webhook authorizer. (default 10s)
  15. --authorization-webhook-cache-unauthorized-ttl duration The duration to cache 'unauthorized' responses from the webhook authorizer. (default 10s)
  16. --bind-address ip The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the
  17. rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces). (default 0.0.0.0)
  18. --cert-dir string The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "apiserver.local.config/certificates")
  19. --client-ca-file string If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
  20. --contention-profiling Enable lock contention profiling, if profiling is enabled
  21. -h, --help help for this command
  22. --http2-max-streams-per-connection int The limit that the server gives to clients for the maximum number of streams in an HTTP/2 connection. Zero means t
  23. o use golang's default.
  24. --kubeconfig string The path to the kubeconfig used to connect to the Kubernetes API server and the Kubelets (defaults to in-cluster c
  25. onfig)
  26. --kubelet-certificate-authority string Path to the CA to use to validate the Kubelet's serving certificates.
  27. --kubelet-insecure-tls Do not verify CA of serving certificates presented by Kubelets. For testing purposes only.
  28. --kubelet-port int The port to use to connect to Kubelets. (default 10250)
  29. --kubelet-preferred-address-types strings The priority of node address types to use when determining which address to use to connect to a particular node (d
  30. efault [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
  31. --log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
  32. --log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
  33. --log_dir string If non-empty, write log files in this directory
  34. --log_file string If non-empty, use this log file
  35. --logtostderr log to standard error instead of files (default true)
  36. --metric-resolution duration The resolution at which metrics-server will retain metrics. (default 1m0s)
  37. --profiling Enable profiling via web interface host:port/debug/pprof/ (default true)
  38. --requestheader-allowed-names strings List of client certificate common names to allow to provide usernames in headers specified by --requestheader-user
  39. name-headers. If empty, any client certificate validated by the authorities in --requestheader-client-ca-file is allowed.
  40. --requestheader-client-ca-file string Root certificate bundle to use to verify client certificates on incoming requests before trusting usernames in hea
  41. ders specified by --requestheader-username-headers. WARNING: generally do not depend on authorization being already done for incoming requests.
  42. --requestheader-extra-headers-prefix strings List of request header prefixes to inspect. X-Remote-Extra- is suggested. (default [x-remote-extra-])
  43. --requestheader-group-headers strings List of request headers to inspect for groups. X-Remote-Group is suggested. (default [x-remote-group])
  44. --requestheader-username-headers strings List of request headers to inspect for usernames. X-Remote-User is common. (default [x-remote-user])
  45. --secure-port int The port on which to serve HTTPS with authentication and authorization.If 0, don't serve HTTPS at all. (default 443)
  46. --skip_headers If true, avoid header prefixes in the log messages
  47. --stderrthreshold severity logs at or above this threshold go to stderr
  48. --tls-cert-file string File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir.
  49. --tls-cipher-suites strings Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be use. Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA
  50. --tls-min-version string Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12
  51. --tls-private-key-file string File containing the default x509 private key matching --tls-cert-file.
  52. --tls-sni-cert-key namedCertKey A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default [])
  53. -v, --v Level number for the log level verbosity
  54. --vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging

查看 metrics-server 输出的 metrics

  1. 通过 kube-apiserver 或 kubectl proxy 访问:

    https://172.27.137.240:6443/apis/metrics.k8s.io/v1beta1/nodes https://172.27.137.240:6443/apis/metrics.k8s.io/v1beta1/nodes/ https://172.27.137.240:6443/apis/metrics.k8s.io/v1beta1/pods https://172.27.137.240:6443/apis/metrics.k8s.io/v1beta1/namespace//pods/

  2. 直接使用 kubectl 命令访问:

    kubectl get —raw apis/metrics.k8s.io/v1beta1/nodes kubectl get —raw apis/metrics.k8s.io/v1beta1/pods kubectl get —raw apis/metrics.k8s.io/v1beta1/nodes/ kubectl get —raw apis/metrics.k8s.io/v1beta1/namespace//pods/

  1. $ kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq .
  2. {
  3. "kind": "APIResourceList",
  4. "apiVersion": "v1",
  5. "groupVersion": "metrics.k8s.io/v1beta1",
  6. "resources": [
  7. {
  8. "name": "nodes",
  9. "singularName": "",
  10. "namespaced": false,
  11. "kind": "NodeMetrics",
  12. "verbs": [
  13. "get",
  14. "list"
  15. ]
  16. },
  17. {
  18. "name": "pods",
  19. "singularName": "",
  20. "namespaced": true,
  21. "kind": "PodMetrics",
  22. "verbs": [
  23. "get",
  24. "list"
  25. ]
  26. }
  27. ]
  28. }
  29. $ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
  30. {
  31. "kind": "NodeMetricsList",
  32. "apiVersion": "metrics.k8s.io/v1beta1",
  33. "metadata": {
  34. "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  35. },
  36. "items": [
  37. {
  38. "metadata": {
  39. "name": "zhangjun-k8s01",
  40. "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s01",
  41. "creationTimestamp": "2019-05-26T10:55:10Z"
  42. },
  43. "timestamp": "2019-05-26T10:54:52Z",
  44. "window": "30s",
  45. "usage": {
  46. "cpu": "311155148n",
  47. "memory": "2881016Ki"
  48. }
  49. },
  50. {
  51. "metadata": {
  52. "name": "zhangjun-k8s02",
  53. "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s02",
  54. "creationTimestamp": "2019-05-26T10:55:10Z"
  55. },
  56. "timestamp": "2019-05-26T10:54:54Z",
  57. "window": "30s",
  58. "usage": {
  59. "cpu": "253796835n",
  60. "memory": "1028836Ki"
  61. }
  62. },
  63. {
  64. "metadata": {
  65. "name": "zhangjun-k8s03",
  66. "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s03",
  67. "creationTimestamp": "2019-05-26T10:55:10Z"
  68. },
  69. "timestamp": "2019-05-26T10:54:54Z",
  70. "window": "30s",
  71. "usage": {
  72. "cpu": "280441339n",
  73. "memory": "1072772Ki"
  74. }
  75. }
  76. ]
  77. }
  • /apis/metrics.k8s.io/v1beta1/nodes 和 /apis/metrics.k8s.io/v1beta1/pods 返回的 usage 包含 CPU 和 Memory;

使用 kubectl top 命令查看集群节点资源使用情况

kubectl top 命令从 metrics-server 获取集群节点基本的指标信息:

  1. $ kubectl top node
  2. NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
  3. zhangjun-k8s01 296m 3% 2804Mi 17%
  4. zhangjun-k8s02 312m 3% 1161Mi 7%
  5. zhangjun-k8s03 252m 3% 1159Mi 7%

参考

  1. https://kubernetes.feisky.xyz/zh/addons/metrics.html
  2. metrics-server RBAC:https://github.com/kubernetes-incubator/metrics-server/issues/40
  3. metrics-server 参数:https://github.com/kubernetes-incubator/metrics-server/issues/25
  4. https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/
  5. metrics-server 的 APIs 文档