Monitor Calico component metrics

Big picture

Use Prometheus configured for Calico components to get valuable metrics about the health of Calico.

Value

Using the open-source Prometheus monitoring and alerting toolkit, you can view time-series metrics from Calico components in the Prometheus or Grafana interfaces.

Concepts

About Prometheus

The Prometheus monitoring tool scrapes metrics from instrumented jobs and stores them as time-series data, which you can query in Prometheus itself or display in a visualizer such as Grafana. For Calico, the “jobs” that Prometheus can harvest metrics from are the Felix, Typha, and kube-controllers components.

About Calico Felix, Typha, and kube-controllers components

Felix is a daemon that runs on every machine and implements network policy; it is the brains of Calico. Typha is an optional set of pods that extends Felix to scale traffic between Calico nodes and the datastore. The kube-controllers pod runs a set of controllers that are responsible for a variety of control plane functions, such as resource garbage collection and synchronization with the Kubernetes API.

You can configure Felix, Typha, and/or kube-controllers to provide metrics to Prometheus.

Before you begin…

In this tutorial, we assume that you have completed all other introductory tutorials and have a running Kubernetes cluster with Calico. You can use either kubectl or calicoctl to perform the following steps. Depending on which tool you choose, make sure you have the prerequisites shown below.

With kubectl:

If you want to modify Calico configuration with the kubectl binary, make sure the Calico API server is installed in your cluster. The API server lets you manage resources in the projectcalico.org/v3 API group.

Note: Operator-based installs include the API server by default.

For more information, see the Calico API server documentation.

With calicoctl:

You can run calicoctl as a binary or as a container on any host with network access to the Calico datastore to manage Calico APIs in the projectcalico.org/v3 API group.

For more information, see the calicoctl documentation.

How to

This tutorial walks through the steps needed to implement basic monitoring of Calico with Prometheus.

  1. Configure Calico to enable the metrics reporting.
  2. Create the namespace and service account that Prometheus will need.
  3. Deploy and configure Prometheus.
  4. View the metrics in the Prometheus dashboard and create a simple graph.

1. Configure Calico to enable metrics reporting

Felix configuration

Felix Prometheus metrics are disabled by default.

Note: A comprehensive list of configuration values is available in the FelixConfiguration reference.

Use the following command to enable Felix metrics.

With kubectl:

```bash
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": true}}'
```

You should see output like the following:

```
felixconfiguration.projectcalico.org/default patched
```

With calicoctl:

```bash
calicoctl patch felixconfiguration default --patch '{"spec":{"prometheusMetricsEnabled": true}}'
```

You should see output like the following:

```
Successfully patched 1 'FelixConfiguration' resource
```

Creating a service to expose Felix metrics

Prometheus uses Kubernetes services to dynamically discover endpoints. Here you will create a service named felix-metrics-svc which Prometheus will use to discover all the Felix metrics endpoints.

Note: By default, Felix publishes its metrics on TCP port 9091.

Operator install:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: felix-metrics-svc
  namespace: calico-system
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node
  ports:
  - port: 9091
    targetPort: 9091
EOF
```

If you are running Calico for Windows, also create a service for Windows nodes:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: felix-windows-metrics-svc
  namespace: calico-system
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node-windows
  ports:
  - port: 9091
    targetPort: 9091
EOF
```

By default, the Windows firewall blocks listening on ports. To have Calico manage the Windows firewall rules for the Prometheus metrics port, enable the windowsManageFirewallRules setting in FelixConfiguration:

```bash
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"windowsManageFirewallRules": "Enabled"}}'
```

See the FelixConfiguration reference for more details. Alternatively, instead of having Calico manage it, you can add your own Windows firewall rule that allows listening on the Prometheus metrics port.

Manifest install:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: felix-metrics-svc
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node
  ports:
  - port: 9091
    targetPort: 9091
EOF
```
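The metrics services in this guide differ only in name, namespace, k8s-app selector, and port. If you script your setup, a small generator keeps them consistent. Here is a bash sketch; metrics_svc_manifest is a hypothetical helper, not part of Calico or kubectl, and it simply prints the same headless Service shape used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a headless Service manifest that exposes a
# Calico component's metrics port, in the same shape as the services above.
# Usage: metrics_svc_manifest NAME NAMESPACE K8S_APP PORT
metrics_svc_manifest() {
  local name="$1" namespace="$2" app="$3" port="$4"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  clusterIP: None
  selector:
    k8s-app: ${app}
  ports:
  - port: ${port}
    targetPort: ${port}
EOF
}
```

For example, `metrics_svc_manifest felix-metrics-svc calico-system calico-node 9091 | kubectl apply -f -` should create the same service as the Operator example.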

Typha configuration

Operator install:

An Operator installation of Calico automatically deploys one or more Typha instances, depending on the scale of your cluster. By default, metrics for these instances are disabled.

Use the following command to instruct tigera-operator to enable Typha metrics.

```bash
kubectl patch installation default --type=merge -p '{"spec": {"typhaMetricsPort":9093}}'
```

You should see a result similar to:

```
installation.operator.tigera.io/default patched
```

Manifest install:

Note: Typha is optional. If your cluster does not run Typha, you can skip the Typha configuration section.

If you are not sure whether your cluster runs Typha, run the following command:

```bash
kubectl get pods -A | grep typha
```

If the output is similar to the following, your cluster uses Typha.

Note: The pod name suffixes below are generated dynamically, so your Typha pods will likely have different suffixes.

```
kube-system   calico-typha-56fccfcdc4-z27xj                         1/1   Running   0   28h
kube-system   calico-typha-horizontal-autoscaler-74f77cd87c-6hx27   1/1   Running   0   28h
```

How you expose Typha metrics to Prometheus depends on whether you installed Calico with the operator or with manifests.
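To check for Typha from a script rather than by eye, you can count the running Typha pods. A bash sketch, assuming the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, ...) and pod names that start with calico-typha- as in the output above; count_running_typha is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print
# how many calico-typha pods are Running, ignoring the autoscaler pod.
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
count_running_typha() {
  awk '$2 ~ /^calico-typha-/ && $2 !~ /autoscaler/ && $4 == "Running" { n++ }
       END { print n + 0 }'
}
```

Usage: `kubectl get pods -A | count_running_typha` prints 0 when no Typha pods are running, in which case you can skip the Typha steps.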

Creating a service to expose Typha metrics

Note: By default, Typha publishes its metrics on TCP port 9091. However, if Calico was installed using the Amazon YAML manifest, the port is 9093, because that manifest sets it explicitly through the TYPHA_PROMETHEUSMETRICSPORT environment variable. The services below use port 9093; if your Typha listens on a different port, change port and targetPort to match.

Operator install:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: typha-metrics-svc
  namespace: calico-system
spec:
  clusterIP: None
  selector:
    k8s-app: calico-typha
  ports:
  - port: 9093
    targetPort: 9093
EOF
```

Manifest install:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: typha-metrics-svc
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    k8s-app: calico-typha
  ports:
  - port: 9093
    targetPort: 9093
EOF
```
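Before wiring up Prometheus, it can help to confirm that an endpoint actually serves metrics. The endpoints use Prometheus's plain-text exposition format, so a short bash filter can list the metric names they return. A sketch, with list_metric_names as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the unique metric names, optionally filtered by a name prefix.
# Usage: list_metric_names [PREFIX]
list_metric_names() {
  local prefix="${1:-}"
  grep -v '^#' \
    | awk 'NF { print $1 }' \
    | sed 's/{.*//' \
    | awk -v p="$prefix" 'p == "" || index($0, p) == 1' \
    | sort -u
}
```

For example, with NODE_IP standing in for one of your node addresses (a placeholder), `curl -s http://NODE_IP:9091/metrics | list_metric_names felix_` should list Felix metrics such as felix_active_local_endpoints once Felix metrics are enabled. For Typha, use the Typha metrics port (9093 in the services above) and a prefix such as typha_.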

kube-controllers configuration

Prometheus metrics are enabled by default on TCP port 9094 for calico-kube-controllers.

Operator install:

The operator automatically creates a service that exposes these metrics. You can verify it with the following command:

```bash
kubectl get svc -n calico-system
```

You should see a result similar to:

```
calico-kube-controllers-metrics   ClusterIP   10.43.77.57   <none>   9094/TCP   39d
```

Manifest install:

Creating a service to expose kube-controllers metrics

Create a service to expose calico-kube-controllers metrics to Prometheus.

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-controllers-metrics-svc
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    k8s-app: calico-kube-controllers
  ports:
  - port: 9094
    targetPort: 9094
EOF
```

Optionally, you can change the metrics port by modifying the KubeControllersConfiguration resource.

Note: Setting this value to zero disables metrics in the kube-controllers pod.

With kubectl:

```bash
kubectl patch kubecontrollersconfiguration default --type=merge --patch '{"spec":{"prometheusMetricsPort": 9095}}'
```

With calicoctl:

```bash
calicoctl patch kubecontrollersconfiguration default --patch '{"spec":{"prometheusMetricsPort": 9095}}'
```
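If you installed with manifests and later change prometheusMetricsPort, the kube-controllers-metrics-svc service you created earlier still targets port 9094, so Prometheus would keep scraping the old port. One way to bring it in line is a JSON patch (RFC 6902) through `kubectl patch --type=json`. The bash sketch below builds that patch for a given port; svc_port_patch is a hypothetical helper, and it assumes the service has a single port entry, as in the manifest above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a JSON patch that sets both port and targetPort
# of a Service's first port entry to the given number.
# Usage: svc_port_patch PORT
svc_port_patch() {
  local port="$1"
  printf '[{"op":"replace","path":"/spec/ports/0/port","value":%d},{"op":"replace","path":"/spec/ports/0/targetPort","value":%d}]\n' "$port" "$port"
}
```

Usage: `kubectl patch service kube-controllers-metrics-svc -n kube-system --type=json -p "$(svc_port_patch 9095)"`.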

2. Cluster preparation

Namespace creation

Namespaces isolate resources in your cluster. Here you will create a namespace called calico-monitoring to hold your monitoring resources.

Note: The Kubernetes namespaces guide is available in the Kubernetes documentation.

```bash
kubectl create -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: calico-monitoring
  labels:
    app: ns-calico-monitoring
    role: monitoring
EOF
```

Service account creation

Prometheus needs a service account with the permissions required to collect information from Calico.

Note: A comprehensive guide to user roles and authentication is available in the Kubernetes documentation.

```bash
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: calico-prometheus-user
rules:
- apiGroups: [""]
  resources:
  - endpoints
  - services
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-prometheus-user
  namespace: calico-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-prometheus-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-prometheus-user
subjects:
- kind: ServiceAccount
  name: calico-prometheus-user
  namespace: calico-monitoring
EOF
```
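To confirm the binding works before deploying Prometheus, you can ask the API server what the service account may do, using `kubectl auth can-i` with its `--as` impersonation flag. A bash sketch; sa_identity and check_prometheus_rbac are hypothetical helper names, and running the check requires permission to impersonate service accounts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the user name Kubernetes assigns to a
# service account. Usage: sa_identity NAMESPACE NAME
sa_identity() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: ask the API server whether the Prometheus service
# account may list, across all namespaces, each resource type that
# Prometheus service discovery reads. Prints "yes" or "no" per resource.
check_prometheus_rbac() {
  local who resource
  who="$(sa_identity calico-monitoring calico-prometheus-user)"
  for resource in endpoints services pods; do
    printf '%s: ' "$resource"
    kubectl auth can-i list "$resource" -A --as="$who"
  done
}
```

Running `check_prometheus_rbac` should print yes for every resource once the ClusterRoleBinding above is in place.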

3. Install Prometheus

Create a Prometheus config file

You can configure Prometheus using a ConfigMap to persistently store the desired settings.

Note: A comprehensive guide to the configuration file is available in the Prometheus configuration documentation.

Operator install:

```bash
# The quoted 'EOF' stops the shell from expanding $1 inside the config.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: calico-monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'tutorial-monitor'
    scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'felix_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: felix-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'felix_windows_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: felix-windows-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'typha_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: typha-metrics-svc
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        regex: calico-typha
        action: drop
    - job_name: 'kube_controllers_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: calico-kube-controllers-metrics
        replacement: $1
        action: keep
EOF
```

Manifest install:

```bash
# The quoted 'EOF' stops the shell from expanding $1 inside the config.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: calico-monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'tutorial-monitor'
    scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'felix_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: felix-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'felix_windows_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: felix-windows-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'typha_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: typha-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'kube_controllers_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: kube-controllers-metrics-svc
        replacement: $1
        action: keep
EOF
```
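Each Calico scrape job above discovers every endpoint in the cluster, then uses a relabel rule with action: keep to retain only targets whose __meta_kubernetes_service_name matches the regex. Prometheus anchors relabeling regexes at both ends, so felix-metrics-svc keeps only a service with exactly that name. The bash illustration below mimics that behavior; relabel_keep and kept_services are hypothetical helpers, and the sketch assumes bash's extended regexes behave like Prometheus's RE2 for simple literal names such as these, which is not true for every pattern.

```bash
#!/usr/bin/env bash
# Illustration only: mimic a Prometheus "keep" relabel rule on one label value.
# Prometheus anchors the regex at both ends; ^(REGEX)$ emulates that here.
# Returns 0 (keep) when the whole value matches, 1 (drop) otherwise.
relabel_keep() {
  local regex="$1" value="$2"
  [[ "$value" =~ ^(${regex})$ ]]
}

# Print the discovered service names that a job with this regex would keep.
# Usage: kept_services REGEX NAME...
kept_services() {
  local regex="$1" svc
  shift
  for svc in "$@"; do
    if relabel_keep "$regex" "$svc"; then
      printf '%s\n' "$svc"
    fi
  done
}
```

For instance, `kept_services felix-metrics-svc felix-metrics-svc felix-windows-metrics-svc` prints only felix-metrics-svc, which is why the Windows service needs its own job.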

Create Prometheus pod

Now that you have a service account with permissions to gather metrics and a valid Prometheus config file, it’s time to create the Prometheus pod.

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-pod
  namespace: calico-monitoring
  labels:
    app: prometheus-pod
    role: monitoring
spec:
  nodeSelector:
    kubernetes.io/os: linux
  serviceAccountName: calico-prometheus-user
  containers:
  - name: prometheus-pod
    image: prom/prometheus
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
    volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus/prometheus.yml
      subPath: prometheus.yml
    ports:
    - containerPort: 9090
  volumes:
  - name: config-volume
    configMap:
      name: prometheus-config
EOF
```

Check that the Prometheus pod was created successfully and is Running.

```bash
kubectl get pods prometheus-pod -n calico-monitoring
```

It should return something like the following.

```
NAME             READY   STATUS    RESTARTS   AGE
prometheus-pod   1/1     Running   0          16s
```

4. View metrics

You can access the Prometheus dashboard by using kubectl port forwarding.

```bash
kubectl port-forward pod/prometheus-pod 9090:9090 -n calico-monitoring
```

Browse to http://localhost:9090 to open the Prometheus dashboard. Type felix_active_local_endpoints in the Expression text box, then click Execute. The table view should list each of your nodes and the number of active local endpoints on it.

Note: A list of Felix metrics is available in the Calico reference documentation. Similar lists are available for kube-controllers and Typha.

Click the Add Graph button; you should see the metric plotted on a graph.
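Prometheus also exposes an HTTP API, so you can run the same query from a terminal while the port-forward is running. A minimal bash sketch, assuming curl is installed and Prometheus is reachable at http://localhost:9090; the helper names prom_query and prom_ok and the PROM_URL variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for querying Prometheus over its HTTP API.
# PROM_URL is an assumption; override it if Prometheus is elsewhere.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Send an instant PromQL query to /api/v1/query (form-encoded POST)
# and print the raw JSON response.
prom_query() {
  curl -s "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Read a Prometheus API response on stdin; succeed only if its
# status field is "success".
prom_ok() {
  grep -Eq '"status" *: *"success"'
}
```

For example, `prom_query 'felix_active_local_endpoints'` prints the raw JSON result, and `prom_query 'sum(felix_active_local_endpoints)' | prom_ok && echo "query succeeded"` only reports whether the query worked. The /api/v1/targets endpoint on the same server lists the targets each scrape job discovered, which helps when a job returns no data.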

Cleanup

This section helps you remove the resources you created in this tutorial. Skip it if you plan to deploy Grafana to visualize component metrics. First, remove the services:

Operator install:

```bash
kubectl delete service felix-metrics-svc -n calico-system
kubectl delete service typha-metrics-svc -n calico-system
```

If you are running Calico for Windows, also remove the Windows nodes service:

```bash
kubectl delete service felix-windows-metrics-svc -n calico-system
```

Manifest install:

```bash
kubectl delete service felix-metrics-svc -n kube-system
kubectl delete service typha-metrics-svc -n kube-system
kubectl delete service kube-controllers-metrics-svc -n kube-system
```

Return Calico configuration to its default state.

With kubectl:

```bash
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": false}}'
# Operator installs only: remove the Typha metrics port set earlier.
kubectl patch installation default --type=json -p '[{"op": "remove", "path":"/spec/typhaMetricsPort"}]'
```

With calicoctl:

```bash
calicoctl patch felixconfiguration default --patch '{"spec":{"prometheusMetricsEnabled": false}}'
```

Finally, remove the namespace and RBAC permissions.

```bash
kubectl delete namespace calico-monitoring
kubectl delete clusterrole calico-prometheus-user
kubectl delete clusterrolebinding calico-prometheus-user
```
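If you prefer to script the deletions in this section, the following bash sketch deletes the same services, namespace, and RBAC objects, using kubectl's --ignore-not-found flag so it can be rerun safely. The helpers run and cleanup_calico_monitoring are hypothetical. The sketch does not revert the Felix, Typha, or kube-controllers configuration changes; use the patch commands for that.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete the resources created in this guide.
# Usage: cleanup_calico_monitoring operator|manifest
cleanup_calico_monitoring() {
  local ns
  case "$1" in
    operator) ns=calico-system ;;
    manifest) ns=kube-system ;;
    *) echo "usage: cleanup_calico_monitoring operator|manifest" >&2; return 2 ;;
  esac
  run kubectl delete service felix-metrics-svc typha-metrics-svc -n "$ns" --ignore-not-found
  if [ "$1" = operator ]; then
    # Windows nodes service (Calico for Windows, Operator install).
    run kubectl delete service felix-windows-metrics-svc -n "$ns" --ignore-not-found
  else
    # Manifest installs created their own kube-controllers service.
    run kubectl delete service kube-controllers-metrics-svc -n "$ns" --ignore-not-found
  fi
  run kubectl delete namespace calico-monitoring --ignore-not-found
  run kubectl delete clusterrole calico-prometheus-user --ignore-not-found
  run kubectl delete clusterrolebinding calico-prometheus-user --ignore-not-found
}
```

Preview with `DRY_RUN=1 cleanup_calico_monitoring operator`, then run it again without DRY_RUN to apply the deletions.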

Best practices

If you expose Calico metrics to Prometheus, a best practice is to use network policy to limit access to the Calico metrics endpoints. For details, see Secure Calico Prometheus endpoints.

If you are not using Prometheus metrics, we recommend disabling the Prometheus ports entirely for better security.

Next steps

Visualizing metrics via Grafana.