System components monitoring

Controller nodes are isolated by default, which means that cluster users cannot schedule workloads onto them.

k0s provides a mechanism to expose system components for monitoring. System component metrics give better insight into what is happening inside those components, and they are particularly useful for building dashboards and alerts. You can read more in the Kubernetes documentation on metrics for system components.

Note: the mechanism is an opt-in feature. You can enable it at installation:

  sudo k0s install controller --enable-metrics-scraper

Once enabled, a new set of objects will appear in the cluster:

  ~ kubectl get all -n k0s-system
  NAME                                   READY   STATUS    RESTARTS   AGE
  pod/k0s-pushgateway-6c5d8c54cf-bh8sb   1/1     Running   0          43h

  NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
  service/k0s-pushgateway   ClusterIP   10.100.11.116   <none>        9091/TCP   43h

  NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/k0s-pushgateway   1/1     1            1           43h

  NAME                                         DESIRED   CURRENT   READY   AGE
  replicaset.apps/k0s-pushgateway-6c5d8c54cf   1         1         1       43h

These objects alone are not enough to start scraping the additional metrics. For [Prometheus Operator](https://prometheus-operator.dev/) based solutions, you can create a ServiceMonitor for the pushgateway like this:

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: k0s
    namespace: k0s-system
  spec:
    endpoints:
      - port: http
    selector:
      matchLabels:
        app: k0s-observability
        component: pushgateway
        k0s.k0sproject.io/stack: metrics

Note that this won’t clear alerts such as “KubeControllerManagerDown” or “KubeSchedulerDown”, because they are based on Prometheus’ internal “up” metric. You can, however, get rid of these alerts by modifying them to detect a working component, for example:

absent(apiserver_audit_event_total{job="kube-scheduler"})
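
For Prometheus Operator setups, one way to apply such a modified expression is to define it in a PrometheusRule object. The following is only a sketch under assumptions: the object and group names, the `for` duration, the severity label, and the annotation text are illustrative and do not come from k0s, and your Prometheus instance may require additional labels on the object before its rule selector picks it up. Any stock alert with the same name would also need to be disabled or overridden in your monitoring stack.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: k0s-system-components        # hypothetical name
  namespace: k0s-system
spec:
  groups:
    - name: k0s-system-components    # hypothetical group name
      rules:
        - alert: KubeSchedulerDown
          # Fires when no kube-scheduler series is present, instead of
          # relying on Prometheus' internal "up" metric.
          expr: absent(apiserver_audit_event_total{job="kube-scheduler"})
          for: 15m                   # assumed duration; tune to your needs
          labels:
            severity: critical       # assumed severity
          annotations:
            summary: No kube-scheduler metrics are being received.
```

A similar rule could presumably be written for kube-controller-manager by changing the `job` label, provided the chosen metric is actually present for that job in your cluster.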

Jobs

The components scraped by k0s are:

  • kube-scheduler
  • kube-controller-manager
  • etcd
  • kine

Note: kube-apiserver metrics are not scraped, since they are accessible via the kubernetes endpoint within the cluster.

Architecture

[Figure: k0s metrics exposure architecture]

k0s uses a pushgateway with a TTL, so that metrics which stop being delivered expire instead of lingering, which makes it possible to detect issues with metrics delivery. The default TTL is 2 minutes.