Health checks

Health checking is an important part of a microservice architecture. Load balancers rely on the health status of a service when picking an endpoint from the load-balancing set, users want the service state to be observable through the GUI or CLI, and orchestrators such as Kubernetes need to know the service status to manage the lifecycle of containers.

Kuma supports several aspects of health checking. Two policies allow configuring active and passive health checks: Health Check and Circuit Breaker.

Kuma also tracks the status of the Envoy proxy. If the gRPC stream with Envoy is disconnected, Kuma considers the proxy offline. Traffic is still sent to that proxy regardless, because this status only tracks the connection between kuma-cp and kuma-dp.

Also, every inbound in the Dataplane model has a health section:

  type: Dataplane
  mesh: default
  name: web-01
  networking:
    address: 127.0.0.1
    inbound:
      - port: 11011
        servicePort: 11012
        health:
          ready: true
        tags:
          kuma.io/service: backend
          kuma.io/protocol: http

This health.ready status is intended to show the status of the service itself. It is set differently depending on the environment (Kubernetes or Universal), but it’s treated the same way regardless of the environment:

  • if health.ready is true or the health section is missing, Kuma considers the inbound healthy and includes it in the load-balancing set.
  • if health.ready is false, Kuma doesn’t include the inbound in the load-balancing set.
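The rule above can be sketched as a small filter. This is an illustrative Python sketch, not Kuma's implementation: the dictionaries mirror the shape of the Dataplane inbound shown earlier, and the function names `is_ready` and `load_balancing_set` are hypothetical.

```python
def is_ready(inbound: dict) -> bool:
    """Return True if the inbound should be treated as healthy."""
    health = inbound.get("health")
    if health is None:
        # A missing health section is treated as healthy.
        return True
    # Assumption: a health section without an explicit "ready" counts as not ready.
    return bool(health.get("ready", False))


def load_balancing_set(inbounds: list) -> list:
    """Keep only the inbounds that are eligible to receive traffic."""
    return [inbound for inbound in inbounds if is_ready(inbound)]


inbounds = [
    {"port": 11011, "health": {"ready": True}},
    {"port": 11021},                              # no health section
    {"port": 11031, "health": {"ready": False}},
]
eligible_ports = [inbound["port"] for inbound in load_balancing_set(inbounds)]
```

Here `eligible_ports` contains 11011 and 11021; the inbound on port 11031 is left out because its health.ready is false.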

Also, health.ready is used to compute the status of Dataplanes and Services. You can see these statuses in both the Kuma GUI and the Kuma CLI:

  • if proxy status is Offline, then Dataplane is Offline
  • if proxy status is Online:
    • if all inbounds are ready, then Dataplane is Online
    • if none of the inbounds are ready, then Dataplane is Offline
    • if some but not all of the inbounds are ready, then Dataplane is Partially degraded
  • if an inbound is not ready, it’s not included in the load-balancing set, which means it doesn’t receive traffic
  • for a service:
    • if all inbounds which implement the service are ready, then the service is Online
    • if none of the inbounds which implement the service are ready, then the service is Offline
    • if some but not all of the inbounds which implement the service are ready, then the service is Partially degraded
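The status rules listed above can be expressed as a short function. The following Python sketch is only an illustration of those rules, not Kuma's code; the function names are hypothetical, and each inbound is reduced to a single boolean that says whether it is ready.

```python
def aggregate_status(inbounds_ready: list) -> str:
    """Combine per-inbound readiness flags into one status string."""
    # Note: an empty list is not covered by the rules above; with Python's
    # all() semantics this sketch reports "Online" for it.
    if all(inbounds_ready):
        return "Online"
    if not any(inbounds_ready):
        return "Offline"
    return "Partially degraded"


def dataplane_status(proxy_online: bool, inbounds_ready: list) -> str:
    """Status of one Dataplane: the proxy status takes precedence."""
    if not proxy_online:
        return "Offline"
    return aggregate_status(inbounds_ready)


def service_status(inbounds_ready: list) -> str:
    """Status of a service, computed over every inbound that implements it."""
    return aggregate_status(inbounds_ready)


mixed = dataplane_status(True, [True, False])
```

With one ready and one not-ready inbound on an online proxy, `mixed` is "Partially degraded".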

Kubernetes probes

Kuma uses pod.status.containerStatuses.ready to fill the dataplane.inbound.health section, even if Kubernetes probes are disabled.

If you specify an httpGet probe for the Pod, Kuma generates a special non-mTLS listener and overrides the probe itself in the Pod resource. This feature is called Virtual probes, and it allows the kubelet to probe the Pod status even if mTLS is enabled on the mesh. For example, if we specify the following probe:

  livenessProbe:
    httpGet:
      path: /metrics
      port: 3001
    initialDelaySeconds: 3
    periodSeconds: 3

Kuma will replace it with:

  livenessProbe:
    httpGet:
      path: /3001/metrics
      port: 9000
    initialDelaySeconds: 3
    periodSeconds: 3

Here 9000 is the default virtual probe port, which can be configured in kuma-cp.config:

  runtime:
    kubernetes:
      injector:
        virtualProbesPort: 19001

It can also be overridden in the Pod’s annotations:

  annotations:
    kuma.io/virtual-probes-port: "19001"
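The probe rewrite described above (original port moved into the path, port replaced by the virtual probe port) can be sketched as follows. This Python sketch only mirrors the transformation shown in the example; it is not Kuma's injector code, and `virtualize_probe` and `DEFAULT_VIRTUAL_PROBES_PORT` are hypothetical names.

```python
import copy

DEFAULT_VIRTUAL_PROBES_PORT = 9000  # default virtual probe port from the text


def virtualize_probe(probe: dict, virtual_port: int = DEFAULT_VIRTUAL_PROBES_PORT) -> dict:
    """Return a copy of an httpGet probe that targets the virtual probe listener."""
    rewritten = copy.deepcopy(probe)
    http_get = rewritten.get("httpGet")
    if http_get is None:
        # Only httpGet probes are rewritten in this sketch.
        return rewritten
    # Assumption: a missing path is treated as "/".
    original_path = http_get.get("path", "/")
    if not original_path.startswith("/"):
        original_path = "/" + original_path
    # The original port becomes the first path segment.
    http_get["path"] = f"/{http_get['port']}{original_path}"
    http_get["port"] = virtual_port
    return rewritten


original = {
    "httpGet": {"path": "/metrics", "port": 3001},
    "initialDelaySeconds": 3,
    "periodSeconds": 3,
}
virtual = virtualize_probe(original)
custom = virtualize_probe(original, virtual_port=19001)
```

For the probe from the example, `virtual["httpGet"]` has path /3001/metrics and port 9000, while `custom` uses port 19001, as it would with a configured virtualProbesPort. The original dictionary is left unchanged.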

To disable Kuma’s probe virtualization, we can either set it in Kuma’s configuration file kuma-cp.config:

  runtime:
    kubernetes:
      injector:
        virtualProbesEnabled: false

or in the Pod’s annotations:

  annotations:
    kuma.io/virtual-probes: disabled

The same behaviour can be configured using environment variables:

  • KUMA_RUNTIME_KUBERNETES_VIRTUAL_PROBES_ENABLED=false
  • KUMA_RUNTIME_KUBERNETES_VIRTUAL_PROBES_PORT=19001

Universal probes

On Universal there is no single standard for probing a service. To check the health of a service on Universal, Kuma uses Envoy’s Health Discovery Service (HDS): Envoy performs the health checks and reports the status back to the Kuma control plane.

To configure health checking for your service, add a serviceProbe to the inbound configuration:

  type: Dataplane
  mesh: default
  name: web-01
  networking:
    address: 127.0.0.1
    inbound:
      - port: 11011
        servicePort: 11012
        serviceProbe:
          timeout: 2s # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_TIMEOUT)
          interval: 1s # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_INTERVAL)
          healthyThreshold: 1 # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_HEALTHY_THRESHOLD)
          unhealthyThreshold: 1 # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_UNHEALTHY_THRESHOLD)
          tcp: {}
        tags:
          kuma.io/service: backend
          kuma.io/protocol: http

If a serviceProbe is configured for the inbound, Kuma automatically fills the health section and updates it at an interval equal to KUMA_DP_SERVER_HDS_REFRESH_INTERVAL. Alternatively, it’s possible to omit the serviceProbe section and develop custom automation that periodically updates the health of the inbound.
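To illustrate what the two threshold settings of a serviceProbe mean, the sketch below models the usual health-check threshold behaviour: a state only flips after the configured number of consecutive contradicting results. This is a simplified Python model written under that assumption; it is not Envoy's or Kuma's code, it ignores any special handling at startup, and the class name `ThresholdTracker` is hypothetical.

```python
class ThresholdTracker:
    """Track a healthy/unhealthy state driven by consecutive probe results."""

    def __init__(self, healthy_threshold: int = 1, unhealthy_threshold: int = 1,
                 healthy: bool = False):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def observe(self, success: bool) -> bool:
        """Record one probe result and return the resulting state."""
        if success == self.healthy:
            # The result agrees with the current state, so the streak resets.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if success else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = success
            self._streak = 0
        return self.healthy


tracker = ThresholdTracker(healthy_threshold=2, unhealthy_threshold=3)
probe_results = [True, True, False, False, True, False, False, False]
states = [tracker.observe(result) for result in probe_results]
```

With these settings the state becomes healthy after the second consecutive success and only turns unhealthy again after three consecutive failures; the single success in the middle resets the failure streak.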

Compared to the HealthCheck policy, serviceProbes have some advantages:

  • knowledge about health is propagated back to kuma-cp and can be seen in both the Kuma GUI and the Kuma CLI
  • they scale to thousands of Dataplanes

but at the same time, unlike the HealthCheck policy, serviceProbes:

  • work only when kuma-cp is up and running
  • can’t check TLS between Envoys