Service Health Probes

Kuma is able to track the status of the Envoy proxy and the underlying app.

Compared to HealthCheck policies, Health Probes have the following advantages:

  • knowledge about health is propagated back to kuma-cp and is visible in both the Kuma GUI and the Kuma CLI
  • they scale to thousands of data plane proxies

Unlike HealthCheck policies, Health Probes:

  • only update when kuma-cp is up and running and the proxy can connect to it.
  • don't check connectivity between data plane proxies.

Every inbound in the Dataplane model has a health section:

    type: Dataplane
    mesh: default
    name: web-01
    networking:
      address: 127.0.0.1
      inbound:
        - port: 11011
          servicePort: 11012
          health:
            ready: true
          tags:
            kuma.io/service: backend
            kuma.io/protocol: http

The health.ready status reflects the state of the endpoint itself. How it is set depends on the environment (Kubernetes or Universal), but it is interpreted the same way in both:

  • if health.ready is true or the health section is missing, Kuma considers the inbound healthy and includes it in the load balancing set.
  • if health.ready is false, Kuma excludes the inbound from the load balancing set.

Also, health.ready is used to compute the status of the data plane proxy and the service. You can see these statuses in both the Kuma GUI and the Kuma CLI (see the CLI example after the rules below):

Data plane proxy health follows these rules:

  • if proxy status is Offline, then data plane proxy is Offline.
  • if proxy status is Online:
    • if all inbounds have health.ready=true or no health section, then the data plane proxy is Online
    • if all inbounds have health.ready=false, then the data plane proxy is Offline
    • if at least one of the inbounds has health.ready=false, then the data plane proxy is Partially degraded

Service health is computed by aggregating the health of all data plane proxies that have an inbound with a given kuma.io/service tag, and is set this way:

  • if all data plane proxies are Online then the service is Online
  • if no data plane proxy is Online then the service is Offline
  • if at least one of the data plane proxies is Offline or Partially degraded then the service is Partially degraded
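
For example, a quick way to check these statuses from the CLI is with kumactl (a sketch, assuming kumactl is already configured against your control plane; the exact output columns vary between Kuma versions):

    # list data plane proxies with their computed status (Online / Offline / Partially degraded)
    kumactl inspect dataplanes

    # list services with their aggregated status
    kumactl inspect services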

Kubernetes

Kuma leverages container statuses within the pod to evaluate dataplane.inbound.health and follows these rules:

  • If the sidecar container is not ready, then all inbounds will have health.ready=false.
  • If the sidecar container is ready, then the health of each inbound follows the readiness of the container that exposes the port used by that inbound (see the example below).
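
As an illustration, consider a Pod whose application container exposes port 3001 and defines its own readinessProbe; the inbound bound to that port mirrors this container's readiness. This is a minimal sketch with hypothetical names, image, and paths:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-01
    spec:
      containers:
        - name: app
          image: example/web:1.0   # hypothetical image
          ports:
            - containerPort: 3001
          readinessProbe:
            httpGet:
              path: /healthz       # hypothetical health endpoint
              port: 3001

While this container is not ready, the corresponding inbound gets health.ready=false and is excluded from load balancing.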

Application Probe Proxy

For better lifecycle management, Kubernetes provides readiness, liveness, and startup probes. When using Kuma, the Application Probe Proxy listens on a non-mTLS port, handles probe requests from Kubernetes, and forwards them to the application. It works by overriding the probe definitions in the Pod so that probe requests are sent to the proxy.

For example, if we specify the following HTTPGet probe:

    livenessProbe:
      httpGet:
        path: /metrics
        port: 3001
      initialDelaySeconds: 3
      periodSeconds: 3

Kuma will replace it with:

    livenessProbe:
      httpGet:
        path: /3001/metrics
        port: 9001
      initialDelaySeconds: 3
      periodSeconds: 3

Similarly, the following TCPSocket probe

    readinessProbe:
      tcpSocket:
        port: 5432
      initialDelaySeconds: 3
      periodSeconds: 3

will be replaced with:

    readinessProbe:
      httpGet:
        path: /tcp/5432
        port: 9001
      initialDelaySeconds: 3
      periodSeconds: 3
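
To confirm how the probes were rewritten on a running workload, you can inspect the injected Pod spec; a sketch, assuming a Pod named demo-app in the current namespace:

    # print the (rewritten) liveness and readiness probes of every container in the Pod
    kubectl get pod demo-app -o jsonpath='{.spec.containers[*].livenessProbe}'
    kubectl get pod demo-app -o jsonpath='{.spec.containers[*].readinessProbe}'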

In the rewritten probes above, 9001 is the default port that the Application Probe Proxy listens on. To prevent potential conflicts with applications, you can configure this port using one of these methods:

With the config yaml:

    runtime:
      kubernetes:
        injector:
          applicationProbeProxyPort: 19100

or environment variable: KUMA_RUNTIME_KUBERNETES_APPLICATION_PROBE_PROXY_PORT=19100

With the Pod annotation:

    annotations:
      kuma.io/application-probe-proxy-port: "19100"

You can disable the Application Probe Proxy by setting the port to 0. When it is disabled, Virtual Probes still work as usual until they are removed.
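
For example, disabling it for a single Pod could look like this (a sketch; annotation values must be quoted strings):

    annotations:
      kuma.io/application-probe-proxy-port: "0"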

Virtual Probes

Starting with Kuma version 2.9.0, the Virtual Probes feature is deprecated and superseded by the Application Probe Proxy, which is enabled by default.

For better lifecycle management Kubernetes has readiness, liveness and startup probes.

When mTLS is enabled, the container ports are not reachable from outside the pod, so Kubernetes cannot perform the probe checks. To work around this, Kuma generates a special non-mTLS listener and overrides the probe definitions in the Pod so they are proxied through this sidecar listener. These are called Virtual Probes.

This feature currently only supports httpGet probes.

For example, if we specify the following probe:

    livenessProbe:
      httpGet:
        path: /metrics
        port: 3001
      initialDelaySeconds: 3
      periodSeconds: 3

Kuma will replace it with:

    livenessProbe:
      httpGet:
        path: /3001/metrics
        port: 9000
      initialDelaySeconds: 3
      periodSeconds: 3

Here 9000 is the default virtual probe port. It is configurable:

With the config yaml:

    runtime:
      kubernetes:
        injector:
          virtualProbesPort: 19001

or environment variable: KUMA_RUNTIME_KUBERNETES_VIRTUAL_PROBES_PORT=19001

With the Pod annotation:

    annotations:
      kuma.io/virtual-probes-port: "19001"

You can also disable Kuma’s probe virtualization:

With the config yaml:

    runtime:
      kubernetes:
        injector:
          virtualProbesEnabled: false

or environment variable: KUMA_RUNTIME_KUBERNETES_VIRTUAL_PROBES_ENABLED=false

With the Pod annotation:

    annotations:
      kuma.io/virtual-probes: disabled

Universal probes

On Universal there is no single standard for probing a service. To health check the Dataplane status on Universal, Kuma uses Envoy's Health Discovery Service (HDS): Envoy performs the health checks and reports the status back to the Kuma control plane.

To configure health checks for your Dataplane, add a serviceProbe to the inbound configuration:

    type: Dataplane
    mesh: default
    name: web-01
    networking:
      address: 127.0.0.1
      inbound:
        - port: 11011
          servicePort: 11012
          serviceProbe:
            timeout: 2s # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_TIMEOUT)
            interval: 1s # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_INTERVAL)
            healthyThreshold: 1 # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_HEALTHY_THRESHOLD)
            unhealthyThreshold: 1 # optional (default value is taken from KUMA_DP_SERVER_HDS_CHECK_UNHEALTHY_THRESHOLD)
            tcp: {}
          tags:
            kuma.io/service: backend
            kuma.io/protocol: http

If there is a serviceProbe configured for the inbound, Kuma automatically fills the inbound.health section and updates it at an interval equal to KUMA_DP_SERVER_HDS_REFRESH_INTERVAL. Alternatively, it's possible to omit the serviceProbe section and develop custom automation that periodically updates the health of the inbound.
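
Such automation could, for instance, re-apply the Dataplane resource with an updated health flag. A minimal sketch, assuming the definition lives in web-01.yaml and that your own script decides the readiness value (no serviceProbe is defined in this case):

    # inside the inbound entry of web-01.yaml
    health:
      ready: true   # toggled periodically by your own health-checking script

The updated resource can then be pushed to the control plane with kumactl apply -f web-01.yaml.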

If the gRPC stream with Envoy is disconnected, Kuma considers the proxy offline. It will, however, still advertise the inbounds using the last health update received before disconnection, so that connectivity issues between kuma-cp and kuma-dp do not block data plane traffic.

Additionally, when serviceProbe is defined, probes take the health of Envoy into account. When kuma-dp receives the first shutdown signal, it goes into a draining state and all of its inbounds are considered unhealthy.