Data plane on Kubernetes

On Kubernetes the Dataplane entity is automatically created for you, and because transparent proxying is used to communicate between the service and the sidecar proxy, no code changes are required in your applications.

The Kuma control plane injects a kuma-sidecar container into your Pod’s containers. If you’re not using the CNI, it also injects a kuma-init container into initContainers to set up transparent proxying.

You can control whether Kuma automatically injects the data plane proxy by labeling either the Namespace or the Pod with kuma.io/sidecar-injection=enabled, e.g.

  apiVersion: v1
  kind: Namespace
  metadata:
    name: kuma-example
    labels:
      # inject Kuma sidecar into every Pod in that Namespace,
      # unless a user explicitly opts out on per-Pod basis
      kuma.io/sidecar-injection: enabled

To opt out of data-plane injection into a particular Pod, you need to label it with kuma.io/sidecar-injection=disabled, e.g.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-app
    namespace: kuma-example
  spec:
    ...
    template:
      metadata:
        ...
        labels:
          # indicate to Kuma that this Pod doesn't need a sidecar
          kuma.io/sidecar-injection: disabled
      spec:
        containers:
          ...

Once your Pod is running, you can see the Dataplane resource that matches it using kubectl:

  kubectl get dataplanes <podName>

Tag generation

When Dataplane entities are automatically created, all labels from the Pod are converted into Dataplane tags. Labels with keys that contain kuma.io/ are not converted because they are reserved for Kuma. The following tags are added automatically and cannot be overridden using Pod labels.

  • kuma.io/service: Identifies the service name based on the Service that selects a Pod. The value has the format <name>_<namespace>_svc_<port>, where <name>, <namespace> and <port> come from the Kubernetes Service associated with this particular Pod. When a Pod is spawned without being associated with any Kubernetes Service resource, the data plane tag will be kuma.io/service: <name>_<namespace>_svc, where <name> and <namespace> are extracted from the Pod resource metadata.
  • kuma.io/zone: Identifies the zone name in a multi-zone deployment.
  • kuma.io/protocol: Identifies the protocol that was defined by the appProtocol field on the Service that selects the Pod.
  • k8s.kuma.io/namespace: Identifies the Pod’s namespace. Example: kuma-demo.
  • k8s.kuma.io/service-name: Identifies the name of Kubernetes Service that selects the Pod. Example: demo-app.
  • k8s.kuma.io/service-port: Identifies the port of Kubernetes Service that selects the Pod. Example: 80.

  • If a Kubernetes service exposes more than 1 port, multiple inbounds will be generated all with different kuma.io/service.

  • If a pod is attached to more than one Kubernetes service, multiple inbounds will also be generated.
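
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the documented format, not Kuma's implementation; the function names kuma_service_tag and inbound_service_tags and the dictionary layout of the input are hypothetical.

```python
from typing import Dict, List, Optional


def kuma_service_tag(name: str, namespace: str, port: Optional[int] = None) -> str:
    """Build a kuma.io/service value in the documented format."""
    if port is None:
        # No Service selects the Pod: name and namespace come from the Pod itself.
        return f"{name}_{namespace}_svc"
    return f"{name}_{namespace}_svc_{port}"


def inbound_service_tags(services: List[Dict]) -> List[str]:
    """Return one kuma.io/service value per (Service, port) pair selecting a Pod."""
    return [
        kuma_service_tag(svc["name"], svc["namespace"], port)
        for svc in services
        for port in svc["ports"]
    ]


# Hypothetical input: two Services selecting the same Pod, one exposing two ports.
services = [
    {"name": "my-service", "namespace": "my-namespace", "ports": [80, 1200]},
    {"name": "my-other-service", "namespace": "my-namespace", "ports": [81]},
]

for tag in inbound_service_tags(services):
    print(tag)
```

A Pod selected by these two Services would therefore get three inbounds, each with a distinct kuma.io/service value.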

Example

  apiVersion: v1
  kind: Pod
  metadata:
    name: my-app
    namespace: my-namespace
    labels:
      foo: bar
      app: my-app
  spec:
    # ...
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: my-service
    namespace: my-namespace
  spec:
    selector:
      app: my-app
    type: ClusterIP
    ports:
      - name: port1
        protocol: TCP
        appProtocol: http
        port: 80
        targetPort: 8080
      - name: port2
        protocol: TCP
        appProtocol: grpc
        port: 1200
        targetPort: 8081
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: my-other-service
    namespace: my-namespace
  spec:
    selector:
      foo: bar
    type: ClusterIP
    ports:
      - protocol: TCP
        appProtocol: http
        port: 81
        targetPort: 8080

These resources generate the following inbounds in your Kuma Dataplane:

  ...
  inbound:
    - port: 8080
      tags:
        kuma.io/protocol: http
        kuma.io/service: my-service_my-namespace_svc_80
        k8s.kuma.io/service-name: my-service
        k8s.kuma.io/service-port: "80"
        k8s.kuma.io/namespace: my-namespace
        # Labels coming from your pod
        app: my-app
        foo: bar
    - port: 8081
      tags:
        kuma.io/protocol: grpc
        kuma.io/service: my-service_my-namespace_svc_1200
        k8s.kuma.io/service-name: my-service
        k8s.kuma.io/service-port: "1200"
        k8s.kuma.io/namespace: my-namespace
        # Labels coming from your pod
        app: my-app
        foo: bar
    - port: 8080
      tags:
        kuma.io/protocol: http
        kuma.io/service: my-other-service_my-namespace_svc_81
        k8s.kuma.io/service-name: my-other-service
        k8s.kuma.io/service-port: "81"
        k8s.kuma.io/namespace: my-namespace
        # Labels coming from your pod
        app: my-app
        foo: bar

Notice how kuma.io/service is built on <serviceName>_<namespace>_svc_<port> and kuma.io/protocol is the appProtocol field of your service entry.

Capabilities

The sidecar doesn’t need any capabilities and works with drop: ["ALL"]. Use ContainerPatch to control capabilities for the sidecar.

Lifecycle

Joining the mesh

On Kubernetes, the Dataplane resource is automatically created by kuma-cp. For each Pod with the sidecar-injection label, a new Dataplane resource is created.

To join the mesh in a graceful way, we need to first make sure the application is ready to serve traffic before it can be considered a valid traffic destination.

Init containers

Due to the way that Kuma implements transparent proxying and sidecars in Kubernetes, network calls from init containers while running a mesh can be a challenge.

Network calls to outside of the mesh

The common pitfall is the idea that it’s possible to order init containers so that the mesh init container is run after other init containers. However, when injecting these init containers into a Pod via webhooks, such as the Vault init container, there is no assurance of the order. The ordering of init containers also doesn’t provide a solution when the Kuma CNI is used, as traffic redirection to the sidecar occurs even before any init container runs.

To solve this issue, start the init container with a specific user ID and exclude specific ports from interception for that user. Remember to also exclude the DNS port from interception. Here is an example of annotations that allow HTTPS traffic (TCP port 443) and DNS traffic (UDP port 53) for a container running as user ID 1234.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-deployment
  spec:
    template:
      metadata:
        annotations:
          traffic.kuma.io/exclude-outbound-tcp-ports-for-uids: "443:1234"
          traffic.kuma.io/exclude-outbound-udp-ports-for-uids: "53:1234"
      spec:
        initContainers:
          - name: my-init-container
            ...
            securityContext:
              runAsUser: 1234

Network calls inside the mesh with mTLS enabled

In this scenario, making the call from an init container is simply impossible because kuma-dp is responsible for encrypting the traffic and only runs after all init containers have exited.

Waiting for the dataplane to be ready

By default, containers start in arbitrary order, so an app container can start even though the sidecar container might not be ready to receive traffic.

Making initial requests, such as connecting to a database, can fail for a brief period after the pod starts.

To mitigate this problem, set one of the following so that the app container waits for the dataplane container to be ready to serve traffic:

  • runtime.kubernetes.injector.sidecarContainer.waitForDataplaneReady to true, or
  • the kuma.io/wait-for-dataplane-ready annotation to true.

The waitForDataplaneReady setting relies on the fact that defining a postStart hook causes Kubernetes to run containers sequentially based on their order of occurrence in the containers list. This isn’t documented and could change in the future. It also depends on injecting the kuma-sidecar container as the first container in the pod, which isn’t guaranteed since other mutating webhooks can rearrange the containers.

Leaving the mesh

To leave the mesh in a graceful shutdown, we need to remove the traffic destination from all the clients before shutting it down.

When the Kuma sidecar receives a SIGTERM signal it:

  1. Starts draining Envoy listeners.
  2. Waits the entire drain time.
  3. Terminates.

While draining, Envoy can still accept connections, however:

  1. It is marked unhealthy on the Envoy Admin /ready endpoint.
  2. It sends connection: close for HTTP/1.1 requests and the GOAWAY frame for HTTP/2. This forces clients to close their connection and reconnect to the new instance.

You can read the Kubernetes docs to learn how Kubernetes handles the Pod lifecycle. Here is the summary including the parts relevant for Kuma.

Whenever a user or system deletes a Pod, Kubernetes does the following:

  1. It marks the Pod as terminated.
  2. For every container concurrently it:
    1. Executes any pre stop hook if defined.
    2. Sends a SIGTERM signal.
    3. Waits until the container terminates, for at most the graceful termination period (30s by default).
    4. Sends a SIGKILL to the container if it is still running.
  3. It removes the Pod object from the system.

When the Pod is marked as terminated, the Kuma CP marks the Dataplane object unhealthy, which triggers a configuration update to all the clients in order to remove it as a destination. This can take a couple of seconds depending on the size of the mesh, the resources available to the CP, the XDS configuration interval, etc.

If the application served by the Kuma sidecar quits immediately after the SIGTERM signal, there is a high chance that clients will still try to send traffic to this destination.

To mitigate this, we need to either

  • Support graceful shutdown in the application. For example, the application should wait X seconds to exit after receiving the first SIGTERM signal.
  • Add a pre-stop hook to postpone stopping the application container. Example:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      template:
        spec:
          containers:
            - name: redis
              image: "redis"
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sleep", "15"]
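
The first option, handling SIGTERM inside the application, could look roughly like the Python sketch below. It is not taken from Kuma; the GracefulShutdown class is hypothetical, and the 15-second delay is an assumption chosen to mirror the sleep in the preStop example above.

```python
import signal
import time


class GracefulShutdown:
    """Delay process exit after SIGTERM so clients can stop sending traffic first."""

    def __init__(self, drain_seconds: float = 15.0):
        # Assumption: 15 seconds, mirroring the sleep in the preStop example.
        self.drain_seconds = drain_seconds
        self.requested = False

    def handle(self, signum, frame):
        """Signal handler: record the request instead of exiting immediately."""
        self.requested = True

    def install(self):
        """Register the handler for SIGTERM (must be called from the main thread)."""
        signal.signal(signal.SIGTERM, self.handle)

    def wait(self, sleep=time.sleep, poll_interval: float = 0.1):
        """Block until SIGTERM arrives, then stay alive for the drain period."""
        while not self.requested:
            sleep(poll_interval)
        # The application keeps serving requests (on other threads) during this delay.
        sleep(self.drain_seconds)
```

An application would call install() at startup, run its server in background threads, call wait() from the main thread, and only then stop its listeners and exit. The delay has to fit within the Pod's graceful termination period, otherwise the container receives a SIGKILL first.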

When a Pod is deleted, its matching Dataplane resource is deleted as well. This is possible thanks to the owner reference set on the Dataplane resource.

Custom Container Configuration

If you want to modify the default container configuration you can use the ContainerPatch Kubernetes CRD. It allows configuration of both sidecar and init containers. ContainerPatch resources are namespace scoped and can only be applied in a namespace where Kuma CP is running.

In the vast majority of cases you shouldn’t need to override the sidecar and init container configurations. ContainerPatch is a feature that requires a good understanding of both Kuma and Kubernetes.

A ContainerPatch specification consists of a list of JSON patch strings that describe the modifications. Consult the entire resource schema.

Example

When using ContainerPatch, every value field must be a string containing valid JSON.

  apiVersion: kuma.io/v1alpha1
  kind: ContainerPatch
  metadata:
    name: container-patch-1
    namespace: kuma-system
  spec:
    sidecarPatch:
      - op: add
        path: /securityContext/privileged
        value: "true"
      - op: add
        path: /resources/requests/cpu
        value: '"100m"'
      - op: add
        path: /resources/limits
        value: '{
          "cpu": "500m",
          "memory": "256Mi"
        }'
    initPatch:
      - op: add
        path: /securityContext/runAsNonRoot
        value: "true"
      - op: remove
        path: /securityContext/runAsUser

This will change the securityContext section of the kuma-sidecar container from:

  securityContext:
    runAsGroup: 5678
    runAsUser: 5678

to:

  securityContext:
    runAsGroup: 5678
    runAsUser: 5678
    privileged: true

and similarly change the securityContext section of the init container from:

  securityContext:
    capabilities:
      add:
        - NET_ADMIN
        - NET_RAW
    runAsGroup: 0
    runAsUser: 0

to:

  securityContext:
    capabilities:
      add:
        - NET_ADMIN
        - NET_RAW
    runAsGroup: 0
    runAsNonRoot: true

The CPU resource request will be changed from:

  requests:
    cpu: 50m

to:

  requests:
    cpu: 100m

The resource limits will be changed from:

  limits:
    cpu: 1000m
    memory: 512Mi

to:

  limits:
    cpu: 500m
    memory: 256Mi
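
To make the effect of the sidecar patch in the example above concrete, here is a small Python sketch that applies the same operations to a dictionary. It is a simplified stand-in, not Kuma's code: the apply_patches helper is hypothetical, handles only add and remove on object paths, and the starting values are the ones listed above.

```python
import copy
import json


def apply_patches(container: dict, patches: list) -> dict:
    """Apply 'add' and 'remove' operations to nested dictionaries.

    Simplified: only object paths are handled (no array indices or escapes).
    """
    result = copy.deepcopy(container)
    for patch in patches:
        *parents, last = patch["path"].strip("/").split("/")
        target = result
        for key in parents:
            target = target[key]  # like JSON Patch, the parent must already exist
        if patch["op"] == "add":
            # The value is a string that must itself be valid JSON.
            target[last] = json.loads(patch["value"])
        elif patch["op"] == "remove":
            del target[last]
        else:
            raise ValueError(f"unsupported op: {patch['op']}")
    return result


sidecar = {
    "securityContext": {"runAsGroup": 5678, "runAsUser": 5678},
    "resources": {
        "requests": {"cpu": "50m"},
        "limits": {"cpu": "1000m", "memory": "512Mi"},
    },
}
sidecar_patch = [
    {"op": "add", "path": "/securityContext/privileged", "value": "true"},
    {"op": "add", "path": "/resources/requests/cpu", "value": '"100m"'},
    {"op": "add", "path": "/resources/limits",
     "value": '{"cpu": "500m", "memory": "256Mi"}'},
]

patched = apply_patches(sidecar, sidecar_patch)
print(json.dumps(patched, indent=2))
```

In this sketch, a value of 100m without the inner quotes fails to parse, because 100m is not valid JSON, whereas "100m" is a JSON string.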

Workload matching

A ContainerPatch is matched to a Pod via a kuma.io/container-patches annotation on the workload. The annotation value may be an ordered, comma-separated list of ContainerPatch names, which are applied in the order specified.

If a workload refers to a ContainerPatch which does not exist, the injection will explicitly fail and log the failure.

Example

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    namespace: app-ns
    name: app-deployment
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: app-deployment
    template:
      metadata:
        labels:
          app: app-deployment
        annotations:
          kuma.io/container-patches: container-patch-1,container-patch-2
      spec: [...]

Default patches

You can configure kuma-cp to apply a list of default patches to workloads that don’t specify their own patches by modifying the containerPatches value in the kuma-cp configuration:

  [...]
  runtime:
    kubernetes:
      injector:
        containerPatches: [ ]
  [...]

If you specify a list of default patches (e.g. ["default-patch-1", "default-patch-2"]) but your workload is annotated with its own list of patches (e.g. ["pod-patch-1", "pod-patch-2"]), only the latter is applied.
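
The precedence rule can be summarized with a short Python sketch. The effective_patches function is hypothetical and only restates the rule described here; treating an empty annotation like a missing one is an assumption that the text does not cover.

```python
from typing import Dict, List

ANNOTATION = "kuma.io/container-patches"


def effective_patches(default_patches: List[str], pod_annotations: Dict[str, str]) -> List[str]:
    """Return the ContainerPatch names to apply, in order."""
    raw = pod_annotations.get(ANNOTATION)
    if not raw:
        # Assumption: a missing or empty annotation falls back to the defaults.
        return list(default_patches)
    # The workload's own list replaces the defaults; the two are never merged.
    return [name.strip() for name in raw.split(",") if name.strip()]


defaults = ["default-patch-1", "default-patch-2"]

print(effective_patches(defaults, {}))
print(effective_patches(defaults, {ANNOTATION: "pod-patch-1,pod-patch-2"}))
```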

To set the default patches through an environment variable when installing the control plane, run:

  kumactl install control-plane --env-var "KUMA_RUNTIME_KUBERNETES_INJECTOR_CONTAINER_PATCHES=patch1,patch2"

Error modes and validation

When applying a ContainerPatch, Kuma validates that the rendered container spec meets the Kubernetes specification. Kuma does not validate that it is a sane configuration.

If a workload refers to a ContainerPatch which does not exist, the injection will explicitly fail and log the failure.

Direct access to services

By default, on Kubernetes, data plane proxies communicate with each other by leveraging the ClusterIP address of the Service resources. Also by default, any request made to another service is automatically load balanced client-side by the data plane proxy that originates the request (that is, by the local Envoy sidecar proxy).

There are situations where we may want to bypass the client-side load balancing and directly access services by using their IP address (for example, when Prometheus needs to scrape metrics from services by their individual IP address).

When an originating service wants to directly consume other services by their IP address, the originating service’s Deployment resource must include the following annotation:

  kuma.io/direct-access-services: Service1, Service2, ServiceN

Where the value is a comma-separated list of Kuma services that will be consumed directly. For example:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-app
    namespace: kuma-example
  spec:
    ...
    template:
      metadata:
        ...
        annotations:
          kuma.io/direct-access-services: "backend_example_svc_1234,backend_example_svc_1235"
      spec:
        containers:
          ...

Note: When using direct access with a headless service, the destination service is accessible at: Kuma-service.pod-name.mesh

We can also use * to indicate direct access to every service in the Mesh. The value must be quoted, because a bare * is not valid YAML:

  kuma.io/direct-access-services: "*"

Using * to directly access every service is a resource-intensive operation, so we must use it carefully.

Schema