Debug Running Pods

This page explains how to debug Pods running (or crashing) on a Node.

Before you begin

  • Your Pod should already be scheduled and running. If your Pod is not yet running, start with Debugging Pods.
  • For some of the advanced debugging steps you need to know on which Node the Pod is running and have shell access to run commands on that Node; an example of finding the Node follows this list. You don't need that access to run the standard debugging steps that use kubectl.
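
To find out which Node a Pod is running on, you can use kubectl. For example (the Pod name mypod is a placeholder; substitute your own):

  # Show the NODE column for every Pod in the current namespace
  kubectl get pods -o wide

  # Print only the Node name for a single Pod
  kubectl get pod mypod -o jsonpath='{.spec.nodeName}'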

Using kubectl describe pod to fetch details about pods

For this example we’ll use a Deployment to create two pods, similar to the earlier example.

application/nginx-with-request.yaml

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: nginx-deployment
  spec:
    selector:
      matchLabels:
        app: nginx
    replicas: 2
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - name: nginx
          image: nginx
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
          - containerPort: 80

Create the Deployment by running the following command:

  kubectl apply -f https://k8s.io/examples/application/nginx-with-request.yaml

  deployment.apps/nginx-deployment created

Check the Pod status with the following command:

  kubectl get pods

  NAME                                READY   STATUS    RESTARTS   AGE
  nginx-deployment-67d4bdd6f5-cx2nz   1/1     Running   0          13s
  nginx-deployment-67d4bdd6f5-w6kd7   1/1     Running   0          13s

We can retrieve a lot more information about each of these pods using kubectl describe pod. For example:

  kubectl describe pod nginx-deployment-67d4bdd6f5-w6kd7

  Name:         nginx-deployment-67d4bdd6f5-w6kd7
  Namespace:    default
  Priority:     0
  Node:         kube-worker-1/192.168.0.113
  Start Time:   Thu, 17 Feb 2022 16:51:01 -0500
  Labels:       app=nginx
                pod-template-hash=67d4bdd6f5
  Annotations:  <none>
  Status:       Running
  IP:           10.88.0.3
  IPs:
    IP:  10.88.0.3
    IP:  2001:db8::1
  Controlled By:  ReplicaSet/nginx-deployment-67d4bdd6f5
  Containers:
    nginx:
      Container ID:   containerd://5403af59a2b46ee5a23fb0ae4b1e077f7ca5c5fb7af16e1ab21c00e0e616462a
      Image:          nginx
      Image ID:       docker.io/library/nginx@sha256:2834dc507516af02784808c5f48b7cbe38b8ed5d0f4837f16e78d00deb7e7767
      Port:           80/TCP
      Host Port:      0/TCP
      State:          Running
        Started:      Thu, 17 Feb 2022 16:51:05 -0500
      Ready:          True
      Restart Count:  0
      Limits:
        cpu:     500m
        memory:  128Mi
      Requests:
        cpu:     500m
        memory:  128Mi
      Environment:  <none>
      Mounts:
        /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bgsgp (ro)
  Conditions:
    Type              Status
    Initialized       True
    Ready             True
    ContainersReady   True
    PodScheduled      True
  Volumes:
    kube-api-access-bgsgp:
      Type:                    Projected (a volume that contains injected data from multiple sources)
      TokenExpirationSeconds:  3607
      ConfigMapName:           kube-root-ca.crt
      ConfigMapOptional:       <nil>
      DownwardAPI:             true
  QoS Class:                   Guaranteed
  Node-Selectors:              <none>
  Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                               node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Events:
    Type    Reason     Age   From               Message
    ----    ------     ----  ----               -------
    Normal  Scheduled  34s   default-scheduler  Successfully assigned default/nginx-deployment-67d4bdd6f5-w6kd7 to kube-worker-1
    Normal  Pulling    31s   kubelet            Pulling image "nginx"
    Normal  Pulled     30s   kubelet            Successfully pulled image "nginx" in 1.146417389s
    Normal  Created    30s   kubelet            Created container nginx
    Normal  Started    30s   kubelet            Started container nginx

Here you can see configuration information about the container(s) and Pod (labels, resource requirements, etc.), as well as status information about the container(s) and Pod (state, readiness, restart count, events, etc.).

The container state is one of Waiting, Running, or Terminated. Depending on the state, additional information will be provided — here you can see that for a container in Running state, the system tells you when the container started.

Ready tells you whether the container passed its last readiness probe. (In this case, the container does not have a readiness probe configured; the container is assumed to be ready if no readiness probe is configured.)
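
If you want the kubelet to actively probe the container instead of assuming readiness, you can add a readinessProbe to the container spec. A minimal sketch, assuming the app serves HTTP on port 80 (the path and timing values here are illustrative, not part of the example above):

  containers:
  - name: nginx
    image: nginx
    readinessProbe:
      httpGet:
        path: /          # assumes the app answers HTTP GET on /
        port: 80
      initialDelaySeconds: 5   # wait before the first probe
      periodSeconds: 10        # probe every 10 seconds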

Restart Count tells you how many times the container has been restarted; this information can be useful for detecting crash loops in containers that are configured with a restart policy of Always.

The Conditions section shows the binary conditions of the Pod: PodScheduled, Initialized, ContainersReady, and Ready. The Ready condition indicates that the Pod is able to serve requests and should be added to the load balancing pools of all matching Services.
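
You can also check a specific condition without reading the whole describe output by using a JSONPath query; for example (substitute your Pod's name):

  kubectl get pod nginx-deployment-67d4bdd6f5-w6kd7 \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'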

Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events by indicating the first and last time it was seen and the number of times it was seen. “From” indicates the component that is logging the event, “SubobjectPath” tells you which object (e.g. container within the pod) is being referred to, and “Reason” and “Message” tell you what happened.

Example: debugging Pending Pods

A common scenario that you can detect using events is when you’ve created a Pod that won’t fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn’t match any nodes. Let’s say we created the previous Deployment with 5 replicas (instead of 2) and requesting 600 millicores instead of 500, on a four-node cluster where each (virtual) machine has 1 CPU. In that case one of the Pods will not be able to schedule. (Note that because of the cluster addon pods such as fluentd, skydns, etc., that run on each node, if we requested 1000 millicores then none of the Pods would be able to schedule.)

  kubectl get pods

  NAME                                READY   STATUS    RESTARTS   AGE
  nginx-deployment-1006230814-6winp   1/1     Running   0          7m
  nginx-deployment-1006230814-fmgu3   1/1     Running   0          7m
  nginx-deployment-1370807587-6ekbw   1/1     Running   0          1m
  nginx-deployment-1370807587-fg172   0/1     Pending   0          1m
  nginx-deployment-1370807587-fz9sd   0/1     Pending   0          1m

To find out why the nginx-deployment-1370807587-fz9sd pod is not running, we can use kubectl describe pod on the pending Pod and look at its events:

  kubectl describe pod nginx-deployment-1370807587-fz9sd

  Name:        nginx-deployment-1370807587-fz9sd
  Namespace:   default
  Node:        /
  Labels:      app=nginx,pod-template-hash=1370807587
  Status:      Pending
  IP:
  Controllers: ReplicaSet/nginx-deployment-1370807587
  Containers:
    nginx:
      Image:  nginx
      Port:   80/TCP
      QoS Tier:
        memory:  Guaranteed
        cpu:     Guaranteed
      Limits:
        cpu:     1
        memory:  128Mi
      Requests:
        cpu:     1
        memory:  128Mi
      Environment Variables:
  Volumes:
    default-token-4bcbi:
      Type:        Secret (a volume populated by a Secret)
      SecretName:  default-token-4bcbi
  Events:
    FirstSeen  LastSeen  Count  From                 SubobjectPath  Type     Reason            Message
    ---------  --------  -----  ----                 -------------  ----     ------            -------
    1m         48s       7      {default-scheduler }                Warning  FailedScheduling  pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
      fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
      fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000

Here you can see the event generated by the scheduler saying that the Pod failed to schedule for reason FailedScheduling (and possibly others). The message tells us that there were not enough resources for the Pod on any of the nodes.

To correct this situation, you can use kubectl scale to update your Deployment to specify four or fewer replicas. (Or you could leave the one Pod pending, which is harmless.)
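
For example, to scale the Deployment in this scenario down to four replicas, you could run:

  kubectl scale deployment/nginx-deployment --replicas=4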

Events such as the ones you saw at the end of kubectl describe pod are persisted in etcd and provide high-level information on what is happening in the cluster. To list all events you can use

  kubectl get events

but you have to remember that events are namespaced. This means that if you’re interested in events for some namespaced object (e.g. what happened with Pods in namespace my-namespace) you need to explicitly provide a namespace to the command:

  kubectl get events --namespace=my-namespace

To see events from all namespaces, you can use the --all-namespaces argument.
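
On a busy cluster the event list can be long. If you already know which object you are interested in, you can narrow the output with a field selector; for example (the Pod name is the placeholder from the scenario above):

  kubectl get events --namespace=my-namespace \
    --field-selector involvedObject.name=nginx-deployment-1370807587-fz9sd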

In addition to kubectl describe pod, another way to get extra information about a pod (beyond what is provided by kubectl get pod) is to pass the -o yaml output format flag to kubectl get pod. This will give you, in YAML format, even more information than kubectl describe pod: essentially all of the information the system has about the Pod. Here you will see things like annotations (key-value metadata without the label restrictions, used internally by Kubernetes system components), restart policy, ports, and volumes.

  kubectl get pod nginx-deployment-67d4bdd6f5-w6kd7 -o yaml

  apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2022-02-17T21:51:01Z"
    generateName: nginx-deployment-67d4bdd6f5-
    labels:
      app: nginx
      pod-template-hash: 67d4bdd6f5
    name: nginx-deployment-67d4bdd6f5-w6kd7
    namespace: default
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: nginx-deployment-67d4bdd6f5
      uid: 7d41dfd4-84c0-4be4-88ab-cedbe626ad82
    resourceVersion: "1364"
    uid: a6501da1-0447-4262-98eb-c03d4002222e
  spec:
    containers:
    - image: nginx
      imagePullPolicy: Always
      name: nginx
      ports:
      - containerPort: 80
        protocol: TCP
      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 500m
          memory: 128Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-bgsgp
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    nodeName: kube-worker-1
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: kube-api-access-bgsgp
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2022-02-17T21:51:01Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2022-02-17T21:51:06Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2022-02-17T21:51:06Z"
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2022-02-17T21:51:01Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://5403af59a2b46ee5a23fb0ae4b1e077f7ca5c5fb7af16e1ab21c00e0e616462a
      image: docker.io/library/nginx:latest
      imageID: docker.io/library/nginx@sha256:2834dc507516af02784808c5f48b7cbe38b8ed5d0f4837f16e78d00deb7e7767
      lastState: {}
      name: nginx
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2022-02-17T21:51:05Z"
    hostIP: 192.168.0.113
    phase: Running
    podIP: 10.88.0.3
    podIPs:
    - ip: 10.88.0.3
    - ip: 2001:db8::1
    qosClass: Guaranteed
    startTime: "2022-02-17T21:51:01Z"

Examining pod logs

First, look at the logs of the affected container:

  kubectl logs ${POD_NAME} ${CONTAINER_NAME}

If your container has previously crashed, you can access the previous container’s crash log with:

  kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
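
kubectl logs accepts a few other flags that are handy while debugging; for example:

  # Stream new log lines as they are written
  kubectl logs -f ${POD_NAME} ${CONTAINER_NAME}

  # Show only the most recent 50 lines, each prefixed with a timestamp
  kubectl logs --tail=50 --timestamps ${POD_NAME} ${CONTAINER_NAME}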

Debugging with container exec

If the container image includes debugging utilities, as is the case with images built from Linux and Windows OS base images, you can run commands inside a specific container with kubectl exec:

  kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}

Note: -c ${CONTAINER_NAME} is optional. You can omit it for Pods that only contain a single container.

As an example, to look at the logs from a running Cassandra pod, you might run

  kubectl exec cassandra -- cat /var/log/cassandra/system.log

You can run a shell that’s connected to your terminal using the -i and -t arguments to kubectl exec, for example:

  kubectl exec -it cassandra -- sh

For more details, see Get a Shell to a Running Container.

Debugging with an ephemeral debug container

FEATURE STATE: Kubernetes v1.23 [beta]

Ephemeral containers are useful for interactive troubleshooting when kubectl exec is insufficient because a container has crashed or a container image doesn’t include debugging utilities, such as with distroless images.

Example debugging using ephemeral containers

You can use the kubectl debug command to add ephemeral containers to a running Pod. First, create a pod for the example:

  kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never

The examples in this section use the pause container image because it does not contain debugging utilities, but this method works with all container images.

If you attempt to use kubectl exec to create a shell you will see an error because there is no shell in this container image.

  kubectl exec -it ephemeral-demo -- sh

  OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown

You can instead add a debugging container using kubectl debug. If you specify the -i/--interactive argument, kubectl will automatically attach to the console of the Ephemeral Container.

  kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo

  Defaulting debug container name to debugger-8xzrl.
  If you don't see a command prompt, try pressing enter.
  / #

This command adds a new busybox container and attaches to it. The --target parameter targets the process namespace of another container. It’s necessary here because kubectl run does not enable process namespace sharing in the pod it creates.

Note: The --target parameter must be supported by the Container Runtime. When not supported, the Ephemeral Container may not be started, or it may be started with an isolated process namespace so that ps does not reveal processes in other containers.
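
As a quick sanity check from inside the debug container, you can list processes; if the shared process namespace is working, the target container's process (the pause process in this example) should appear alongside your shell:

  # Run inside the debug container's shell (/ #); with --target working,
  # the target container's /pause process should be listed as well
  ps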

You can view the state of the newly created ephemeral container using kubectl describe:

  kubectl describe pod ephemeral-demo

  ...
  Ephemeral Containers:
    debugger-8xzrl:
      Container ID:   docker://b888f9adfd15bd5739fefaa39e1df4dd3c617b9902082b1cfdc29c4028ffb2eb
      Image:          busybox
      Image ID:       docker-pullable://busybox@sha256:1828edd60c5efd34b2bf5dd3282ec0cc04d47b2ff9caa0b6d4f07a21d1c08084
      Port:           <none>
      Host Port:      <none>
      State:          Running
        Started:      Wed, 12 Feb 2020 14:25:42 +0100
      Ready:          False
      Restart Count:  0
      Environment:    <none>
      Mounts:         <none>
  ...

Use kubectl delete to remove the Pod when you’re finished:

  kubectl delete pod ephemeral-demo

Debugging using a copy of the Pod

Sometimes Pod configuration options make it difficult to troubleshoot in certain situations. For example, you can’t run kubectl exec to troubleshoot your container if your container image does not include a shell or if your application crashes on startup. In these situations you can use kubectl debug to create a copy of the Pod with configuration values changed to aid debugging.

Copying a Pod while adding a new container

Adding a new container can be useful when your application is running but not behaving as you expect and you’d like to add additional troubleshooting utilities to the Pod.

For example, maybe your application’s container images are built on busybox but you need debugging utilities not included in busybox. You can simulate this scenario using kubectl run:

  kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d

Run this command to create a copy of myapp named myapp-debug that adds a new Ubuntu container for debugging:

  kubectl debug myapp -it --image=ubuntu --share-processes --copy-to=myapp-debug

  Defaulting debug container name to debugger-w7xmf.
  If you don't see a command prompt, try pressing enter.
  root@myapp-debug:/#

Note:

  • kubectl debug automatically generates a container name if you don’t choose one using the --container flag.
  • The -i flag causes kubectl debug to attach to the new container by default. You can prevent this by specifying --attach=false. If your session becomes disconnected you can reattach using kubectl attach.
  • The --share-processes flag allows the containers in this Pod to see processes from the other containers in the Pod. For more information about how this works, see Share Process Namespace between Containers in a Pod; a quick check follows this list.
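
As a quick check that process sharing is working, you can list processes from the debug shell; the sleep 1d process from the original myapp container should appear (this assumes the ps utility is present in the ubuntu image):

  # Run inside the root@myapp-debug shell; the original container's
  # "sleep 1d" process should be listed
  ps aux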

Don’t forget to clean up the debugging Pod when you’re finished with it:

  kubectl delete pod myapp myapp-debug

Copying a Pod while changing its command

Sometimes it’s useful to change the command for a container, for example to add a debugging flag or because the application is crashing.

To simulate a crashing application, use kubectl run to create a container that immediately exits:

  kubectl run --image=busybox:1.28 myapp -- false

You can see using kubectl describe pod myapp that this container is crashing:

  Containers:
    myapp:
      Image:        busybox
      ...
      Args:
        false
      State:        Waiting
        Reason:     CrashLoopBackOff
      Last State:   Terminated
        Reason:     Error
        Exit Code:  1
You can use kubectl debug to create a copy of this Pod with the command changed to an interactive shell:

  kubectl debug myapp -it --copy-to=myapp-debug --container=myapp -- sh

  If you don't see a command prompt, try pressing enter.
  / #

Now you have an interactive shell that you can use to perform tasks like checking filesystem paths or running the container command manually.
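
For example, you could re-run the container's original command by hand and inspect its exit status. A sketch; in this example the original command is false, so a non-zero exit code is expected:

  / # false
  / # echo $?
  1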

Note:

  • To change the command of a specific container you must specify its name using --container or kubectl debug will instead create a new container to run the command you specified.
  • The -i flag causes kubectl debug to attach to the container by default. You can prevent this by specifying --attach=false. If your session becomes disconnected you can reattach using kubectl attach.

Don’t forget to clean up the debugging Pod when you’re finished with it:

  kubectl delete pod myapp myapp-debug

Copying a Pod while changing container images

In some situations you may want to change a misbehaving Pod from its normal production container images to an image containing a debugging build or additional utilities.

As an example, create a Pod using kubectl run:

  kubectl run myapp --image=busybox:1.28 --restart=Never -- sleep 1d

Now use kubectl debug to make a copy and change its container image to ubuntu:

  kubectl debug myapp --copy-to=myapp-debug --set-image=*=ubuntu

The syntax of --set-image uses the same container_name=image syntax as kubectl set image. *=ubuntu means change the image of all containers to ubuntu.
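
For example, to replace the image of only the myapp container and leave any other containers unchanged, you might run (myapp-debug2 is a hypothetical name for a second debugging copy):

  kubectl debug myapp --copy-to=myapp-debug2 --set-image=myapp=ubuntu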

Don’t forget to clean up the debugging Pod when you’re finished with it:

  kubectl delete pod myapp myapp-debug

Debugging via a shell on the node

If none of these approaches work, you can find the Node on which the Pod is running and create a privileged Pod running in the host namespaces. To create an interactive shell on a node using kubectl debug, run:

  kubectl debug node/mynode -it --image=ubuntu

  Creating debugging pod node-debugger-mynode-pdx84 with container debugger on node mynode.
  If you don't see a command prompt, try pressing enter.
  root@ek8s:/#

When creating a debugging session on a node, keep in mind that:

  • kubectl debug automatically generates the name of the new Pod based on the name of the Node.
  • The container runs in the host IPC, Network, and PID namespaces.
  • The root filesystem of the Node will be mounted at /host (an example follows this list).
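
For example, because the Node's root filesystem is mounted at /host, you can inspect Node files from the debug shell, or switch into the Node's filesystem entirely. A sketch; exact paths depend on the Node's operating system, and chroot requires a shell in the Node image:

  # Read a file from the Node's root filesystem
  cat /host/etc/os-release

  # Optionally, run commands as if logged in to the Node
  chroot /host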

Don’t forget to clean up the debugging Pod when you’re finished with it:

  kubectl delete pod node-debugger-mynode-pdx84