Application Introspection and Debugging

Once your application is running, you’ll inevitably need to debug problems with it. Earlier we described how you can use kubectl get pods to retrieve simple status information about your pods. But there are a number of ways to get even more information about your application.

Using kubectl describe pod to fetch details about pods

For this example we’ll use a Deployment to create two pods, similar to the earlier example.

application/nginx-with-request.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 80
```

Create the Deployment by running the following command:

```shell
kubectl apply -f https://k8s.io/examples/application/nginx-with-request.yaml
```

```
deployment.apps/nginx-deployment created
```

Check the status of the Pods with the following command:

```shell
kubectl get pods
```

```
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          11s
nginx-deployment-1006230814-fmgu3   1/1       Running   0          11s
```

We can retrieve a lot more information about each of these pods using kubectl describe pod. For example:

```shell
kubectl describe pod nginx-deployment-1006230814-6winp
```

```
Name:           nginx-deployment-1006230814-6winp
Namespace:      default
Node:           kubernetes-node-wul5/10.240.0.9
Start Time:     Thu, 24 Mar 2016 01:39:49 +0000
Labels:         app=nginx,pod-template-hash=1006230814
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1956810328","uid":"14e607e7-8ba1-11e7-b5cb-fa16" ...
Status:         Running
IP:             10.244.0.6
Controllers:    ReplicaSet/nginx-deployment-1006230814
Containers:
  nginx:
    Container ID:   docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    Image:          nginx
    Image ID:       docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    Port:           80/TCP
    QoS Tier:
      cpu:          Guaranteed
      memory:       Guaranteed
    Limits:
      cpu:          500m
      memory:       128Mi
    Requests:
      memory:       128Mi
      cpu:          500m
    State:          Running
      Started:      Thu, 24 Mar 2016 01:39:51 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5kdvl (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-4bcbi:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-4bcbi
    Optional:   false
QoS Class:      Guaranteed
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                            SubobjectPath           Type    Reason     Message
  ---------  --------  -----  ----                            -------------           ------  ------     -------
  54s        54s       1      {default-scheduler }                                    Normal  Scheduled  Successfully assigned nginx-deployment-1006230814-6winp to kubernetes-node-wul5
  54s        54s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Pulling    pulling image "nginx"
  53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Pulled     Successfully pulled image "nginx"
  53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Created    Created container with docker id 90315cc9f513
  53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Started    Started container with docker id 90315cc9f513
```

Here you can see configuration information about the container(s) and Pod (labels, resource requirements, etc.), as well as status information about the container(s) and Pod (state, readiness, restart count, events, etc.).

The container state is one of Waiting, Running, or Terminated. Depending on the state, additional information will be provided — here you can see that for a container in Running state, the system tells you when the container started.
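
If you only want the raw container state rather than the full describe output, you can query it directly. A minimal sketch, assuming the same pod name as above:

```shell
# Print the state object of the pod's first container (running/waiting/terminated)
kubectl get pod nginx-deployment-1006230814-6winp \
  -o jsonpath='{.status.containerStatuses[0].state}'
```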

Ready tells you whether the container passed its last readiness probe. (In this case, the container does not have a readiness probe configured; the container is assumed to be ready if no readiness probe is configured.)
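
If you wanted Ready to reflect an actual health check rather than the default assumption, you could add a readiness probe to the container spec. A minimal sketch; the path and timing values are illustrative and not part of the original manifest:

```yaml
# Fragment to add under .spec.template.spec.containers[0] of the Deployment above
readinessProbe:
  httpGet:
    path: /            # illustrative endpoint; use your application's health path
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
```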

Restart Count tells you how many times the container has been restarted; this information can be useful for detecting crash loops in containers that are configured with a restart policy of ‘always.’
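
To spot crash-looping containers across a namespace, it can help to sort pods by restart count; for example:

```shell
# List pods ordered by the restart count of their first container
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
```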

Currently the only Condition associated with a Pod is the binary Ready condition, which indicates that the pod is able to service requests and should be added to the load balancing pools of all matching services.
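
If you only need the Ready condition rather than the whole status block, a JSONPath filter works; a sketch, again assuming the pod name used above:

```shell
# Print True/False for the pod's Ready condition
kubectl get pod nginx-deployment-1006230814-6winp \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
```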

Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events by indicating the first and last time it was seen and the number of times it was seen. “From” indicates the component that is logging the event, “SubobjectPath” tells you which object (e.g. container within the pod) is being referred to, and “Reason” and “Message” tell you what happened.
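
Besides reading events from the describe output, you can also list the events for one specific Pod with a field selector; a sketch:

```shell
# Events whose involved object is this specific pod
kubectl get events --field-selector involvedObject.name=nginx-deployment-1006230814-6winp
```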

Example: debugging Pending Pods

A common scenario that you can detect using events is when you’ve created a Pod that won’t fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn’t match any nodes. Let’s say we created the previous Deployment with 5 replicas (instead of 2) and a request of 600 millicores (instead of 500), on a four-node cluster where each (virtual) machine has 1 CPU. In that case one of the Pods will not be able to schedule. (Note that because of the cluster addon pods such as fluentd, skydns, etc., that run on each node, if we had requested 1000 millicores then none of the Pods would be able to schedule.)
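
One way to reproduce that change on the existing Deployment is with kubectl scale and kubectl set resources; a minimal sketch, where the values simply mirror the scenario described above:

```shell
# Bump replicas to 5 and raise the per-container CPU request/limit to 600m
kubectl scale deployment/nginx-deployment --replicas=5
kubectl set resources deployment/nginx-deployment \
  --limits=cpu=600m,memory=128Mi --requests=cpu=600m,memory=128Mi
```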

```shell
kubectl get pods
```

```
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m
```

To find out why the nginx-deployment-1370807587-fz9sd pod is not running, we can use kubectl describe pod on the pending Pod and look at its events:

```shell
kubectl describe pod nginx-deployment-1370807587-fz9sd
```

```
Name:           nginx-deployment-1370807587-fz9sd
Namespace:      default
Node:           /
Labels:         app=nginx,pod-template-hash=1370807587
Status:         Pending
IP:
Controllers:    ReplicaSet/nginx-deployment-1370807587
Containers:
  nginx:
    Image:      nginx
    Port:       80/TCP
    QoS Tier:
      memory:   Guaranteed
      cpu:      Guaranteed
    Limits:
      cpu:      1
      memory:   128Mi
    Requests:
      cpu:      1
      memory:   128Mi
    Environment Variables:
Volumes:
  default-token-4bcbi:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-4bcbi
Events:
  FirstSeen  LastSeen  Count  From                  SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                  -------------  -------  ------            -------
  1m         48s       7      {default-scheduler }                 Warning  FailedScheduling  pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
  fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
  fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000
```

Here you can see the event generated by the scheduler saying that the Pod failed to schedule for reason FailedScheduling (and possibly others). The message tells us that there were not enough resources for the Pod on any of the nodes.

To correct this situation, you can use kubectl scale to update your Deployment to specify four or fewer replicas. (Or you could leave the one Pod pending, which is harmless.)
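
For example, a sketch using the Deployment name from this walkthrough:

```shell
# Drop back to a replica count that fits on the cluster
kubectl scale deployment/nginx-deployment --replicas=4
```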

Events such as the ones you saw at the end of kubectl describe pod are persisted in etcd and provide high-level information on what is happening in the cluster. To list all events you can use

```shell
kubectl get events
```

but you have to remember that events are namespaced. This means that if you’re interested in events for some namespaced object (e.g. what happened with Pods in namespace my-namespace) you need to explicitly provide a namespace to the command:

```shell
kubectl get events --namespace=my-namespace
```

To see events from all namespaces, you can use the --all-namespaces argument.
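
For example, a sketch that lists events from every namespace, oldest first:

```shell
# All events in the cluster, sorted by creation time
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
```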

In addition to kubectl describe pod, another way to get extra information about a pod (beyond what is provided by kubectl get pod) is to pass the -o yaml output format flag to kubectl get pod. This gives you, in YAML format, even more information than kubectl describe pod: essentially all of the information the system has about the Pod. Here you will see things like annotations (key-value metadata without the label restrictions, used internally by Kubernetes system components), the restart policy, ports, and volumes.

```shell
kubectl get pod nginx-deployment-1006230814-6winp -o yaml
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1006230814","uid":"4c84c175-f161-11e5-9a78-42010af00005","apiVersion":"extensions","resourceVersion":"133434"}}
  creationTimestamp: 2016-03-24T01:39:50Z
  generateName: nginx-deployment-1006230814-
  labels:
    app: nginx
    pod-template-hash: "1006230814"
  name: nginx-deployment-1006230814-6winp
  namespace: default
  resourceVersion: "133447"
  uid: 4c879808-f161-11e5-9a78-42010af00005
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4bcbi
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: kubernetes-node-wul5
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-4bcbi
    secret:
      secretName: default-token-4bcbi
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2016-03-24T01:39:51Z
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    image: nginx
    imageID: docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2016-03-24T01:39:51Z
  hostIP: 10.240.0.9
  phase: Running
  podIP: 10.244.0.6
  startTime: 2016-03-24T01:39:49Z
```

Example: debugging a down/unreachable node

Sometimes when debugging it can be useful to look at the status of a node — for example, because you’ve noticed strange behavior of a Pod that’s running on the node, or to find out why a Pod won’t schedule onto the node. As with Pods, you can use kubectl describe node and kubectl get node -o yaml to retrieve detailed information about nodes. For example, here’s what you’ll see if a node is down (disconnected from the network, or kubelet dies and won’t restart, etc.). Notice the events that show the node is NotReady, and also notice that the pods are no longer running (they are evicted after five minutes of NotReady status).

```shell
kubectl get nodes
```

```
NAME                     STATUS     ROLES    AGE     VERSION
kubernetes-node-861h     NotReady   <none>   1h      v1.13.0
kubernetes-node-bols     Ready      <none>   1h      v1.13.0
kubernetes-node-st6x     Ready      <none>   1h      v1.13.0
kubernetes-node-unaj     Ready      <none>   1h      v1.13.0
```

```shell
kubectl describe node kubernetes-node-861h
```

```
Name:               kubernetes-node-861h
Role
Labels:             kubernetes.io/arch=amd64
                    kubernetes.io/os=linux
                    kubernetes.io/hostname=kubernetes-node-861h
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Mon, 04 Sep 2017 17:13:23 +0800
Phase:
Conditions:
  Type            Status   LastHeartbeatTime                 LastTransitionTime                Reason             Message
  ----            ------   -----------------                 ------------------                ------             -------
  OutOfDisk       Unknown  Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  MemoryPressure  Unknown  Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure    Unknown  Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
  Ready           Unknown  Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:          10.240.115.55,104.197.0.26
Capacity:
 cpu:       2
 hugePages: 0
 memory:    4046788Ki
 pods:      110
Allocatable:
 cpu:       1500m
 hugePages: 0
 memory:    1479263Ki
 pods:      110
System Info:
 Machine ID:                 8e025a21a4254e11b028584d9d8b12c4
 System UUID:                349075D1-D169-4F25-9F2A-E886850C47E3
 Boot ID:                    5cd18b37-c5bd-4658-94e0-e436d3f110e0
 Kernel Version:             4.4.0-31-generic
 OS Image:                   Debian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.12.5
 Kubelet Version:            v1.6.9+a3d1dfa6f4335
 Kube-Proxy Version:         v1.6.9+a3d1dfa6f4335
ExternalID:                  15233045891481496305
Non-terminated Pods:         (9 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----  ------------  ----------  ---------------  -------------
  ......
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests   Memory Limits
  ------------  ----------    ---------------   -------------
  900m (60%)    2200m (146%)  1009286400 (66%)  5681286400 (375%)
Events:         <none>
```

```shell
kubectl get node kubernetes-node-861h -o yaml
```

```yaml
apiVersion: v1
kind: Node
metadata:
  creationTimestamp: 2015-07-10T21:32:29Z
  labels:
    kubernetes.io/hostname: kubernetes-node-861h
  name: kubernetes-node-861h
  resourceVersion: "757"
  uid: 2a69374e-274b-11e5-a234-42010af0d969
spec:
  externalID: "15233045891481496305"
  podCIDR: 10.244.0.0/24
  providerID: gce://striped-torus-760/us-central1-b/kubernetes-node-861h
status:
  addresses:
  - address: 10.240.115.55
    type: InternalIP
  - address: 104.197.0.26
    type: ExternalIP
  capacity:
    cpu: "1"
    memory: 3800808Ki
    pods: "100"
  conditions:
  - lastHeartbeatTime: 2015-07-10T21:34:32Z
    lastTransitionTime: 2015-07-10T21:35:15Z
    reason: Kubelet stopped posting node status.
    status: Unknown
    type: Ready
  nodeInfo:
    bootID: 4e316776-b40d-4f78-a4ea-ab0d73390897
    containerRuntimeVersion: docker://Unknown
    kernelVersion: 3.16.0-0.bpo.4-amd64
    kubeProxyVersion: v0.21.1-185-gffc5a86098dc01
    kubeletVersion: v0.21.1-185-gffc5a86098dc01
    machineID: ""
    osImage: Debian GNU/Linux 7 (wheezy)
    systemUUID: ABE5F6B4-D44B-108B-C46A-24CCE16C8B6E
```

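If you only want to check each node's Ready condition without the full YAML, a JSONPath sketch similar to the Pod example above works for nodes too:

```shell
# Print each node name with the status of its Ready condition
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
```
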
What’s next

Learn about additional debugging tools, including: