Troubleshooting commands
Big picture
Use command line tools to get status and troubleshoot.
Note: calico-system is used for the operator-based commands and examples below; for a manifest-based install, use kube-system.
See Calico architecture and components for help with components.
Hosts
Verify number of nodes in a cluster
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-0-10 Ready master 27h v1.18.0
ip-10-0-0-11 Ready <none> 27h v1.18.0
ip-10-0-0-12 Ready <none> 27h v1.18.0
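If a node is missing or stuck in NotReady, a small filter surfaces it quickly. A minimal sketch, assuming the default `kubectl get nodes` column layout shown above (`not_ready_nodes` is a hypothetical helper name):

```shell
# Print the names of nodes whose STATUS column is not "Ready".
# Assumes the default `kubectl get nodes` layout (NAME STATUS ROLES ...).
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage: kubectl get nodes | not_ready_nodes
```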
Verify calico-node pods are running on every node, and are in a healthy state
kubectl get pods -n calico-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
calico-node-77zgj 1/1 Running 0 27h 10.0.0.10 ip-10-0-0-10
calico-node-nz8k2 1/1 Running 0 27h 10.0.0.11 ip-10-0-0-11
calico-node-7trv7 1/1 Running 0 27h 10.0.0.12 ip-10-0-0-12
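To spot unhealthy calico-node pods at a glance, a filter like the following can help; a sketch assuming the default `kubectl get pods` columns (NAME READY STATUS ...), with `unhealthy_pods` a hypothetical name:

```shell
# Print pods that are not Running or not fully ready (e.g. 0/1).
unhealthy_pods() {
  awk 'NR > 1 { split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}

# Usage: kubectl get pods -n calico-system | unhealthy_pods
```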
Exec into pod for further troubleshooting
kubectl run multitool --image=praqma/network-multitool
kubectl exec -it multitool -- bash
bash-5.0# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=97 time=6.61 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=97 time=6.64 ms
Collect Calico diagnostic logs
sudo calicoctl node diags
Collecting diagnostics
Using temp dir: /tmp/calico194224816
Dumping netstat
Dumping routes (IPv4)
Dumping routes (IPv6)
Dumping interface info (IPv4)
Dumping interface info (IPv6)
Dumping iptables (IPv4)
Dumping iptables (IPv6)
Diags saved to /tmp/calico194224816/diags-20201127_010117.tar.gz
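To inspect the bundle, unpack it into a scratch directory. A small helper sketch (`extract_diags` is a hypothetical name; pass the path printed by `calicoctl node diags`):

```shell
# Unpack a diags bundle into a temporary directory and list its contents.
extract_diags() {
  bundle="$1"
  dest=$(mktemp -d)
  tar -xzf "$bundle" -C "$dest"
  ls "$dest"
}

# Usage: extract_diags /tmp/calico194224816/diags-20201127_010117.tar.gz
```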
Kubernetes
Verify all pods are running
kubectl get pods -A
kube-system coredns-66bff467f8-dxbtl 1/1 Running 0 27h
kube-system coredns-66bff467f8-n95vq 1/1 Running 0 27h
kube-system etcd-ip-10-0-0-10 1/1 Running 0 27h
kube-system kube-apiserver-ip-10-0-0-10 1/1 Running 0 27h
Verify Kubernetes API server is running
kubectl cluster-info
Kubernetes master is running at https://10.0.0.10:6443
KubeDNS is running at https://10.0.0.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Verify Kubernetes kube-dns is working
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.49.0.1 <none> 443/TCP 2d2h
kubectl exec -it multitool -- bash
bash-5.0# curl -I -k https://kubernetes
HTTP/2 403
cache-control: no-cache, private
content-type: application/json
x-content-type-options: nosniff
content-length: 234
bash-5.0# nslookup google.com
Server: 10.49.0.10
Address: 10.49.0.10#53
Non-authoritative answer:
Name: google.com
Address: 172.217.14.238
Name: google.com
Address: 2607:f8b0:400a:804::200e
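To script this DNS check, the resolved addresses can be pulled out of the nslookup output; a sketch assuming the output format shown above (`resolved_addresses` is a hypothetical helper name):

```shell
# Extract resolved addresses from `nslookup` output, skipping the
# "Server:"/"Address: <server>#53" header lines (the server address
# carries a "#<port>" suffix, the answers do not).
resolved_addresses() {
  awk '/^Address:/ && $2 !~ /#/ { print $2 }'
}

# Usage: nslookup google.com | resolved_addresses
```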
Verify that kubelet is running on the node with the correct flags
systemctl status kubelet
If there is a problem, check the journal
journalctl -u kubelet | head
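A quick way to surface likely problems without paging through the whole journal is to filter for common failure keywords; a rough sketch (`kubelet_errors` is a hypothetical name, and the keyword list is an assumption, not exhaustive):

```shell
# Filter log lines for likely problems (case-insensitive).
kubelet_errors() {
  grep -iE 'error|fail|fatal'
}

# Usage: journalctl -u kubelet --no-pager | kubelet_errors | tail -20
```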
Check the status of other system pods
Look especially at the coredns pods; if they are not getting an IP address, something is wrong with the CNI
kubectl get pod -n kube-system -o wide
But if other pods fail, it is likely a different issue. Perform normal Kubernetes troubleshooting. For example:
kubectl describe pod kube-scheduler-ip-10-0-1-20.eu-west-1.compute.internal -n kube-system | tail -15
Calico components
View Calico CNI configuration on a node
cat /etc/cni/net.d/10-calico.conflist
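To list just the plugin types configured in the conflist, a crude grep-based sketch works even where jq is not installed (`cni_plugin_types` is a hypothetical helper name; jq is cleaner if available):

```shell
# Print each plugin "type" entry found in a CNI conflist file.
cni_plugin_types() {
  grep -o '"type":[[:space:]]*"[^"]*"' "$1" |
    sed 's/.*"type":[[:space:]]*"\(.*\)"/\1/'
}

# Usage: cni_plugin_types /etc/cni/net.d/10-calico.conflist
```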
Verify calicoctl matches cluster
The cluster version and type must match the calicoctl version.
calicoctl version
For syntax:
calicoctl version --help
Check tigera operator status
kubectl get tigerastatus
NAME AVAILABLE PROGRESSING DEGRADED SINCE
calico True False False 27h
Check if operator pod is running
kubectl get pod -n tigera-operator
View calico nodes
kubectl get pod -n calico-system -o wide
View Calico installation parameters
kubectl get installation -o yaml
apiVersion: v1
items:
- apiVersion: operator.tigera.io/v1
  kind: Installation
  metadata:
    ...
  spec:
    calicoNetwork:
      bgp: Enabled
      hostPorts: Enabled
      ipPools:
      - blockSize: 26
        cidr: 10.48.0.0/16
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled
        nodeSelector: all()
      multiInterfaceMode: None
      nodeAddressAutodetectionV4:
        firstFound: true
    cni:
      ipam:
        type: Calico
      type: Calico
Run commands across multiple nodes
export THE_COMMAND_TO_RUN=date
for calinode in $(kubectl get pod -o wide -n calico-system | grep calico-node | awk '{print $1}'); do
  echo "$calinode"
  echo "-----"
  kubectl exec -n calico-system "$calinode" -- $THE_COMMAND_TO_RUN
  printf "\n"
done
calico-node-87lpx
-----
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
Thu Apr 28 13:48:06 UTC 2022
calico-node-x5fmm
-----
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
Thu Apr 28 13:48:07 UTC 2022
View pod info
kubectl describe pods `<pod_name>` -n `<namespace>`
kubectl describe pods busybox -n default
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21s default-scheduler Successfully assigned default/busybox to ip-10-0-0-11
Normal Pulling 20s kubelet, ip-10-0-0-11 Pulling image "busybox"
Normal Pulled 19s kubelet, ip-10-0-0-11 Successfully pulled image "busybox"
Normal Created 19s kubelet, ip-10-0-0-11 Created container busybox
Normal Started 18s kubelet, ip-10-0-0-11 Started container busybox
View logs of a pod
kubectl logs `<pod_name>` -n `<namespace>`
kubectl logs busybox -n default
View kubelet logs
journalctl -u kubelet
Routing
Verify routing table on the node
ip route
default via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.10 metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.10
10.0.0.1 dev eth0 proto dhcp scope link src 10.0.0.10 metric 100
10.48.66.128/26 via 10.0.0.12 dev eth0 proto 80 onlink
10.48.231.0/26 via 10.0.0.11 dev eth0 proto 80 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
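Calico-programmed routes can be isolated from the rest of the table by their routing protocol: BIRD-installed routes carry proto bird, which some tools render as the numeric proto 80 seen above. A sketch (`calico_routes` is a hypothetical name):

```shell
# Show only routes programmed by Calico's BIRD (proto bird / proto 80).
calico_routes() {
  grep -E 'proto (bird|80)'
}

# Usage: ip route | calico_routes
```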
Verify BGP peer status
sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+------------+-------------+
| 10.0.0.12 | node-to-node mesh | up | 2020-11-25 | Established |
| 10.0.0.11 | node-to-node mesh | up | 2020-11-25 | Established |
+--------------+-------------------+-------+------------+-------------+
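When peers are unhealthy, only the non-Established rows matter; a sketch that pulls them out of the table above (`unestablished_peers` is a hypothetical helper name, and the column positions assume the table format shown):

```shell
# Print peers whose BGP session is not Established, from the
# pipe-delimited `calicoctl node status` table.
unestablished_peers() {
  awk -F'|' 'NF >= 6 && $3 ~ /mesh|global|node/ {
    gsub(/ /, "", $2); gsub(/ /, "", $6)
    if ($6 != "Established") print $2, $6
  }'
}

# Usage: sudo calicoctl node status | unestablished_peers
```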
Verify overlay configuration
kubectl get ippools default-ipv4-ippool -o yaml
---
spec:
  ipipMode: Always
  vxlanMode: Never
Verify BGP-learned routes
ip r | grep bird
192.168.66.128/26 via 10.0.0.12 dev tunl0 proto bird onlink
192.168.180.192/26 via 10.0.0.10 dev tunl0 proto bird onlink
blackhole 192.168.231.0/26 proto bird
Verify BIRD routing table
Note: The BIRD routing table gets pushed to node routing tables.
kubectl exec -it -n calico-system calico-node-8cfc8 -- /bin/bash
[root@ip-10-0-0-11 /]# birdcl
BIRD v0.3.3+birdv1.6.8 ready.
bird> show route
0.0.0.0/0 via 10.0.0.1 on eth0 [kernel1 18:13:33] * (10)
10.0.0.0/24 dev eth0 [direct1 18:13:32] * (240)
10.0.0.1/32 dev eth0 [kernel1 18:13:33] * (10)
10.48.231.2/32 dev calieb874a8ef0b [kernel1 18:13:41] * (10)
10.48.231.1/32 dev caliaeaa173109d [kernel1 18:13:35] * (10)
10.48.231.0/26 blackhole [static1 18:13:32] * (200)
10.48.231.0/32 dev vxlan.calico [direct1 18:13:32] * (240)
10.48.180.192/26 via 10.0.0.10 on eth0 [Mesh_10_0_0_10 18:13:34] * (100/0) [i]
via 10.0.0.10 on eth0 [Mesh_10_0_0_12 18:13:41 from 10.0.0.12] (100/0) [i]
via 10.0.0.10 on eth0 [kernel1 18:13:33] (10)
10.48.66.128/26 via 10.0.0.12 on eth0 [Mesh_10_0_0_10 18:13:36 from 10.0.0.10] * (100/0) [i]
via 10.0.0.12 on eth0 [Mesh_10_0_0_12 18:13:41] (100/0) [i]
via 10.0.0.12 on eth0 [kernel1 18:13:36] (10)
Capture traffic
For example,
sudo tcpdump -i calicofac0017c3 icmp
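To find which cali* interface belongs to a given pod, the node routing table can be used: each local pod gets a host route via its cali interface. A sketch (`pod_route_interface` is a hypothetical name):

```shell
# Given a pod IP, print the host-side interface from `ip route` output
# (local pods appear as host routes: "<pod-ip> dev cali... scope link").
pod_route_interface() {
  awk -v ip="$1" '$1 == ip || $1 == ip"/32" {
    for (i = 1; i <= NF; i++) if ($i == "dev") print $(i + 1)
  }'
}

# Usage (on the node hosting the pod):
#   ip route | pod_route_interface 10.48.231.2
#   then: sudo tcpdump -i <interface> icmp
```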
Network policy
Verify existing Kubernetes network policies
kubectl get networkpolicy --all-namespaces
NAMESPACE NAME POD-SELECTOR AGE
client allow-ui <none> 20m
client default-deny <none> 4h51m
stars allow-ui <none> 20m
stars backend-policy role=backend 20m
stars default-deny <none> 4h51m
Verify existing Calico network policies
calicoctl get networkpolicy --all-namespaces -o wide
NAMESPACE NAME ORDER SELECTOR
calico-demo allow-busybox 50 app == 'porter'
client knp.default.allow-ui 1000 projectcalico.org/orchestrator == 'k8s'
client knp.default.default-deny 1000 projectcalico.org/orchestrator == 'k8s'
stars knp.default.allow-ui 1000 projectcalico.org/orchestrator == 'k8s'
stars knp.default.backend-policy 1000 projectcalico.org/orchestrator == 'k8s'
stars knp.default.default-deny 1000 projectcalico.org/orchestrator == 'k8s'
Verify existing Calico global network policies
calicoctl get globalnetworkpolicy -o wide
NAME ORDER SELECTOR
default-app-policy 100
egress-lockdown 600
default-node-policy 100 has(kubernetes.io/hostname)
nodeport-policy 100 has(kubernetes.io/hostname)
Check policy selectors and order
For example,
calicoctl get np -n yaobank -o wide
If the selectors should match, check the endpoint IP and the node where it is running. For example,
kubectl get pod -l app=customer -n yaobank
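To confirm a pod actually carries the label a selector expects, its labels can be compared against `kubectl get pod --show-labels` output; a sketch (`pods_with_label` is a hypothetical helper name, assuming LABELS is the last column as in the default layout):

```shell
# Print pods whose LABELS column contains the given key=value.
pods_with_label() {
  awk -v l="$1" 'NR > 1 && index($NF, l) { print $1 }'
}

# Usage: kubectl get pod -n yaobank --show-labels | pods_with_label app=customer
```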