Verifying node health
Reviewing node status, resource usage, and configuration
Review cluster node health status, resource consumption statistics, and node logs. Additionally, query kubelet
status on individual nodes.
Prerequisites
You have access to the cluster as a user with the
cluster-admin
role.You have installed the OpenShift CLI (
oc
).
Procedure
List the name, status, and role for all nodes in the cluster:
$ oc get nodes
Summarize CPU and memory usage for each node within the cluster:
$ oc adm top nodes
Summarize CPU and memory usage for a specific node:
$ oc adm top node my-node
Querying the kubelet’s status on a node
You can review cluster node health status, resource consumption statistics, and node logs. Additionally, you can query kubelet
status on individual nodes.
Prerequisites
You have access to the cluster as a user with the
cluster-admin
role.Your API service is still functional.
You have installed the OpenShift CLI (
oc
).
Procedure
The kubelet is managed using a systemd service on each node. Review the kubelet’s status by querying the
kubelet
systemd service within a debug pod.Start a debug pod for a node:
$ oc debug node/my-node
If you are running
oc debug
on a control plane node, you can find administrativekubeconfig
files in the/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs
directory.Set
/host
as the root directory within the debug shell. The debug pod mounts the host’s root file system in/host
within the pod. By changing the root directory to/host
, you can run binaries contained in the host’s executable paths:# chroot /host
OKD 4.13 cluster nodes running Fedora CoreOS (FCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. However, if the OKD API is not available, or
kubelet
is not properly functioning on the target node,oc
operations will be impacted. In such situations, it is possible to access nodes usingssh core@<node>.<cluster_name>.<base_domain>
instead.Check whether the
kubelet
systemd service is active on the node:# systemctl is-active kubelet
Output a more detailed
kubelet.service
status summary:# systemctl status kubelet
Querying cluster node journal logs
You can gather journald
unit logs and other logs within /var/log
on individual cluster nodes.
Prerequisites
You have access to the cluster as a user with the
cluster-admin
role.Your API service is still functional.
You have installed the OpenShift CLI (
oc
).You have SSH access to your hosts.
Procedure
Query
kubelet
journald
unit logs from OKD cluster nodes. The following example queries control plane nodes only:$ oc adm node-logs --role=master -u kubelet (1)
1 Replace kubelet
as appropriate to query other unit logs.Collect logs from specific subdirectories under
/var/log/
on cluster nodes.Retrieve a list of logs contained within a
/var/log/
subdirectory. The following example lists files in/var/log/openshift-apiserver/
on all control plane nodes:$ oc adm node-logs --role=master --path=openshift-apiserver
Inspect a specific log within a
/var/log/
subdirectory. The following example outputs/var/log/openshift-apiserver/audit.log
contents from all control plane nodes:$ oc adm node-logs --role=master --path=openshift-apiserver/audit.log
If the API is not functional, review the logs on each node using SSH instead. The following example tails
/var/log/openshift-apiserver/audit.log
:$ ssh core@<master-node>.<cluster_name>.<base_domain> sudo tail -f /var/log/openshift-apiserver/audit.log
OKD 4.13 cluster nodes running Fedora CoreOS (FCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. Before attempting to collect diagnostic data over SSH, review whether the data collected by running
oc adm must gather
and otheroc
commands is sufficient instead. However, if the OKD API is not available, or the kubelet is not properly functioning on the target node,oc
operations will be impacted. In such situations, it is possible to access nodes usingssh core@<node>.<cluster_name>.<base_domain>
.