Troubleshooting OVN-Kubernetes

OVN-Kubernetes has many sources of built-in health checks and logs.

Monitoring OVN-Kubernetes health by using readiness probes

The ovnkube-control-plane and ovnkube-node pods have containers configured with readiness probes.

Prerequisites

  • Access to the OpenShift CLI (oc).

  • You have access to the cluster with cluster-admin privileges.

  • You have installed jq.

Procedure

  1. Review the details of the ovnkube-node readiness probe by running the following command:

    1. $ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
    2. -o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'

    The readiness probe for the northbound and southbound database containers in the ovnkube-node pod checks for the health of the databases and the ovnkube-controller container.

    The ovnkube-node container in the ovnkube-node pod has a readiness probe to verify the presence of the OVN-Kubernetes CNI configuration file, the absence of which would indicate that the pod is not running or is not ready to accept requests to configure pods.

  2. Show all events including the probe failures, for the namespace by using the following command:

    1. $ oc get events -n openshift-ovn-kubernetes
  3. Show the events for just a specific pod:

    1. $ oc describe pod ovnkube-node-9lqfk -n openshift-ovn-kubernetes
  4. Show the messages and statuses from the cluster network operator:

    1. $ oc get co/network -o json | jq '.status.conditions[]'
  5. Show the ready status of each container in ovnkube-node pods by running the following script:

    1. $ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \
    2. -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); do echo === $p ===; \
    3. oc get pods -n openshift-ovn-kubernetes $p -o json | jq '.status.containerStatuses[] | .name, .ready'; \
    4. done

    The expectation is all container statuses are reporting as true. Failure of a readiness probe sets the status to false.

Additional resources

Viewing OVN-Kubernetes alerts in the console

The Alerting UI provides detailed information about alerts and their governing alerting rules and silences.

Prerequisites

  • You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.

Procedure (UI)

  1. In the Administrator perspective, select ObserveAlerting. The three main pages in the Alerting UI in this perspective are the Alerts, Silences, and Alerting Rules pages.

  2. View the rules for OVN-Kubernetes alerts by selecting ObserveAlertingAlerting Rules.

Viewing OVN-Kubernetes alerts in the CLI

You can get information about alerts and their governing alerting rules and silences from the command line.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.

  • The OpenShift CLI (oc) installed.

  • You have installed jq.

Procedure

  1. View active or firing alerts by running the following commands.

    1. Set the alert manager route environment variable by running the following command:

      1. $ ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring \
      2. -o jsonpath='{@.spec.host}')
    2. Issue a curl request to the alert manager route API by running the following command, replacing $ALERT_MANAGER with the URL of your Alertmanager instance:

      1. $ curl -s -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$ALERT_MANAGER/api/v1/alerts | jq '.data[] | "\(.labels.severity) \(.labels.alertname) \(.labels.pod) \(.labels.container) \(.labels.endpoint) \(.labels.instance)"'
  2. View alerting rules by running the following command:

    1. $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/rules' | jq '.data.groups[].rules[] | select(((.name|contains("ovn")) or (.name|contains("OVN")) or (.name|contains("Ovn")) or (.name|contains("North")) or (.name|contains("South"))) and .type=="alerting")'

Viewing the OVN-Kubernetes logs using the CLI

You can view the logs for each of the pods in the ovnkube-master and ovnkube-node pods using the OpenShift CLI (oc).

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.

  • Access to the OpenShift CLI (oc).

  • You have installed jq.

Procedure

  1. View the log for a specific pod:

    1. $ oc logs -f <pod_name> -c <container_name> -n <namespace>

    where:

    -f

    Optional: Specifies that the output follows what is being written into the logs.

    <pod_name>

    Specifies the name of the pod.

    <container_name>

    Optional: Specifies the name of a container. When a pod has more than one container, you must specify the container name.

    <namespace>

    Specify the namespace the pod is running in.

    For example:

    1. $ oc logs ovnkube-node-5dx44 -n openshift-ovn-kubernetes
    1. $ oc logs -f ovnkube-node-5dx44 -c ovnkube-controller -n openshift-ovn-kubernetes

    The contents of log files are printed out.

  2. Examine the most recent entries in all the containers in the ovnkube-node pods:

    1. $ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \
    2. -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); \
    3. do echo === $p ===; for container in $(oc get pods -n openshift-ovn-kubernetes $p \
    4. -o json | jq -r '.status.containerStatuses[] | .name');do echo ---$container---; \
    5. oc logs -c $container $p -n openshift-ovn-kubernetes --tail=5; done; done
  3. View the last 5 lines of every log in every container in an ovnkube-node pod using the following command:

    1. $ oc logs -l app=ovnkube-node -n openshift-ovn-kubernetes --all-containers --tail 5

Viewing the OVN-Kubernetes logs using the web console

You can view the logs for each of the pods in the ovnkube-master and ovnkube-node pods in the web console.

Prerequisites

  • Access to the OpenShift CLI (oc).

Procedure

  1. In the OKD console, navigate to WorkloadsPods or navigate to the pod through the resource you want to investigate.

  2. Select the openshift-ovn-kubernetes project from the drop-down menu.

  3. Click the name of the pod you want to investigate.

  4. Click Logs. By default for the ovnkube-master the logs associated with the northd container are displayed.

  5. Use the down-down menu to select logs for each container in turn.

Changing the OVN-Kubernetes log levels

The default log level for OVN-Kubernetes is 4. To debug OVN-Kubernetes, set the log level to 5. Follow this procedure to increase the log level of the OVN-Kubernetes to help you debug an issue.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.

  • You have access to the OpenShift Container Platform web console.

Procedure

  1. Run the following command to get detailed information for all pods in the OVN-Kubernetes project:

    1. $ oc get po -o wide -n openshift-ovn-kubernetes

    Example output

    1. NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    2. ovnkube-control-plane-65497d4548-9ptdr 2/2 Running 2 (128m ago) 147m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none>
    3. ovnkube-control-plane-65497d4548-j6zfk 2/2 Running 0 147m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none>
    4. ovnkube-control-plane-65497d4548-k7xqt 2/2 Running 0 147m 10.0.0.4 ci-ln-3njdr9b-72292-5nwkp-master-1 <none> <none>
    5. ovnkube-node-5dx44 8/8 Running 0 146m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none>
    6. ovnkube-node-dpfn4 8/8 Running 0 146m 10.0.0.4 ci-ln-3njdr9b-72292-5nwkp-master-1 <none> <none>
    7. ovnkube-node-kwc9l 8/8 Running 0 134m 10.0.128.2 ci-ln-3njdr9b-72292-5nwkp-worker-a-2fjcj <none> <none>
    8. ovnkube-node-mcrhl 8/8 Running 0 134m 10.0.128.4 ci-ln-3njdr9b-72292-5nwkp-worker-c-v9x5v <none> <none>
    9. ovnkube-node-nsct4 8/8 Running 0 146m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none>
    10. ovnkube-node-zrj9f 8/8 Running 0 134m 10.0.128.3 ci-ln-3njdr9b-72292-5nwkp-worker-b-v78h7 <none> <none>
  2. Create a ConfigMap file similar to the following example and use a filename such as env-overrides.yaml:

    Example ConfigMap file

    1. kind: ConfigMap
    2. apiVersion: v1
    3. metadata:
    4. name: env-overrides
    5. namespace: openshift-ovn-kubernetes
    6. data:
    7. ci-ln-3njdr9b-72292-5nwkp-master-0: | (1)
    8. # This sets the log level for the ovn-kubernetes node process:
    9. OVN_KUBE_LOG_LEVEL=5
    10. # You might also/instead want to enable debug logging for ovn-controller:
    11. OVN_LOG_LEVEL=dbg
    12. ci-ln-3njdr9b-72292-5nwkp-master-2: |
    13. # This sets the log level for the ovn-kubernetes node process:
    14. OVN_KUBE_LOG_LEVEL=5
    15. # You might also/instead want to enable debug logging for ovn-controller:
    16. OVN_LOG_LEVEL=dbg
    17. _master: | (2)
    18. # This sets the log level for the ovn-kubernetes master process as well as the ovn-dbchecker:
    19. OVN_KUBE_LOG_LEVEL=5
    20. # You might also/instead want to enable debug logging for northd, nbdb and sbdb on all masters:
    21. OVN_LOG_LEVEL=dbg
    1Specify the name of the node you want to set the debug log level on.
    2Specify _master to set the log levels of ovnkube-master components.
  3. Apply the ConfigMap file by using the following command:

    1. $ oc apply -n openshift-ovn-kubernetes -f env-overrides.yaml

    Example output

    1. configmap/env-overrides.yaml created
  4. Restart the ovnkube pods to apply the new log level by using the following commands:

    1. $ oc delete pod -n openshift-ovn-kubernetes \
    2. --field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-0 -l app=ovnkube-node
    1. $ oc delete pod -n openshift-ovn-kubernetes \
    2. --field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-2 -l app=ovnkube-node
    1. $ oc delete pod -n openshift-ovn-kubernetes -l app=ovnkube-node
  5. To verify that the `ConfigMap`file has been applied to all nodes for a specific pod, run the following command:

    1. $ oc logs -n openshift-ovn-kubernetes --all-containers --prefix ovnkube-node-<xxxx> | grep -E -m 10 '(Logging config:|vconsole|DBG)'

    where:

    <XXXX>

    Specifies the random sequence of letters for a pod from the previous step.

    Example output

    1. [pod/ovnkube-node-2cpjc/sbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_sb_ovsdb
    2. [pod/ovnkube-node-2cpjc/ovnkube-controller] I1012 14:39:59.984506 35767 config.go:2247] Logging config: {File: CNIFile:/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:5 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20}
    3. [pod/ovnkube-node-2cpjc/northd] + exec ovn-northd --no-chdir -vconsole:info -vfile:off '-vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --pidfile /var/run/ovn/ovn-northd.pid --n-threads=1
    4. [pod/ovnkube-node-2cpjc/nbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_nb_ovsdb
    5. [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.552Z|00002|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (32 nodes total across 32 buckets)
    6. [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00003|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (64 nodes total across 64 buckets)
    7. [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00004|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (32 nodes total across 32 buckets)
    8. [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00005|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering BACKOFF
    9. [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00007|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering CONNECTING
    10. [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00008|ovsdb_cs|DBG|unix:/var/run/openvswitch/db.sock: SERVER_SCHEMA_REQUESTED -> SERVER_SCHEMA_REQUESTED at lib/ovsdb-cs.c:423
  6. Optional: Check the ConfigMap file has been applied by running the following command:

    1. for f in $(oc -n openshift-ovn-kubernetes get po -l 'app=ovnkube-node' --no-headers -o custom-columns=N:.metadata.name) ; do echo "---- $f ----" ; oc -n openshift-ovn-kubernetes exec -c ovnkube-controller $f -- pgrep -a -f init-ovnkube-controller | grep -P -o '^.*loglevel\s+\d' ; done

    Example output

    1. ---- ovnkube-node-2dt57 ----
    2. 60981 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-c-vmh5n.c.openshift-qe.internal --init-node xpst8-worker-c-vmh5n.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
    3. ---- ovnkube-node-4zznh ----
    4. 178034 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-2.c.openshift-qe.internal --init-node xpst8-master-2.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
    5. ---- ovnkube-node-548sx ----
    6. 77499 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-a-fjtnb.c.openshift-qe.internal --init-node xpst8-worker-a-fjtnb.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
    7. ---- ovnkube-node-6btrf ----
    8. 73781 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-b-p8rww.c.openshift-qe.internal --init-node xpst8-worker-b-p8rww.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
    9. ---- ovnkube-node-fkc9r ----
    10. 130707 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-0.c.openshift-qe.internal --init-node xpst8-master-0.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 5
    11. ---- ovnkube-node-tk9l4 ----
    12. 181328 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-1.c.openshift-qe.internal --init-node xpst8-master-1.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4

Checking the OVN-Kubernetes pod network connectivity

The connectivity check controller, in OKD 4.10 and later, orchestrates connection verification checks in your cluster. These include Kubernetes API, OpenShift API and individual nodes. The results for the connection tests are stored in PodNetworkConnectivity objects in the openshift-network-diagnostics namespace. Connection tests are performed every minute in parallel.

Prerequisites

  • Access to the OpenShift CLI (oc).

  • Access to the cluster as a user with the cluster-admin role.

  • You have installed jq.

Procedure

  1. To list the current PodNetworkConnectivityCheck objects, enter the following command:

    1. $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics
  2. View the most recent success for each connection object by using the following command:

    1. $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \
    2. -o json | jq '.items[]| .spec.targetEndpoint,.status.successes[0]'
  3. View the most recent failures for each connection object by using the following command:

    1. $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \
    2. -o json | jq '.items[]| .spec.targetEndpoint,.status.failures[0]'
  4. View the most recent outages for each connection object by using the following command:

    1. $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \
    2. -o json | jq '.items[]| .spec.targetEndpoint,.status.outages[0]'

    The connectivity check controller also logs metrics from these checks into Prometheus.

  5. View all the metrics by running the following command:

    1. $ oc exec prometheus-k8s-0 -n openshift-monitoring -- \
    2. promtool query instant http://localhost:9090 \
    3. '{component="openshift-network-diagnostics"}'
  6. View the latency between the source pod and the openshift api service for the last 5 minutes:

    1. $ oc exec prometheus-k8s-0 -n openshift-monitoring -- \
    2. promtool query instant http://localhost:9090 \
    3. '{component="openshift-network-diagnostics"}'

Additional resources