Troubleshooting network issues

How the network interface is selected

For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that OKD uses for communication with the Kubernetes API server is determined by the nodeip-configuration.service service unit that systemd runs when the node boots. The service iterates through the network interfaces on the node and selects the first network interface that is configured with a subnet that can host the IP address for the API server.
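The selection rule can be illustrated with a short shell sketch. This is not the code that nodeip-configuration.service runs. It is a simplified, IPv4-only illustration of "pick the first interface whose subnet contains the API server address", and the function names ip_to_int, ip_in_subnet, and select_nic are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: IPv4, no error handling for malformed input.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 falls inside the subnet described by $2,
# where $2 is an address with a prefix length such as 10.0.0.5/24.
ip_in_subnet() {
  local ip=$1 cidr=$2
  local net=${cidr%/*} prefix=${cidr#*/}
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Read "<interface> <address>/<prefix>" pairs on stdin and print the first
# interface whose subnet can host the API server IP address given as $1.
select_nic() {
  local api_ip=$1 nic cidr
  while read -r nic cidr; do
    if ip_in_subnet "$api_ip" "$cidr"; then
      echo "$nic"
      return 0
    fi
  done
  return 1
}
```

On a node, the input pairs could be produced with a command such as ip -4 -o addr show | awk '{print $2, $4}' and piped into select_nic <api_server_ip>. Because the first match wins, the order in which interfaces are listed affects the result, which is why a hardware or network change can alter the selection.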

After the nodeip-configuration.service service determines the correct NIC, the service creates the /etc/systemd/system/kubelet.service.d/20-nodenet.conf file. The 20-nodenet.conf file sets the KUBELET_NODE_IP environment variable to the IP address that the service selected.

When the kubelet service starts, it reads the value of the environment variable from the 20-nodenet.conf file and passes the IP address as the value of the --node-ip kubelet command-line argument. As a result, the kubelet service uses the selected IP address as the node IP address.

If hardware or networking is reconfigured after installation, it is possible that the nodeip-configuration.service service can select a different NIC after a reboot. In some cases, you might be able to detect that a different NIC is selected by reviewing the INTERNAL-IP column in the output from the oc get nodes -o wide command.
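As a convenience when comparing many nodes, the INTERNAL-IP column can be extracted from the command output. The following sketch defines a hypothetical helper, node_internal_ips, that locates the column by its header instead of assuming a fixed position.

```shell
# Print "<node name> <internal IP>" for each node, given the output of
# `oc get nodes -o wide` on stdin. The INTERNAL-IP column is found by
# scanning the header row.
node_internal_ips() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "INTERNAL-IP") col = i; next }
       col     { print $1, $col }'
}
```

For example, oc get nodes -o wide | node_internal_ips prints one line per node, which you can compare against the addresses that you expect each node to use.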

If network communication is disrupted or misconfigured because a different NIC is selected, one strategy for overriding the selection process is to set the correct IP address explicitly. The following list identifies the high-level steps and considerations:

  • Create a shell script that determines the IP address to use for OKD communication. Have the script create a custom unit file such as /etc/systemd/system/kubelet.service.d/98-nodenet-override.conf. Use the custom unit file, 98-nodenet-override.conf, to set the KUBELET_NODE_IP environment variable to the IP address.

  • Do not overwrite the /etc/systemd/system/kubelet.service.d/20-nodenet.conf file. Specify a file name with a numerically higher value such as 98-nodenet-override.conf in the same directory path. The goal is for systemd to apply the custom unit file after 20-nodenet.conf so that it overrides the value of the environment variable.

  • Create a machine config object with the shell script as a base64-encoded string and use the Machine Config Operator to deploy the script to the nodes at a file system path such as /usr/local/bin/override-node-ip.sh.

  • Ensure that systemctl daemon-reload runs after the shell script runs. The simplest method is to specify ExecStart=systemctl daemon-reload in the machine config, as shown in the following sample.

Sample machine config to override the network interface for kubelet

  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: worker
    name: 98-nodenet-override
  spec:
    config:
      ignition:
        version: 3.2.0
      storage:
        files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,<encoded_script>
          mode: 0755
          overwrite: true
          path: /usr/local/bin/override-node-ip.sh
      systemd:
        units:
        - contents: |
            [Unit]
            Description=Override node IP detection
            Wants=network-online.target
            Before=kubelet.service
            After=network-online.target
            [Service]
            Type=oneshot
            ExecStart=/usr/local/bin/override-node-ip.sh
            ExecStart=systemctl daemon-reload
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: nodenet-override.service
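
The contents of the shell script depend on your environment. The following is a minimal sketch of what /usr/local/bin/override-node-ip.sh might look like, not a supported implementation. It assumes that the correct address is the first IPv4 address on a known interface; the interface name ens4, the NODE_NIC variable, and the function names are hypothetical, and the [Service] and Environment= lines use standard systemd drop-in syntax.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of override-node-ip.sh.

# Print the first IPv4 address configured on interface $1.
first_ipv4_of() {
  ip -4 -o addr show dev "$1" | awk '{ split($4, a, "/"); print a[1]; exit }'
}

# Write a kubelet drop-in file in directory $1 that sets KUBELET_NODE_IP to $2.
write_node_ip_override() {
  local dropin_dir=$1 node_ip=$2
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/98-nodenet-override.conf" <<EOF
[Service]
Environment="KUBELET_NODE_IP=$node_ip"
EOF
}

main() {
  # Assumption: replace ens4 with the interface that carries cluster traffic.
  local nic=${NODE_NIC:-ens4} node_ip
  node_ip=$(first_ipv4_of "$nic")
  if [[ -z "$node_ip" ]]; then
    echo "no IPv4 address found on $nic" >&2
    return 1
  fi
  write_node_ip_override /etc/systemd/system/kubelet.service.d "$node_ip"
}

# In the deployed script, end the file with a call to: main
```

To produce the <encoded_script> value for the machine config, you can encode the finished script on one line, for example with base64 -w0 override-node-ip.sh on a system with GNU coreutils.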

Troubleshooting Open vSwitch issues

To troubleshoot some Open vSwitch (OVS) issues, you might need to configure the log level to include more information.

If you modify the log level on a node temporarily, be aware that you can receive log messages from the machine config daemon on the node like the following example:

  E0514 12:47:17.998892 2281 daemon.go:1350] content mismatch for file /etc/systemd/system/ovs-vswitchd.service: [Unit]

To avoid the log messages related to the mismatch, revert the log level change after you complete your troubleshooting.

Configuring the Open vSwitch log level temporarily

For short-term troubleshooting, you can configure the Open vSwitch (OVS) log level temporarily. The following procedure does not require rebooting the node. In addition, the configuration change does not persist after you reboot the node.

After you perform this procedure to change the log level, you can receive log messages from the machine config daemon that indicate a content mismatch for the ovs-vswitchd.service. To avoid the log messages, repeat this procedure and set the log level to the original value.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure

  1. Start a debug pod for a node:

    $ oc debug node/<node_name>
  2. Set /host as the root directory within the debug shell. The debug pod mounts the root file system from the host in /host within the pod. By changing the root directory to /host, you can run binaries from the host file system:

    # chroot /host
  3. View the current syslog level for OVS modules:

    # ovs-appctl vlog/list

    The following example output shows the log level for syslog set to info.

    Example output

                     console    syslog    file
                     -------    ------    ------
    backtrace          OFF       INFO       INFO
    bfd                OFF       INFO       INFO
    bond               OFF       INFO       INFO
    bridge             OFF       INFO       INFO
    bundle             OFF       INFO       INFO
    bundles            OFF       INFO       INFO
    cfm                OFF       INFO       INFO
    collectors         OFF       INFO       INFO
    command_line       OFF       INFO       INFO
    connmgr            OFF       INFO       INFO
    conntrack          OFF       INFO       INFO
    conntrack_tp       OFF       INFO       INFO
    coverage           OFF       INFO       INFO
    ct_dpif            OFF       INFO       INFO
    daemon             OFF       INFO       INFO
    daemon_unix        OFF       INFO       INFO
    dns_resolve        OFF       INFO       INFO
    dpdk               OFF       INFO       INFO
    ...
  4. Specify the log level in the /etc/systemd/system/ovs-vswitchd.service.d/10-ovs-vswitchd-restart.conf file:

    Restart=always
    ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /var/lib/openvswitch'
    ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /etc/openvswitch'
    ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /run/openvswitch'
    ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:dbg
    ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:dbg

    In the preceding example, the log level is set to dbg. Change the last two lines by setting syslog:<log_level> to off, emer, err, warn, info, or dbg. The off log level filters out all log messages.

  5. Restart the service:

    # systemctl daemon-reload
    # systemctl restart ovs-vswitchd
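
If you change the level often, the edit to the drop-in file can be scripted. The following sketch assumes GNU sed, as found on the node after you run chroot /host. The function name set_ovs_syslog_level is hypothetical. It rejects values other than the log levels listed in the procedure and rewrites every vlog/set syslog:<log_level> entry in the file.

```shell
# Rewrite every "vlog/set syslog:<level>" entry in file $1 to use level $2.
# Returns 1 without touching the file if the level is not a valid OVS level.
set_ovs_syslog_level() {
  local file=$1 level=$2
  case "$level" in
    off|emer|err|warn|info|dbg) ;;
    *) echo "invalid log level: $level" >&2; return 1 ;;
  esac
  sed -i -E "s#(vlog/set syslog:)[a-z]+#\1${level}#" "$file"
}
```

For example, set_ovs_syslog_level /etc/systemd/system/ovs-vswitchd.service.d/10-ovs-vswitchd-restart.conf info restores the info level, after which you run systemctl daemon-reload and systemctl restart ovs-vswitchd as in the procedure.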

Configuring the Open vSwitch log level permanently

For long-term changes to the Open vSwitch (OVS) log level, you can change the log level permanently.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a file, such as 99-change-ovs-loglevel.yaml, with a MachineConfig object like the following example:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master (1)
      name: 99-change-ovs-loglevel
    spec:
      config:
        ignition:
          version: 3.2.0
        systemd:
          units:
          - dropins:
            - contents: |
                [Service]
                ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:dbg (2)
                ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:dbg
              name: 20-ovs-vswitchd-restart.conf
            name: ovs-vswitchd.service

    (1) After you perform this procedure to configure control plane nodes, repeat the procedure and set the role to worker to configure worker nodes.
    (2) Set the syslog:<log_level> value. Log levels are off, emer, err, warn, info, or dbg. Setting the value to off filters out all log messages.

  2. Apply the machine config:

    $ oc apply -f 99-change-ovs-loglevel.yaml
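
Because the procedure is repeated for each node role, it can help to generate the object from a template. The following sketch prints a MachineConfig for a given role and log level. The function name render_ovs_loglevel_mc is hypothetical, and appending the role to the object name is an assumption made here so that the objects for control plane and worker nodes do not replace each other.

```shell
# Print a MachineConfig that sets the OVS syslog level ($2) for node role ($1).
render_ovs_loglevel_mc() {
  local role=$1 level=$2
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 99-change-ovs-loglevel-${role}
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - dropins:
        - contents: |
            [Service]
            ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:${level}
            ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:${level}
          name: 20-ovs-vswitchd-restart.conf
        name: ovs-vswitchd.service
EOF
}
```

For example, render_ovs_loglevel_mc worker dbg | oc apply -f - applies the configuration for worker nodes without creating an intermediate file.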

Displaying Open vSwitch logs

Use the following procedure to display Open vSwitch (OVS) logs.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the OpenShift CLI (oc).

Procedure

  • Run one of the following commands:

    • Display the logs by using the oc command from outside the cluster:

      $ oc adm node-logs -u ovs-vswitchd
    • Display the logs after logging on to a node in the cluster:

      # journalctl -b -f -u ovs-vswitchd.service

      One way to log on to a node is by using the oc debug node/<node_name> command.
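
With the log level set to dbg, the output can be large. The following sketch filters the output to selected severities. It assumes that each OVS message keeps the default pipe-separated layout, in which the severity appears as its own field, for example ovs|00002|netdev|WARN|message. If your logs use a different pattern, adjust the expression. The function name filter_ovs_levels is hypothetical.

```shell
# Keep only log lines on stdin whose severity field matches one of the
# levels given as arguments, for example: filter_ovs_levels WARN ERR
filter_ovs_levels() {
  local pattern
  # Join the arguments with "|" to build an alternation such as WARN|ERR.
  pattern=$(IFS='|'; echo "$*")
  grep -E "\|(${pattern})\|"
}
```

For example, oc adm node-logs <node_name> -u ovs-vswitchd | filter_ovs_levels WARN ERR shows only warnings and errors from one node.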