Troubleshooting network issues
- How the network interface is selected
  - Optional: Overriding the default node IP selection logic
- Troubleshooting Open vSwitch issues

Troubleshooting network issues

How the network interface is selected

For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that OKD uses for communication with the Kubernetes API server is determined by the nodeip-configuration.service service unit that is run by systemd when the node boots. The nodeip-configuration.service selects the IP from the interface associated with the default route.

After the nodeip-configuration.service service determines the correct NIC, the service creates the /etc/systemd/system/kubelet.service.d/20-nodenet.conf file. The 20-nodenet.conf file sets the KUBELET_NODE_IP environment variable to the IP address that the service selected.

When the kubelet service starts, it reads the value of the environment variable from the 20-nodenet.conf file and sets the IP address as the value of the --node-ip kubelet command-line argument. As a result, the kubelet service uses the selected IP address as the node IP address.

If hardware or networking is reconfigured after installation, or if there is a networking layout where the node IP should not come from the default route interface, it is possible for the nodeip-configuration.service service to select a different NIC after a reboot. In some cases, you might be able to detect that a different NIC is selected by reviewing the INTERNAL-IP column in the output from the oc get nodes -o wide command.

If network communication is disrupted or misconfigured because a different NIC is selected, you might receive the following error: EtcdCertSignerControllerDegraded. You can create a hint file that includes the NODEIP_HINT variable to override the default IP selection logic. For more information, see Optional: Overriding the default node IP selection logic.

Optional: Overriding the default node IP selection logic

To override the default IP selection logic, you can create a hint file that includes the NODEIP_HINT variable to override the default IP selection logic. Creating a hint file allows you to select a specific node IP address from the interface in the subnet of the IP address specified in the NODEIP_HINT variable.

For example, if a node has two interfaces, eth0 with an address of 10.0.0.10/24, and eth1 with an address of 192.0.2.5/24, and the default route points to eth0 (10.0.0.10),the node IP address would normally use the 10.0.0.10 IP address.

Users can configure the NODEIP_HINT variable to point at a known IP in the subnet, for example, a subnet gateway such as 192.0.2.1 so that the other subnet, 192.0.2.0/24, is selected. As a result, the 192.0.2.5 IP address on eth1 is used for the node.

The following procedure shows how to override the default node IP selection logic.

Procedure

Add a hint file to your /etc/default/nodeip-configuration file, for example:

NODEIP_HINT=192.0.2.1

Do not use the exact IP address of a node as a hint, for example, 192.0.2.5. Using the exact IP address of a node causes the node using the hint IP address to fail to configure correctly.
The IP address in the hint file is only used to determine the correct subnet. It will not receive traffic as a result of appearing in the hint file.

Generate the base-64 encoded content by running the following command:
```
$ echo 'NODEIP_HINT=192.0.2.1' | base64
```
Example output
```
Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==
```

Activate the hint by creating a machine config manifest for both master and worker roles before deploying the cluster:

Master machine config manifest

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
 labels:
   machineconfiguration.openshift.io/role: master
   name: 99-nodeip-hint-master
spec:
 config:
   ignition:
     version: 3.2.0
   storage:
     files:
     - contents:
         source: data:text/plain;charset=utf-8;base64, <encoded_content> (1)
       mode: 0644
       overwrite: true
       path: /etc/default/nodeip-configuration

1	Replace `<encoded_contents>` with the base64-encoded content of the `/etc/default/nodeip-configuration` file, for example, `Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==`.

Worker machine config manifest

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
 labels:
   machineconfiguration.openshift.io/role: worker
   name: 99-nodeip-hint-worker
spec:
 config:
   ignition:
     version: 3.2.0
   storage:
     files:
     - contents:
         source: data:text/plain;charset=utf-8;base64, <encoded_content> (1)
       mode: 0644
       overwrite: true
       path: /etc/default/nodeip-configuration

1	Replace `<encoded_contents>` with the base64-encoded content of the `/etc/default/nodeip-configuration` file, for example, `Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==`.

Save the manifest to the directory where you store your cluster configuration, for example, ~/clusterconfigs.
Deploy the cluster.

Troubleshooting Open vSwitch issues

To troubleshoot some Open vSwitch (OVS) issues, you might need to configure the log level to include more information.

If you modify the log level on a node temporarily, be aware that you can receive log messages from the machine config daemon on the node like the following example:

E0514 12:47:17.998892    2281 daemon.go:1350] content mismatch for file /etc/systemd/system/ovs-vswitchd.service: [Unit]

To avoid the log messages related to the mismatch, revert the log level change after you complete your troubleshooting.

Configuring the Open vSwitch log level temporarily

For short-term troubleshooting, you can configure the Open vSwitch (OVS) log level temporarily. The following procedure does not require rebooting the node. In addition, the configuration change does not persist whenever you reboot the node.

After you perform this procedure to change the log level, you can receive log messages from the machine config daemon that indicate a content mismatch for the ovs-vswitchd.service. To avoid the log messages, repeat this procedure and set the log level to the original value.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).

Procedure

Start a debug pod for a node:
```
$ oc debug node/<node_name>
```
Set /host as the root directory within the debug shell. The debug pod mounts the root file system from the host in /host within the pod. By changing the root directory to /host, you can run binaries from the host file system:
```
# chroot /host
```

View the current syslog level for OVS modules:

# ovs-appctl vlog/list

The following example output shows the log level for syslog set to info.

Example output

                 console    syslog    file
                 -------    ------    ------
backtrace          OFF       INFO       INFO
bfd                OFF       INFO       INFO
bond               OFF       INFO       INFO
bridge             OFF       INFO       INFO
bundle             OFF       INFO       INFO
bundles            OFF       INFO       INFO
cfm                OFF       INFO       INFO
collectors         OFF       INFO       INFO
command_line       OFF       INFO       INFO
connmgr            OFF       INFO       INFO
conntrack          OFF       INFO       INFO
conntrack_tp       OFF       INFO       INFO
coverage           OFF       INFO       INFO
ct_dpif            OFF       INFO       INFO
daemon             OFF       INFO       INFO
daemon_unix        OFF       INFO       INFO
dns_resolve        OFF       INFO       INFO
dpdk               OFF       INFO       INFO
...

Specify the log level in the /etc/systemd/system/ovs-vswitchd.service.d/10-ovs-vswitchd-restart.conf file:

Restart=always
ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /var/lib/openvswitch'
ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /etc/openvswitch'
ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /run/openvswitch'
ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:dbg
ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:dbg

In the preceding example, the log level is set to dbg. Change the last two lines by setting syslog:<log_level> to off, emer, err, warn, info, or dbg. The off log level filters out all log messages.

Restart the service:

# systemctl daemon-reload

# systemctl restart ovs-vswitchd

Configuring the Open vSwitch log level permanently

For long-term changes to the Open vSwitch (OVS) log level, you can change the log level permanently.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).

Procedure

Create a file, such as 99-change-ovs-loglevel.yaml, with a MachineConfig object like the following example:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master  (1)
  name: 99-change-ovs-loglevel
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - dropins:
        - contents: |
            [Service]
              ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:dbg  (2)
              ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:dbg
          name: 20-ovs-vswitchd-restart.conf
        name: ovs-vswitchd.service

1	After you perform this procedure to configure control plane nodes, repeat the procedure and set the role to `worker` to configure worker nodes.
2	Set the `syslog:<log_level>` value. Log levels are `off`, `emer`, `err`, `warn`, `info`, or `dbg`. Setting the value to `off` filters out all log messages.

Apply the machine config:

$ oc apply -f 99-change-ovs-loglevel.yaml

Additional resources

Displaying Open vSwitch logs

Use the following procedure to display Open vSwitch (OVS) logs.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).

Procedure

Run one of the following commands:
- Display the logs by using the oc command from outside the cluster:
```
$ oc adm node-logs <node_name> -u ovs-vswitchd
```
- Display the logs after logging on to a node in the cluster:
```
# journalctl -b -f -u ovs-vswitchd.service
```
  One way to log on to a node is by using the oc debug node/<node_name> command.