Changing the MTU for the cluster network

As a cluster administrator, you can change the MTU for the cluster network after cluster installation. This change is disruptive as cluster nodes must be rebooted to finalize the MTU change. You can change the MTU only for clusters using the OVN-Kubernetes or OpenShift SDN cluster network providers.

About the cluster MTU

During installation, the maximum transmission unit (MTU) for the cluster network is detected automatically based on the MTU of the primary network interface of nodes in the cluster. You do not normally need to override the detected MTU.

You might want to change the MTU of the cluster network for several reasons:

  • The MTU detected during cluster installation is not correct for your infrastructure

  • Your cluster infrastructure now requires a different MTU, such as from the addition of nodes that need a different MTU for optimal performance

You can change the cluster MTU for only the OVN-Kubernetes and OpenShift SDN cluster network providers.

Service interruption considerations

When you initiate an MTU change on your cluster, the following effects might impact service availability:

  • At least two rolling reboots are required to complete the migration to a new MTU. During this time, some nodes are not available as they restart.

  • Specific applications deployed to the cluster with shorter timeout intervals than the absolute TCP timeout interval might experience disruption during the MTU change.

MTU value selection

When planning your MTU migration, there are two related but distinct MTU values to consider.

  • Hardware MTU: This MTU value is set based on the specifics of your network infrastructure.

  • Cluster network MTU: This MTU value is always less than your hardware MTU to account for the cluster network overlay overhead. The specific overhead is determined by your cluster network provider:

    • OVN-Kubernetes: 100 bytes

    • OpenShift SDN: 50 bytes

If your cluster requires different MTU values for different nodes, you must subtract the overhead value for your cluster network provider from the lowest MTU value that is used by any node in your cluster. For example, if some nodes in your cluster have an MTU of 9001, and some have an MTU of 1500, the lowest value is 1500. With OVN-Kubernetes you must therefore set the cluster network MTU to 1400 (1500 - 100), and with OpenShift SDN to 1450 (1500 - 50).
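The subtraction described above can be sketched as a small shell helper. This is only an illustration: the function names `overlay_overhead` and `cluster_network_mtu` are hypothetical, and the provider strings are assumed to match the `Network Type` values reported by the cluster (`OVNKubernetes`, `OpenShiftSDN`).

```shell
# Print the overlay overhead in bytes for a cluster network provider.
# The provider names are assumed to match the cluster's networkType values.
overlay_overhead() {
  case "$1" in
    OVNKubernetes) echo 100 ;;
    OpenShiftSDN)  echo 50 ;;
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Print the cluster network MTU: the lowest hardware MTU minus the overhead.
# Usage: cluster_network_mtu <provider> <hardware_mtu>...
cluster_network_mtu() {
  provider=$1
  shift
  if [ "$#" -lt 1 ]; then
    echo "at least one hardware MTU is required" >&2
    return 1
  fi
  overhead=$(overlay_overhead "$provider") || return 1
  lowest=$1
  for mtu in "$@"; do
    if [ "$mtu" -lt "$lowest" ]; then
      lowest=$mtu
    fi
  done
  echo $((lowest - overhead))
}
```

For example, `cluster_network_mtu OVNKubernetes 9001 1500` prints the value to use when nodes with both hardware MTUs are present.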

How the migration process works

The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.

Table 1. Live migration of the cluster MTU
The table has two columns: User-initiated steps and OKD activity.

User-initiated step 1: Set the following values in the Cluster Network Operator configuration:

  • spec.migration.mtu.machine.to

  • spec.migration.mtu.network.from

  • spec.migration.mtu.network.to

OKD activity for step 1:

Cluster Network Operator (CNO): Confirms that each field is set to a valid value.

  • The mtu.machine.to must be set to either the new hardware MTU or to the current hardware MTU if the MTU for the hardware is not changing. This value is transient and is used as part of the migration process. Separately, if you specify a hardware MTU that is different from your existing hardware MTU value, you must manually configure the MTU to persist by other means, such as with a machine config, DHCP setting, or a Linux kernel command line.

  • The mtu.network.from field must equal the network.status.clusterNetworkMTU field, which is the current MTU of the cluster network.

  • The mtu.network.to field must be set to the target cluster network MTU and must be lower than the hardware MTU to allow for the overlay overhead of the cluster network provider. For OVN-Kubernetes, the overhead is 100 bytes and for OpenShift SDN the overhead is 50 bytes.

If the values provided are valid, the CNO writes out a new temporary configuration with the MTU for the cluster network set to the value of the mtu.network.to field.

Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster.

User-initiated step 2: Reconfigure the MTU of the primary network interface for the nodes on the cluster. You can use a variety of methods to accomplish this, including:

  • Deploying a new NetworkManager connection profile with the MTU change

  • Changing the MTU through a DHCP server setting

  • Changing the MTU through boot parameters

OKD activity for step 2: N/A

User-initiated step 3: Set the mtu value in the CNO configuration for the cluster network provider and set spec.migration to null.

OKD activity for step 3:

Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster with the new MTU configuration.
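Before you submit the migration values, a local pre-check along the following lines might catch obvious mistakes. This is a sketch that mirrors the constraints listed for the three migration fields; it is not the CNO's actual validation logic, and the function name `validate_mtu_migration` is hypothetical.

```shell
# Usage: validate_mtu_migration <current_cluster_mtu> <from> <to> <machine_to> <overhead>
# Returns 0 and prints "ok" when the values satisfy the documented constraints.
validate_mtu_migration() {
  current=$1
  from=$2
  to=$3
  machine_to=$4
  overhead=$5
  # mtu.network.from must equal the current cluster network MTU.
  if [ "$from" -ne "$current" ]; then
    echo "mtu.network.from ($from) must equal the current cluster network MTU ($current)" >&2
    return 1
  fi
  # mtu.network.to must leave room for the overlay overhead below the hardware MTU.
  if [ $((to + overhead)) -gt "$machine_to" ]; then
    echo "mtu.network.to ($to) plus overhead ($overhead) exceeds mtu.machine.to ($machine_to)" >&2
    return 1
  fi
  echo "ok"
}
```

For example, `validate_mtu_migration 1400 1400 9000 9100 100` checks an OVN-Kubernetes migration from 1400 to 9000 on hardware with an MTU of 9100.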

Changing the cluster MTU

As a cluster administrator, you can change the maximum transmission unit (MTU) for your cluster. The migration is disruptive and nodes in your cluster might be temporarily unavailable as the MTU update rolls out.

The following procedure describes how to change the cluster MTU by using machine configs, DHCP, or an ISO. If you use the DHCP or ISO approach, you must refer to configuration artifacts that you kept after installing your cluster to complete the procedure.

Prerequisites

  • You installed the OpenShift CLI (oc).

  • You are logged in to the cluster with a user with cluster-admin privileges.

  • You identified the target MTU for your cluster. The correct MTU varies depending on the cluster network provider that your cluster uses:

    • OVN-Kubernetes: The cluster MTU must be set to 100 less than the lowest hardware MTU value in your cluster.

    • OpenShift SDN: The cluster MTU must be set to 50 less than the lowest hardware MTU value in your cluster.

Procedure

To increase or decrease the MTU for the cluster network, complete the following procedure.

  1. To obtain the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster

    Example output

    ...
    Status:
      Cluster Network:
        Cidr:               10.217.0.0/22
        Host Prefix:        23
      Cluster Network MTU:  1400
      Network Type:         OpenShiftSDN
      Service Network:
        10.217.4.0/23
    ...
  2. Prepare your configuration for the hardware MTU:

    • If your hardware MTU is specified with DHCP, update your DHCP configuration such as with the following dnsmasq configuration:

      dhcp-option-force=26,<mtu>

      where:

      <mtu>

      Specifies the hardware MTU for the DHCP server to advertise.

    • If your hardware MTU is specified with a kernel command line with PXE, update that configuration accordingly.

    • If your hardware MTU is specified in a NetworkManager connection configuration, complete the following steps. This approach is the default for OKD if you do not explicitly specify your network configuration with DHCP, a kernel command line, or some other method. Your cluster nodes must all use the same underlying network configuration for the following procedure to work unmodified.

      1. Find the primary network interface:

        • If you are using the OpenShift SDN cluster network provider, enter the following command:

          $ oc debug node/<node_name> -- chroot /host ip route list match 0.0.0.0/0 | awk '{print $5 }'

          where:

          <node_name>

          Specifies the name of a node in your cluster.

        • If you are using the OVN-Kubernetes cluster network provider, enter the following command:

          $ oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0

          where:

          <node_name>

          Specifies the name of a node in your cluster.

      2. To find the connection profile that NetworkManager created for the interface name returned from the previous command, enter the following command:

        $ oc debug node/<node_name> -- chroot /host nmcli c | grep <interface>

        where:

        <interface>

        Specifies the name of the primary network interface.

        Example output for OpenShift SDN

        Wired connection 1 46da4a6a-xxxx-xxxx-xxxx-ac0ca900f213 ethernet ens3

        Example output for OVN-Kubernetes without an original connection configuration

        ovs-if-phys0 353774d3-0d3d-4ada-b14e-cd4d8824e2a8 ethernet ens4
        ovs-port-phys0 332ef950-b2e5-4991-a0dc-3158977c35ca ovs-port ens4

        For the OVN-Kubernetes cluster network provider, two or three NetworkManager connection profiles are returned.

        • If the previous command returns only two profiles, then you must use a default NetworkManager connection configuration as a template.

        • If the previous command returns three profiles, use the profile that is not named ovs-if-phys0 or ovs-port-phys0 as a template for the following modifications.

      3. To get the file name of the NetworkManager connection configuration for the primary network interface, enter the following command:

        $ oc debug node/<node_name> -- chroot /host nmcli -g UUID,FILENAME c show | grep <uuid> | cut -d: -f2

        where:

        <node_name>

        Specifies the name of a node in your cluster.

        <uuid>

        Specifies the UUID of the NetworkManager connection profile.

        Example output

        /run/NetworkManager/system-connections/Wired connection 1.nmconnection
      4. To copy the NetworkManager connection configuration from the node, enter the following command:

        $ oc debug node/<node_name> -- chroot /host cat "<profile_path>" > config.nmconnection

        where:

        <node_name>

        Specifies the name of a node in your cluster.

        <profile_path>

        Specifies the file system path of the NetworkManager connection from the previous step.

        Example NetworkManager connection configuration

        [connection]
        id=Wired connection 1
        uuid=3e96a02b-xxxx-xxxx-ad5d-61db28678130
        type=ethernet
        autoconnect-priority=-999
        interface-name=enp1s0
        permissions=
        timestamp=1644109633
        [ethernet]
        mac-address-blacklist=
        [ipv4]
        dns-search=
        method=auto
        [ipv6]
        addr-gen-mode=stable-privacy
        dns-search=
        method=auto
        [proxy]
        [.nmmeta]
        nm-generated=true
      5. Edit the NetworkManager configuration file saved in the config.nmconnection file from the previous step:

        • Set the following values:

          • 802-3-ethernet.mtu: Specify the MTU for the primary network interface of the system.

          • connection.interface-name: Optional: Specify the network interface name that this configuration applies to.

          • connection.autoconnect-priority: Optional: Consider specifying an integer priority value above 0 to ensure this profile is used over other profiles for the same interface. If you are using the OVN-Kubernetes cluster network provider, this value must be less than 100.

        • Remove the connection.uuid field.

        • Change the following values:

          • connection.id: Optional: Specify a different NetworkManager connection profile name.

        Example NetworkManager connection configuration

        ```
        [connection]
        id=Primary network interface
        type=ethernet
        autoconnect-priority=10
        interface-name=enp1s0
        [802-3-ethernet]
        mtu=8051
        ```

      6. Create two `MachineConfig` objects, one for the control plane nodes and another for the worker nodes in your cluster:

        1. Create the following Butane config in the `control-plane-interface.bu` file:

          ```
          variant: openshift
          version: 4.11.0
          metadata:
            name: 01-control-plane-interface
            labels:
              machineconfiguration.openshift.io/role: master
          storage:
            files:
              - path: /etc/NetworkManager/system-connections/<connection_name> # (1)
                contents:
                  local: config.nmconnection # (2)
                mode: 0644
          ```

          (1) Specify the NetworkManager connection name for the primary network interface.

          (2) Specify the local filename for the updated NetworkManager configuration file from the previous step.

        2. Create the following Butane config in the `worker-interface.bu` file:

          ```
          variant: openshift
          version: 4.11.0
          metadata:
            name: 01-worker-interface
            labels:
              machineconfiguration.openshift.io/role: worker
          storage:
            files:
              - path: /etc/NetworkManager/system-connections/<connection_name> # (1)
                contents:
                  local: config.nmconnection # (2)
                mode: 0644
          ```

          (1) Specify the NetworkManager connection name for the primary network interface.

          (2) Specify the local filename for the updated NetworkManager configuration file from the previous step.

        3. Create `MachineConfig` objects from the Butane configs by running the following command:

          ```
          $ for manifest in control-plane-interface worker-interface; do
              butane --files-dir . $manifest.bu > $manifest.yaml
            done
          ```

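The two required edits to the copied `config.nmconnection` file (removing the `uuid` field and setting the MTU) could be scripted roughly as follows. This is a sketch under assumptions: the function name `set_profile_mtu` is hypothetical, and it only adds the `mtu` key when the profile already contains an `[ethernet]` or `[802-3-ethernet]` section, as the copied example does. The optional fields, such as `id` and `autoconnect-priority`, are left for manual editing.

```shell
# Usage: set_profile_mtu <mtu> < original.nmconnection > edited.nmconnection
# Removes uuid= from [connection], drops any existing mtu= in the ethernet
# section, and writes the new mtu= directly after the ethernet section header.
set_profile_mtu() {
  awk -v mtu="$1" '
    /^\[/ { section = $0 }                                   # track the current section
    section == "[connection]" && /^uuid=/ { next }           # remove the uuid field
    (section == "[ethernet]" || section == "[802-3-ethernet]") && /^mtu=/ { next }
    { print }
    /^\[(ethernet|802-3-ethernet)\]$/ { print "mtu=" mtu }   # set the new MTU
  '
}
```

For example, `set_profile_mtu 8051 < config.nmconnection > config.edited` writes an edited copy that you can review before renaming it to `config.nmconnection`.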
  1. To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
        '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'

    where:

    <overlay_from>

    Specifies the current cluster network MTU value.

    <overlay_to>

    Specifies the target MTU for the cluster network. This value is set relative to the value for <machine_to> and for OVN-Kubernetes must be 100 less and for OpenShift SDN must be 50 less.

    <machine_to>

    Specifies the MTU for the primary network interface on the underlying host network.

    Example that increases the cluster MTU

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
        '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'
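To avoid hand-editing the JSON in the patch, the three values can be substituted by a small helper. This is an illustrative sketch; the function name `mtu_migration_patch` is hypothetical, and the JSON layout is taken from the patch command in this step.

```shell
# Usage: mtu_migration_patch <overlay_from> <overlay_to> <machine_to>
# Prints the merge patch that starts the MTU migration.
mtu_migration_patch() {
  printf '{"spec":{"migration":{"mtu":{"network":{"from":%d,"to":%d},"machine":{"to":%d}}}}}\n' \
    "$1" "$2" "$3"
}
```

The output can then be passed to the same command, for example: `oc patch Network.operator.openshift.io cluster --type=merge --patch "$(mtu_migration_patch 1400 9000 9100)"`.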
  2. As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
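If you want to script the wait, the pool status can be checked from the command output. The following sketch assumes that the third, fourth, and fifth columns of `oc get mcp --no-headers` are UPDATED, UPDATING, and DEGRADED; verify the column order on your cluster before relying on it. The function name `all_pools_updated` is hypothetical.

```shell
# Reads `oc get mcp --no-headers` rows on stdin.
# Succeeds only if every pool reports UPDATED=true, UPDATING=false, DEGRADED=false.
all_pools_updated() {
  awk '
    { rows++ }
    tolower($3) != "true" || tolower($4) != "false" || tolower($5) != "false" { bad = 1 }
    END { if (bad || rows == 0) exit 1 }   # no rows at all is treated as not updated
  '
}
```

A polling loop could then look like `until oc get mcp --no-headers | all_pools_updated; do sleep 30; done`.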

  3. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of the machineconfiguration.openshift.io/state field is Done.

      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.

    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      ExecStart=/usr/local/bin/mtu-migration.sh
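The two per-node checks in this step (state is Done, and currentConfig equals desiredConfig) can be automated against the filtered `oc describe node` output. This sketch assumes that the currentConfig line appears before the desiredConfig line for each node, as in the example output; the function name `nodes_config_done` is hypothetical.

```shell
# Reads the output of: oc describe node | egrep "hostname|machineconfig"
# Succeeds only if every node has state Done and currentConfig == desiredConfig.
nodes_config_done() {
  awk '
    /currentConfig:/ { current = $2 }
    /desiredConfig:/ { if ($2 != current) bad = 1; nodes++ }
    /machineconfiguration.openshift.io\/state:/ { if ($2 != "Done") bad = 1 }
    END { if (bad || nodes == 0) exit 1 }
  '
}
```

For example, `oc describe node | egrep "hostname|machineconfig" | nodes_config_done && echo "all nodes done"`.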
  4. Update the underlying network interface MTU value:

    • If you are specifying the new MTU with a NetworkManager connection configuration, enter the following command. The Machine Config Operator automatically performs a rolling reboot of the nodes in your cluster.

      $ for manifest in control-plane-interface worker-interface; do
          oc create -f $manifest.yaml
        done

    • If you are specifying the new MTU with a DHCP server option or a kernel command line and PXE, make the necessary changes for your infrastructure.

  5. As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  6. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of the machineconfiguration.openshift.io/state field is Done.

      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.

    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep path:

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      If the machine config is successfully deployed, the previous output contains the /etc/NetworkManager/system-connections/<connection_name> file path.

      The machine config must not contain the ExecStart=/usr/local/bin/mtu-migration.sh line.

  7. To finalize the MTU migration, enter one of the following commands:

    • If you are using the OVN-Kubernetes cluster network provider:

      $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
          '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'

      where:

      <mtu>

      Specifies the new cluster network MTU that you specified with <overlay_to>.

    • If you are using the OpenShift SDN cluster network provider:

      $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
          '{"spec": { "migration": null, "defaultNetwork":{ "openshiftSDNConfig": { "mtu": <mtu> }}}}'

      where:

      <mtu>

      Specifies the new cluster network MTU that you specified with <overlay_to>.
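The two finalize commands differ only in the provider-specific configuration key, so the patch body could be generated from the provider name. This is a sketch; the function name `finalize_mtu_patch` is hypothetical, and the provider strings are assumed to match the cluster's `Network Type` values.

```shell
# Usage: finalize_mtu_patch <OVNKubernetes|OpenShiftSDN> <mtu>
# Prints the merge patch that clears spec.migration and sets the provider MTU.
finalize_mtu_patch() {
  case "$1" in
    OVNKubernetes) key=ovnKubernetesConfig ;;
    OpenShiftSDN)  key=openshiftSDNConfig ;;
    *) echo "unsupported provider: $1" >&2; return 1 ;;
  esac
  printf '{"spec":{"migration":null,"defaultNetwork":{"%s":{"mtu":%d}}}}\n' "$key" "$2"
}
```

For example: `oc patch Network.operator.openshift.io cluster --type=merge --patch "$(finalize_mtu_patch OVNKubernetes 9000)"`.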

Verification

You can verify that a node in your cluster uses an MTU that you specified in the previous procedure.

  1. To get the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster
  2. Get the current MTU for the primary network interface of a node.

    1. To list the nodes in your cluster, enter the following command:

      $ oc get nodes
    2. To obtain the current MTU setting for the primary network interface on a node, enter the following command:

      $ oc debug node/<node> -- chroot /host ip address show <interface>

      where:

      <node>

      Specifies a node from the output from the previous step.

      <interface>

      Specifies the primary network interface name for the node.

      Example output

      ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051
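To compare the interface MTU across nodes without reading the output by eye, the value can be extracted from the `ip address show` output. This is a minimal sketch; the function name `interface_mtu` is hypothetical.

```shell
# Reads `ip address show <interface>` output on stdin and prints the first
# value that follows the "mtu" keyword.
interface_mtu() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}
```

For example, `oc debug node/<node> -- chroot /host ip address show <interface> | interface_mtu` prints only the MTU value for that node.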
