Configuring hardware offloading

As a cluster administrator, you can configure hardware offloading on compatible nodes to increase data processing performance and reduce load on host CPUs.

About hardware offloading

Open vSwitch hardware offloading is a method of processing network tasks by diverting them away from the CPU and offloading them to a dedicated processor on a network interface controller. As a result, clusters can benefit from faster data transfer speeds, reduced CPU workloads, and lower computing costs.

The key element for this feature is a modern class of network interface controllers known as SmartNICs. A SmartNIC is a network interface controller that is able to handle computationally heavy network processing tasks. In the same way that a dedicated graphics card can improve graphics performance, a SmartNIC can improve network performance. In each case, a dedicated processor improves performance for a specific type of processing task.

In OKD, you can configure hardware offloading for bare metal nodes that have a compatible SmartNIC. Hardware offloading is configured and enabled by the SR-IOV Network Operator.

Hardware offloading is not compatible with all workloads or application types. Only the following two communication types are supported:

  • pod-to-pod

  • pod-to-service, where the service is a ClusterIP service backed by a regular pod

In all cases, hardware offloading takes place only when those pods and services are assigned to nodes that have a compatible SmartNIC. Suppose, for example, that a pod on a node with hardware offloading tries to communicate with a service on a regular node. On the regular node, all the processing takes place in the kernel, so the overall performance of the pod-to-service communication is limited to the maximum performance of that regular node. Hardware offloading is not compatible with DPDK applications.
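
For reference, the supported pod-to-service case assumes a standard ClusterIP Service that is backed by a regular pod, as in the following minimal sketch; the names and ports are illustrative only:

  apiVersion: v1
  kind: Service
  metadata:
    name: example-service        # illustrative name
  spec:
    type: ClusterIP
    selector:
      app: example-backend       # matches the labels on the backing pod
    ports:
    - protocol: TCP
      port: 80
      targetPort: 8080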

Supported devices

Hardware offloading is supported on the following network interface controllers:

Table 1. Supported network interface controllers

Manufacturer   Model                            Vendor ID   Device ID
Mellanox       MT27800 Family [ConnectX-5]      15b3        1017
Mellanox       MT28880 Family [ConnectX-5 Ex]   15b3        1019

Table 2. Technology Preview network interface controllers

Manufacturer   Model                                         Vendor ID   Device ID
Mellanox       MT2892 Family [ConnectX-6 Dx]                 15b3        101d
Mellanox       MT2894 Family [ConnectX-6 Lx]                 15b3        101f
Mellanox       MT42822 BlueField-2 in ConnectX-6 NIC mode    15b3        a2d6

Using a ConnectX-6 Lx or BlueField-2 in ConnectX-6 NIC mode device is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
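
If you are not sure which NIC a node has, you can usually confirm the PCI vendor and device IDs directly on the node, for example from a debug shell. The following command is a general sketch that assumes the lspci utility is available on the host; the output shown is illustrative:

  $ lspci -nn | grep -i mellanox

  Example output

  d8:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]

The vendor ID and device ID appear in the final bracketed pair.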

Prerequisites

  • Your cluster has at least one bare metal node with a network interface controller that is supported for hardware offloading.

  • You installed the SR-IOV Network Operator.

  • Your cluster uses the OVN-Kubernetes network plugin.

Configuring a machine config pool for hardware offloading

To enable hardware offloading, you must first create a dedicated machine config pool and configure it to work with the SR-IOV Network Operator.

Prerequisites

  • You installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a machine config pool for the machines on which you want to use hardware offloading.

    1. Create a file, such as mcp-offloading.yaml, with content like the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: mcp-offloading (1)
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp-offloading]} (1)
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/mcp-offloading: "" (2)
      1 The name of your machine config pool for hardware offloading.
      2 This node role label is used to add nodes to the machine config pool.
    2. Apply the configuration for the machine config pool:

      $ oc create -f mcp-offloading.yaml
  2. Add nodes to the machine config pool. Label each node with the node role label of your pool:

    $ oc label node worker-2 node-role.kubernetes.io/mcp-offloading=""
  3. Optional: To verify that the new pool is created, run the following command:

    $ oc get nodes

    Example output

    NAME       STATUS   ROLES                   AGE   VERSION
    master-0   Ready    master                  2d    v1.25.0
    master-1   Ready    master                  2d    v1.25.0
    master-2   Ready    master                  2d    v1.25.0
    worker-0   Ready    worker                  2d    v1.25.0
    worker-1   Ready    worker                  2d    v1.25.0
    worker-2   Ready    mcp-offloading,worker   47h   v1.25.0
    worker-3   Ready    mcp-offloading,worker   47h   v1.25.0
  4. Add this machine config pool to the SriovNetworkPoolConfig custom resource:

    1. Create a file, such as sriov-pool-config.yaml, with content like the following example:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkPoolConfig
      metadata:
        name: sriovnetworkpoolconfig-offload
        namespace: openshift-sriov-network-operator
      spec:
        ovsHardwareOffloadConfig:
          name: mcp-offloading (1)
      1 The name of your machine config pool for hardware offloading.
    2. Apply the configuration:

      $ oc create -f sriov-pool-config.yaml

      When you apply the configuration specified in a SriovNetworkPoolConfig object, the SR-IOV Operator drains and restarts the nodes in the machine config pool.

      It might take several minutes for a configuration change to apply.
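
To confirm that the pool and the offload configuration were applied, you can watch the machine config pool while the nodes update. The following commands are a general verification sketch rather than a required step; the resource names match the examples in this procedure:

  $ oc get machineconfigpool mcp-offloading
  $ oc get sriovnetworkpoolconfigs -n openshift-sriov-network-operator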

Configuring the SR-IOV network node policy

You can create an SR-IOV network device configuration for a node by creating an SR-IOV network node policy. To enable hardware offloading, you must define the .spec.eSwitchMode field with the value "switchdev".

The following procedure creates an SR-IOV interface for a network interface controller with hardware offloading.

Prerequisites

  • You installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a file, such as sriov-node-policy.yaml, with content like the following example:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: sriov-node-policy (1)
      namespace: openshift-sriov-network-operator
    spec:
      deviceType: netdevice (2)
      eSwitchMode: "switchdev" (3)
      nicSelector:
        deviceID: "1019"
        rootDevices:
          - 0000:d8:00.0
        vendor: "15b3"
        pfNames:
          - ens8f0
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      numVfs: 6
      priority: 5
      resourceName: mlxnics
    1 The name for the custom resource object.
    2 Required. Hardware offloading is not supported with vfio-pci.
    3 Required.
  2. Apply the configuration for the policy:

    $ oc create -f sriov-node-policy.yaml

    When you apply the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator drains and restarts the nodes in the machine config pool.

    It might take several minutes for a configuration change to apply.
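
To check that the policy was synchronized to the nodes, you can inspect the SR-IOV network node states. This command is a general verification sketch and is not part of the official procedure; after the configuration is applied, the sync status for each configured node typically reports Succeeded:

  $ oc get sriovnetworknodestates -n openshift-sriov-network-operator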

An example SR-IOV network node policy for OpenStack

The following example describes an SR-IOV interface for a network interface controller (NIC) with hardware offloading on OpenStack.

An SR-IOV interface for a NIC with hardware offloading on OpenStack

  apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodePolicy
  metadata:
    name: ${name}
    namespace: openshift-sriov-network-operator
  spec:
    deviceType: switchdev
    isRdma: true
    nicSelector:
      netFilter: openstack/NetworkID:${net_id}
    nodeSelector:
      feature.node.kubernetes.io/network-sriov.capable: 'true'
    numVfs: 1
    priority: 99
    resourceName: ${name}
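
In this example, ${name} and ${net_id} are placeholders that you replace before applying the policy. If you use the OpenStack CLI, you can typically look up the network ID for the netFilter value with a command such as the following; the network name is illustrative:

  $ openstack network show <network_name> -f value -c id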

Creating a network attachment definition

After you define the machine config pool and the SR-IOV network node policy, you can create a network attachment definition for the network interface card you specified.

Prerequisites

  • You installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a file, such as net-attach-def.yaml, with content like the following example:

    1. apiVersion: "k8s.cni.cncf.io/v1"
    2. kind: NetworkAttachmentDefinition
    3. metadata:
    4. name: net-attach-def (1)
    5. namespace: net-attach-def (2)
    6. annotations:
    7. k8s.v1.cni.cncf.io/resourceName: openshift.io/mlxnics (3)
    8. spec:
    9. config: '{"cniVersion":"0.3.1","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{}}'
    1The name for your network attachment definition.
    2The namespace for your network attachment definition.
    3This is the value of the spec.resourceName field you specified in the SriovNetworkNodePolicy object.
  2. Apply the configuration for the network attachment definition:

    $ oc create -f net-attach-def.yaml

Verification

  • Run the following command to see whether the new definition is present:

    $ oc get net-attach-def -A

    Example output

    NAMESPACE        NAME             AGE
    net-attach-def   net-attach-def   43h

Adding the network attachment definition to your pods

After you create the machine config pool, the SriovNetworkPoolConfig and SriovNetworkNodePolicy custom resources, and the network attachment definition, you can apply these configurations to your pods by adding the network attachment definition to your pod specifications.

Procedure

  • In the pod specification, add the .metadata.annotations.k8s.v1.cni.cncf.io/networks field and specify the network attachment definition you created for hardware offloading:

    ....
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: net-attach-def/net-attach-def (1)
    1 The value must be the namespace and name of the network attachment definition that you created for hardware offloading, in the <namespace>/<name> format.
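
For context, a complete pod manifest that uses this annotation might look like the following sketch. The pod name and container image are illustrative only, and the annotation value matches the network attachment definition created earlier:

  apiVersion: v1
  kind: Pod
  metadata:
    name: offload-test-pod                         # illustrative name
    namespace: default
    annotations:
      k8s.v1.cni.cncf.io/networks: net-attach-def/net-attach-def
  spec:
    containers:
    - name: test
      image: registry.access.redhat.com/ubi8/ubi   # illustrative image
      command: ["sleep", "infinity"]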