Using DPDK with SR-IOV

The Data Plane Development Kit (DPDK) provides a set of libraries and drivers for fast packet processing.

You can configure clusters and virtual machines (VMs) to run DPDK workloads over SR-IOV networks.

Running DPDK workloads is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Configuring a cluster for DPDK workloads

You can configure an OKD cluster to run Data Plane Development Kit (DPDK) workloads for improved network performance.

Prerequisites

  • You have access to the cluster as a user with cluster-admin permissions.

  • You have installed the OpenShift CLI (oc).

  • You have installed the SR-IOV Network Operator.

  • You have installed the Node Tuning Operator.

Procedure

  1. Map the topology of your compute nodes to determine which Non-Uniform Memory Access (NUMA) CPUs are isolated for DPDK applications and which are reserved for the operating system (OS).
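
    One way to map the NUMA layout of a node is to run lscpu through a debug pod; the node name is a placeholder:

    $ oc debug node/<node_name> -- chroot /host lscpu | grep -i numa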

  2. Label a subset of the compute nodes with a custom role; for example, worker-dpdk:

    $ oc label node <node_name> node-role.kubernetes.io/worker-dpdk=""
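
    You can confirm which nodes carry the new label with a command such as:

    $ oc get nodes -l node-role.kubernetes.io/worker-dpdk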
  3. Create a new MachineConfigPool manifest that contains the worker-dpdk label in the spec.machineConfigSelector object:

    Example MachineConfigPool manifest

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-dpdk
      labels:
        machineconfiguration.openshift.io/role: worker-dpdk
    spec:
      machineConfigSelector:
        matchExpressions:
          - key: machineconfiguration.openshift.io/role
            operator: In
            values:
              - worker
              - worker-dpdk
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-dpdk: ""
  4. Create a PerformanceProfile manifest that applies to the labeled nodes and the machine config pool that you created in the previous steps. The performance profile specifies the CPUs that are isolated for DPDK applications and the CPUs that are reserved for housekeeping.

    Example PerformanceProfile manifest

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: profile-1
    spec:
      cpu:
        isolated: 4-39,44-79
        reserved: 0-3,40-43
      globallyDisableIrqLoadBalancing: true
      hugepages:
        defaultHugepagesSize: 1G
        pages:
          - count: 8
            node: 0
            size: 1G
      net:
        userLevelNetworking: true
      nodeSelector:
        node-role.kubernetes.io/worker-dpdk: ""
      numa:
        topologyPolicy: single-numa-node
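
    If you saved the MachineConfigPool and PerformanceProfile manifests to files, you can apply each of them with a command such as the following, where the file name is a placeholder:

    $ oc apply -f <file_name>.yaml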

    The compute nodes automatically restart after you apply the MachineConfigPool and PerformanceProfile manifests.
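
    You can monitor the rollout by watching the machine config pool until it reports that all machines are updated, for example:

    $ oc get mcp worker-dpdk -w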

  5. Retrieve the name of the generated RuntimeClass resource from the status.runtimeClass field of the PerformanceProfile object:

    $ oc get performanceprofiles.performance.openshift.io profile-1 -o=jsonpath='{.status.runtimeClass}{"\n"}'
  6. Set the previously obtained RuntimeClass name as the default container runtime class for the virt-launcher pods by editing the HyperConverged custom resource (CR):

    $ oc patch hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged \
        --type='json' -p='[{"op": "add", "path": "/spec/defaultRuntimeClass", "value":"<runtimeclass-name>"}]'

    Editing the HyperConverged CR changes a global setting that affects all VMs that are created after the change is applied.
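
    You can confirm the new default runtime class with a command such as:

    $ oc get hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged -o=jsonpath='{.spec.defaultRuntimeClass}{"\n"}'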

  7. If your DPDK-enabled compute nodes use simultaneous multithreading (SMT), enable the AlignCPUs feature gate by editing the HyperConverged CR:

    $ oc patch hyperconverged kubevirt-hyperconverged -n kubevirt-hyperconverged \
        --type='json' -p='[{"op": "replace", "path": "/spec/featureGates/alignCPUs", "value": true}]'

    Enabling AlignCPUs allows OKD Virtualization to request up to two additional dedicated CPUs to bring the total CPU count to an even parity when using emulator thread isolation.

  8. Create an SriovNetworkNodePolicy object with the spec.deviceType field set to vfio-pci:

    Example SriovNetworkNodePolicy manifest

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policy-1
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: intel_nics_dpdk
      deviceType: vfio-pci
      mtu: 9000
      numVfs: 4
      priority: 99
      nicSelector:
        vendor: "8086"
        deviceID: "1572"
        pfNames:
          - eno3
        rootDevices:
          - "0000:19:00.2"
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
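
    After the SR-IOV Network Operator applies the policy, you can check the configuration status of each node by inspecting the SriovNetworkNodeState objects; the syncStatus field reports Succeeded when the configuration is complete:

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status.syncStatus}{"\n"}'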

Configuring a project for DPDK workloads

You can configure the project to run DPDK workloads on SR-IOV hardware.

Prerequisites

  • Your cluster is configured to run DPDK workloads.

Procedure

  1. Create a namespace for your DPDK applications:

    $ oc create ns dpdk-checkup-ns
  2. Create an SriovNetwork object that references the SriovNetworkNodePolicy object. When you create an SriovNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.

    Example SriovNetwork manifest

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: dpdk-sriovnetwork
      namespace: openshift-sriov-network-operator
    spec:
      ipam: |
        {
          "type": "host-local",
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "routes": [{
            "dst": "0.0.0.0/0"
          }],
          "gateway": "10.56.217.1"
        }
      networkNamespace: dpdk-checkup-ns (1)
      resourceName: intel_nics_dpdk (2)
      spoofChk: "off"
      trust: "on"
      vlan: 1019

    (1) The namespace where the NetworkAttachmentDefinition object is deployed.
    (2) The value of the spec.resourceName attribute of the SriovNetworkNodePolicy object that was created when configuring the cluster for DPDK workloads.
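
    You can verify that the NetworkAttachmentDefinition object was created in the target namespace with a command such as:

    $ oc get network-attachment-definitions -n dpdk-checkup-ns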
  3. Optional: Run the virtual machine latency checkup to verify that the network is properly configured.

  4. Optional: Run the DPDK checkup to verify that the namespace is ready for DPDK workloads.

Configuring a virtual machine for DPDK workloads

You can run Data Plane Development Kit (DPDK) workloads on virtual machines (VMs) to achieve lower latency and higher throughput for faster packet processing in the user space. DPDK uses the SR-IOV network for hardware-based I/O sharing.

Prerequisites

  • Your cluster is configured to run DPDK workloads.

  • You have created and configured the project in which the VM will run.

Procedure

  1. Edit the VirtualMachine manifest to include information about the SR-IOV network interface, CPU topology, CRI-O annotations, and huge pages:

    Example VirtualMachine manifest

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: rhel-dpdk-vm
    spec:
      running: true
      template:
        metadata:
          annotations:
            cpu-load-balancing.crio.io: disable (1)
            cpu-quota.crio.io: disable (2)
            irq-load-balancing.crio.io: disable (3)
        spec:
          domain:
            cpu:
              sockets: 1 (4)
              cores: 5 (5)
              threads: 2
              dedicatedCpuPlacement: true
              isolateEmulatorThread: true
            devices:
              interfaces:
                - masquerade: {}
                  name: default
                - model: virtio
                  name: nic-east
                  pciAddress: '0000:07:00.0'
                  sriov: {}
              networkInterfaceMultiqueue: true
              rng: {}
            memory:
              hugepages:
                pageSize: 1Gi (6)
              guest: 8Gi
          networks:
            - name: default
              pod: {}
            - multus:
                networkName: dpdk-net (7)
              name: nic-east
    # ...

    (1) This annotation specifies that load balancing is disabled for CPUs that are used by the container.
    (2) This annotation specifies that the CPU quota is disabled for CPUs that are used by the container.
    (3) This annotation specifies that Interrupt Request (IRQ) load balancing is disabled for CPUs that are used by the container.
    (4) The number of sockets inside the VM. This field must be set to 1 for the CPUs to be scheduled from the same Non-Uniform Memory Access (NUMA) node.
    (5) The number of cores inside the VM. This must be a value greater than or equal to 1. In this example, the VM is scheduled with 5 hyper-threads or 10 CPUs.
    (6) The size of the huge pages. The possible values for x86-64 architecture are 1Gi and 2Mi. In this example, the request is for 8 huge pages of size 1Gi.
    (7) The name of the SR-IOV NetworkAttachmentDefinition object.
  2. Save and exit the editor.

  3. Apply the VirtualMachine manifest:

    $ oc apply -f <file_name>.yaml
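
    Assuming the VM was created in the project that you configured for DPDK workloads, you can confirm that the corresponding virtual machine instance is running with a command such as the following, where the project name is a placeholder:

    $ oc get vmi rhel-dpdk-vm -n <project_name>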
  4. Configure the guest operating system. The following example shows the configuration steps for a Fedora guest OS:

    1. Configure huge pages by using the GRUB bootloader command-line interface. The following example specifies eight 1G huge pages:

      $ grubby --update-kernel=ALL --args="default_hugepagesz=1GB hugepagesz=1G hugepages=8"
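
      The new kernel arguments take effect after the guest reboots. After you restart the VM in the final step, you can verify the huge page allocation from inside the guest with a command such as:

      $ grep Huge /proc/meminfo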
    2. To achieve low-latency tuning by using the cpu-partitioning profile in the TuneD application, run the following commands:

      $ dnf install -y tuned-profiles-cpu-partitioning

      $ echo isolated_cores=2-9 > /etc/tuned/cpu-partitioning-variables.conf

      The first two CPUs (0 and 1) are set aside for housekeeping tasks and the rest are isolated for the DPDK application.

      $ tuned-adm profile cpu-partitioning
    3. Override the SR-IOV NIC driver by using the driverctl device driver control utility:

      $ dnf install -y driverctl

      $ driverctl set-override 0000:07:00.0 vfio-pci
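
      You can confirm that the override is in place with a command such as:

      $ driverctl list-overrides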
  5. Restart the VM to apply the changes.