Connecting a virtual machine to an SR-IOV network

You can connect a virtual machine (VM) to a Single Root I/O Virtualization (SR-IOV) network by performing the following steps:

  1. Configure an SR-IOV network device.

  2. Configure an SR-IOV network.

  3. Connect the VM to the SR-IOV network.

Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io CustomResourceDefinition to OKD. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes and, in some cases, reboot them.

It might take several minutes for a configuration change to apply.

Prerequisites

  • You installed the OpenShift CLI (oc).

  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the SR-IOV Network Operator.

  • You have enough available nodes in your cluster to handle the evicted workload from drained nodes.

  • You have not selected any control plane nodes for SR-IOV network device configuration.

Procedure

  1. Create an SriovNetworkNodePolicy object, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: <name> (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      resourceName: <sriov_resource_name> (3)
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true" (4)
      priority: <priority> (5)
      mtu: <mtu> (6)
      numVfs: <num> (7)
      nicSelector: (8)
        vendor: "<vendor_code>" (9)
        deviceID: "<device_id>" (10)
        pfNames: ["<pf_name>", ...] (11)
        rootDevices: ["<pci_bus_id>", "..."] (12)
      deviceType: vfio-pci (13)
      isRdma: false (14)
    1 Specify a name for the CR object.
    2 Specify the namespace where the SR-IOV Operator is installed.
    3 Specify the resource name of the SR-IOV device plugin. You can create multiple SriovNetworkNodePolicy objects for a resource name.
    4 Specify the node selector to select which nodes are configured. Only SR-IOV network devices on selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed only on selected nodes.
    5 Optional: Specify an integer value between 0 and 99. A smaller number gets higher priority, so a priority of 10 is higher than a priority of 99. The default value is 99.
    6 Optional: Specify a value for the maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different NIC models.
    7 Specify the number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
    8 The nicSelector mapping selects the Ethernet device for the Operator to configure. You do not need to specify values for all the parameters. It is recommended to identify the Ethernet adapter with enough precision to minimize the possibility of selecting an Ethernet device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they point to the same device.
    9 Optional: Specify the vendor hex code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
    10 Optional: Specify the device hex code of the SR-IOV network device. The only allowed values are 158b, 1015, and 1017.
    11 Optional: The parameter accepts an array of one or more physical function (PF) names for the Ethernet device.
    12 The parameter accepts an array of one or more PCI bus addresses for the physical function of the Ethernet device. Provide the address in the following format: 0000:02:00.1.
    13 The vfio-pci driver type is required for virtual functions in OKD Virtualization.
    14 Optional: Specify whether to enable remote direct memory access (RDMA) mode. For a Mellanox card, set isRdma to false. The default value is false.

    If the isRdma flag is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

  2. Optional: Label the SR-IOV capable cluster nodes with SriovNetworkNodePolicy.Spec.NodeSelector if they are not already labeled. For more information about labeling nodes, see “Understanding how to update labels on nodes”.
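
    For example, to add the label that is used by the node selector in the previous example, you can run a command similar to the following (an illustrative command; use the label that matches your SriovNetworkNodePolicy.Spec.NodeSelector value):

    $ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"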

  3. Create the SriovNetworkNodePolicy object:

    $ oc create -f <name>-sriov-node-network.yaml

    where <name> specifies the name for this configuration.

    After you apply the configuration update, all the pods in the openshift-sriov-network-operator namespace transition to the Running status.

  4. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
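
    If the device is configured successfully, the syncStatus value is similar to the following example output:

    Succeeded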

Configuring an SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating an SriovNetwork object.

When you create an SriovNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.

Do not modify or delete an SriovNetwork object if it is attached to pods or virtual machines in a running state.

Prerequisites

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create the following SriovNetwork object, and then save the YAML in the <name>-sriov-network.yaml file. Replace <name> with a name for this additional network.
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: <name> (1)
      namespace: openshift-sriov-network-operator (2)
    spec:
      resourceName: <sriov_resource_name> (3)
      networkNamespace: <target_namespace> (4)
      vlan: <vlan> (5)
      spoofChk: "<spoof_check>" (6)
      linkState: <link_state> (7)
      maxTxRate: <max_tx_rate> (8)
      minTxRate: <min_tx_rate> (9)
      vlanQoS: <vlan_qos> (10)
      trust: "<trust_vf>" (11)
      capabilities: <capabilities> (12)
    1 Replace <name> with a name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
    2 Specify the namespace where the SR-IOV Network Operator is installed.
    3 Replace <sriov_resource_name> with the value for the .spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
    4 Replace <target_namespace> with the target namespace for the SriovNetwork. Only pods or virtual machines in the target namespace can attach to the SriovNetwork.
    5 Optional: Replace <vlan> with a Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
    6 Optional: Replace <spoof_check> with the spoof check mode of the VF. The allowed values are the strings "on" and "off".

    You must enclose the value you specify in quotes or the CR is rejected by the SR-IOV Network Operator.

    7 Optional: Replace <link_state> with the link state of the virtual function (VF). Allowed values are enable, disable, and auto.
    8 Optional: Replace <max_tx_rate> with a maximum transmission rate, in Mbps, for the VF.
    9 Optional: Replace <min_tx_rate> with a minimum transmission rate, in Mbps, for the VF. This value must be less than or equal to the maximum transmission rate.

    Intel NICs do not support the minTxRate parameter. For more information, see BZ#1772847.

    10 Optional: Replace <vlan_qos> with an IEEE 802.1p priority level for the VF. The default value is 0.
    11 Optional: Replace <trust_vf> with the trust mode of the VF. The allowed values are the strings "on" and "off".

    You must enclose the value you specify in quotes or the CR is rejected by the SR-IOV Network Operator.

    12 Optional: Replace <capabilities> with the capabilities to configure for this network.
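
    For example, to request MAC address and IP address support from the SR-IOV CNI plugin, you can set a value similar to the following (an illustrative setting; confirm the capabilities that your SR-IOV CNI version supports):

    capabilities: '{ "mac": true, "ips": true }'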
  2. To create the object, enter the following command. Replace <name> with a name for this additional network.

    $ oc create -f <name>-sriov-network.yaml
  3. Optional: To confirm that the NetworkAttachmentDefinition object associated with the SriovNetwork object that you created in the previous step exists, enter the following command. Replace <namespace> with the namespace you specified in the SriovNetwork object.

    $ oc get net-attach-def -n <namespace>
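
    The output is similar to the following example, where the NetworkAttachmentDefinition name matches the name of the SriovNetwork object (illustrative output; your values differ):

    NAME      AGE
    <name>    3m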

Connecting a virtual machine to an SR-IOV network

You can connect the virtual machine (VM) to the SR-IOV network by including the network details in the VM configuration.

Procedure

  1. Include the SR-IOV network details in the spec.domain.devices.interfaces and spec.networks of the VM configuration:

    kind: VirtualMachine
    # ...
    spec:
      domain:
        devices:
          interfaces:
          - name: <default> (1)
            masquerade: {} (2)
          - name: <nic1> (3)
            sriov: {}
      networks:
      - name: <default> (4)
        pod: {}
      - name: <nic1> (5)
        multus:
          networkName: <sriov-network> (6)
    # ...
    1 A unique name for the interface that is connected to the pod network.
    2 The masquerade binding to the default pod network.
    3 A unique name for the SR-IOV interface.
    4 The name of the pod network interface. This must be the same as the interfaces.name that you defined earlier.
    5 The name of the SR-IOV interface. This must be the same as the interfaces.name that you defined earlier.
    6 The name of the SR-IOV network attachment definition.
  2. Apply the virtual machine configuration:

    $ oc apply -f <vm-sriov.yaml> (1)
    1 The name of the virtual machine YAML file.
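
    Optionally, if the VM is running, you can confirm that the SR-IOV interface is reported in the VirtualMachineInstance status. The following is an illustrative check; replace <vm_name> and <namespace> with your values:

    $ oc get vmi <vm_name> -n <namespace> -o jsonpath='{.status.interfaces}'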

Configuring a cluster for DPDK workloads

You can use the following procedure to configure an OKD cluster to run Data Plane Development Kit (DPDK) workloads.

Configuring a cluster for DPDK workloads is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Prerequisites

  • You have access to the cluster as a user with cluster-admin permissions.

  • You have installed the OpenShift CLI (oc).

  • You have installed the SR-IOV Network Operator.

  • You have installed the Node Tuning Operator.

Procedure

  1. Map the topology of your compute nodes to determine which Non-Uniform Memory Access (NUMA) CPUs are isolated for DPDK applications and which ones are reserved for the operating system (OS).

  2. Label a subset of the compute nodes with a custom role; for example, worker-dpdk:

    $ oc label node <node_name> node-role.kubernetes.io/worker-dpdk=""
  3. Create a new MachineConfigPool manifest that contains the worker-dpdk label in the spec.machineConfigSelector object:

    Example MachineConfigPool manifest

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: worker-dpdk
      labels:
        machineconfiguration.openshift.io/role: worker-dpdk
    spec:
      machineConfigSelector:
        matchExpressions:
          - key: machineconfiguration.openshift.io/role
            operator: In
            values:
              - worker
              - worker-dpdk
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-dpdk: ""
  4. Create a PerformanceProfile manifest that applies to the labeled nodes and the machine config pool that you created in the previous steps. The performance profile specifies the CPUs that are isolated for DPDK applications and the CPUs that are reserved for housekeeping.

    Example PerformanceProfile manifest

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: profile-1
    spec:
      cpu:
        isolated: 4-39,44-79
        reserved: 0-3,40-43
      hugepages:
        defaultHugepagesSize: 1G
        pages:
        - count: 32
          node: 0
          size: 1G
      net:
        userLevelNetworking: true
      nodeSelector:
        node-role.kubernetes.io/worker-dpdk: ""
      numa:
        topologyPolicy: single-numa-node

    The compute nodes automatically restart after you apply the MachineConfigPool and PerformanceProfile manifests.
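
    Optionally, you can wait for the machine config pool to finish updating before you continue. The following command is an illustrative check; adjust the timeout to suit your environment:

    $ oc wait mcp/worker-dpdk --for=condition=Updated --timeout=30m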

  5. Retrieve the name of the generated RuntimeClass resource from the status.runtimeClass field of the PerformanceProfile object:

    $ oc get performanceprofiles.performance.openshift.io profile-1 -o=jsonpath='{.status.runtimeClass}{"\n"}'
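
    The RuntimeClass name is derived from the name of the performance profile; for the profile in this example, the output is similar to the following:

    performance-profile-1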
  6. Set the previously obtained RuntimeClass name as the default container runtime class for the virt-launcher pods by adding the following annotation to the HyperConverged custom resource (CR):

    $ oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged \
      kubevirt.kubevirt.io/jsonpatch='[{"op": "add", "path": "/spec/configuration/defaultRuntimeClass", "value": "<runtimeclass_name>"}]'

    Adding the annotation to the HyperConverged CR changes a global setting that affects all VMs that are created after the annotation is applied. Setting this annotation breaches the support of the OKD Virtualization instance, so use it only on test clusters. For best performance, apply for a support exception.

  7. Create an SriovNetworkNodePolicy object with the spec.deviceType field set to vfio-pci:

    Example SriovNetworkNodePolicy manifest

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policy-1
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: intel_nics_dpdk
      deviceType: vfio-pci
      mtu: 9000
      numVfs: 4
      priority: 99
      nicSelector:
        vendor: "8086"
        deviceID: "1572"
        pfNames:
          - eno3
        rootDevices:
          - "0000:19:00.2"
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"

Configuring a project for DPDK workloads

You can configure a project to run DPDK workloads on SR-IOV hardware.

Prerequisites

  • Your cluster is configured to run DPDK workloads.

Procedure

  1. Create a namespace for your DPDK applications:

    $ oc create ns dpdk-checkup-ns
  2. Create an SriovNetwork object that references the SriovNetworkNodePolicy object, and then apply the manifest as shown after the following example. When you create an SriovNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.

    Example SriovNetwork manifest

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: dpdk-sriovnetwork
      namespace: openshift-sriov-network-operator
    spec:
      ipam: |
        {
          "type": "host-local",
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "routes": [{
            "dst": "0.0.0.0/0"
          }],
          "gateway": "10.56.217.1"
        }
      networkNamespace: dpdk-checkup-ns (1)
      resourceName: intel_nics_dpdk (2)
      spoofChk: "off"
      trust: "on"
      vlan: 1019
    1 The namespace where the NetworkAttachmentDefinition object is deployed.
    2 The value of the spec.resourceName attribute of the SriovNetworkNodePolicy object that was created when configuring the cluster for DPDK workloads.
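
    To create the object, save the manifest to a file and apply it. For example, assuming the file name dpdk-sriovnetwork.yaml:

    $ oc create -f dpdk-sriovnetwork.yaml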
  3. Optional: Run the virtual machine latency checkup to verify that the network is properly configured.

  4. Optional: Run the DPDK checkup to verify that the namespace is ready for DPDK workloads.

Configuring a virtual machine for DPDK workloads

You can run Data Plane Development Kit (DPDK) workloads on virtual machines (VMs) to achieve lower latency and higher throughput for faster packet processing in the user space. DPDK uses the SR-IOV network for hardware-based I/O sharing.

Prerequisites

  • Your cluster is configured to run DPDK workloads.

  • You have created and configured the project in which the VM will run.

Procedure

  1. Edit the VirtualMachine manifest to include information about the SR-IOV network interface, CPU topology, CRI-O annotations, and huge pages:

    Example VirtualMachine manifest

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: rhel-dpdk-vm
    spec:
      running: true
      template:
        metadata:
          annotations:
            cpu-load-balancing.crio.io: disable (1)
            cpu-quota.crio.io: disable (2)
            irq-load-balancing.crio.io: disable (3)
        spec:
          nodeSelector:
            node-role.kubernetes.io/worker-dpdk: "" (4)
          domain:
            cpu:
              sockets: 1 (5)
              cores: 5 (6)
              threads: 2
              dedicatedCpuPlacement: true
              isolateEmulatorThread: true
            devices:
              interfaces:
                - masquerade: {}
                  name: default
                - model: virtio
                  name: nic-east
                  pciAddress: '0000:07:00.0'
                  sriov: {}
              networkInterfaceMultiqueue: true
              rng: {}
            memory:
              hugepages:
                pageSize: 1Gi (7)
            resources:
              requests:
                memory: 8Gi
          networks:
            - name: default
              pod: {}
            - multus:
                networkName: dpdk-net (8)
              name: nic-east
    # ...
    1 This annotation specifies that load balancing is disabled for CPUs that are used by the container.
    2 This annotation specifies that the CPU quota is disabled for CPUs that are used by the container.
    3 This annotation specifies that Interrupt Request (IRQ) load balancing is disabled for CPUs that are used by the container.
    4 The label that is used in the MachineConfigPool and PerformanceProfile manifests that were created when configuring the cluster for DPDK workloads.
    5 The number of sockets inside the VM. This field must be set to 1 for the CPUs to be scheduled from the same Non-Uniform Memory Access (NUMA) node.
    6 The number of cores inside the VM. This must be a value greater than or equal to 1. In this example, the VM is scheduled with 5 hyper-threads or 10 CPUs.
    7 The size of the huge pages. The possible values for the x86-64 architecture are 1Gi and 2Mi. In this example, the request is for 8 huge pages of size 1Gi.
    8 The name of the SR-IOV NetworkAttachmentDefinition object.
  2. Save and exit the editor.

  3. Apply the VirtualMachine manifest:

    $ oc apply -f <file_name>.yaml
  4. Configure the guest operating system. The following example shows the configuration steps for a Red Hat Enterprise Linux (RHEL) 8 guest OS:

    1. Configure isolated VM CPUs and specify huge pages by using the GRUB bootloader command-line interface. In the following example, eight 1G huge pages are specified. The first two CPUs (0 and 1) are set aside for housekeeping tasks and the rest are isolated for the DPDK application.

      $ grubby --update-kernel=ALL --args="default_hugepagesz=1GB hugepagesz=1G hugepages=8 isolcpus=2-9"
    2. To achieve low-latency tuning by using the cpu-partitioning profile in the TuneD application, run the following commands:

      $ dnf install -y tuned-profiles-cpu-partitioning

      $ echo isolated_cores=2-9 > /etc/tuned/cpu-partitioning-variables.conf

      $ tuned-adm profile cpu-partitioning
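
      Optionally, you can confirm that the profile is active (an illustrative check):

      $ tuned-adm active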
    3. Override the SR-IOV NIC driver by using the driverctl device driver control utility:

      $ dnf install -y driverctl

      $ driverctl set-override 0000:07:00.0 vfio-pci
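
      Optionally, you can confirm that the driver override was recorded (an illustrative check):

      $ driverctl list-overrides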
  5. Restart the VM to apply the changes.
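
    Optionally, after the VM restarts, you can confirm from inside the guest that the huge page and CPU isolation kernel arguments were applied (an illustrative check):

    $ cat /proc/cmdline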
