Adding a pod to an SR-IOV additional network

You can add a pod to an existing Single Root I/O Virtualization (SR-IOV) network.

Runtime configuration for a network attachment

When attaching a pod to an additional network, you can specify a runtime configuration to make specific customizations for the pod. For example, you can request a specific MAC hardware address.

You specify the runtime configuration by setting an annotation in the pod specification. The annotation key is k8s.v1.cni.cncf.io/networks, and it accepts a JSON array of objects, one for each network attachment, that describes the runtime configuration.

Runtime configuration for an Ethernet-based SR-IOV attachment

The following JSON describes the runtime configuration options for an Ethernet-based SR-IOV network attachment.

  [
    {
      "name": "<name>", (1)
      "mac": "<mac_address>", (2)
      "ips": ["<cidr_range>"] (3)
    }
  ]
(1) The name of the SR-IOV network attachment definition CR.
(2) Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
(3) Optional: IP addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.

Example runtime configuration

  apiVersion: v1
  kind: Pod
  metadata:
    name: sample-pod
    annotations:
      k8s.v1.cni.cncf.io/networks: |-
        [
          {
            "name": "net1",
            "mac": "20:04:0f:f1:88:01",
            "ips": ["192.168.10.1/24", "2001::1/64"]
          }
        ]
  spec:
    containers:
    - name: sample-container
      image: <image>
      imagePullPolicy: IfNotPresent
      command: ["sleep", "infinity"]
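
The annotation value can also be generated by a script instead of being written by hand. The following Python sketch assembles the JSON for an Ethernet-based attachment and checks the MAC address and IP values before serializing them. The helper name build_ethernet_annotation and its validation rules are illustrative assumptions, not part of any OpenShift or Multus API.

```python
import ipaddress
import json
import re

# Six colon-separated hex octets, for example 20:04:0f:f1:88:01.
MAC_PATTERN = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def build_ethernet_annotation(name, mac=None, ips=None):
    """Return a JSON string for the k8s.v1.cni.cncf.io/networks annotation.

    Hypothetical helper: it only assembles and sanity-checks the fields
    described above. It does not contact the cluster.
    """
    entry = {"name": name}
    if mac is not None:
        if not MAC_PATTERN.match(mac):
            raise ValueError(f"invalid MAC address: {mac}")
        entry["mac"] = mac
    if ips:
        for cidr in ips:
            # ip_interface parses an IPv4 or IPv6 address with an optional
            # prefix length and raises ValueError for malformed input.
            ipaddress.ip_interface(cidr)
        entry["ips"] = list(ips)
    return json.dumps([entry], indent=2)


annotation = build_ethernet_annotation(
    "net1",
    mac="20:04:0f:f1:88:01",
    ips=["192.168.10.1/24", "2001::1/64"],
)
print(annotation)
```

The printed string can be pasted as the value of the annotation in the pod specification. The optional fields still require the matching capabilities in the SriovNetwork object, as described in the callouts.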

Runtime configuration for an InfiniBand-based SR-IOV attachment

The following JSON describes the runtime configuration options for an InfiniBand-based SR-IOV network attachment.

  [
    {
      "name": "<network_attachment>", (1)
      "infiniband-guid": "<guid>", (2)
      "ips": ["<cidr_range>"] (3)
    }
  ]
(1) The name of the SR-IOV network attachment definition CR.
(2) The InfiniBand GUID for the SR-IOV device. To use this feature, you also must specify { "infinibandGUID": true } in the SriovIBNetwork object.
(3) The IP addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovIBNetwork object.

Example runtime configuration

  apiVersion: v1
  kind: Pod
  metadata:
    name: sample-pod
    annotations:
      k8s.v1.cni.cncf.io/networks: |-
        [
          {
            "name": "ib1",
            "infiniband-guid": "c2:11:22:33:44:55:66:77",
            "ips": ["192.168.10.1/24", "2001::1/64"]
          }
        ]
  spec:
    containers:
    - name: sample-container
      image: <image>
      imagePullPolicy: IfNotPresent
      command: ["sleep", "infinity"]

Adding a pod to an additional network

You can add a pod to an additional network. The pod continues to send normal cluster-related network traffic over the default network.

When a pod is created, additional networks are attached to it. However, if a pod already exists, you cannot attach additional networks to it.

The pod must be in the same namespace as the additional network.

The SR-IOV Network Resource Injector adds the resource field to the first container in a pod automatically.

If you are using an Intel network interface controller (NIC) in Data Plane Development Kit (DPDK) mode, only the first container in your pod is configured to access the NIC. Your SR-IOV additional network is configured for DPDK mode if the deviceType is set to vfio-pci in the SriovNetworkNodePolicy object.

You can work around this issue by either ensuring that the container that needs access to the NIC is the first container defined in the Pod object or by disabling the Network Resource Injector. For more information, see BZ#1990953.
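
As a rough illustration of the first workaround, the following Python sketch reorders the containers in a pod manifest, held as a plain dictionary, so that the container that needs the NIC comes first. The helper name move_container_first and the container names sidecar and dpdk-app are hypothetical. The sketch only edits the manifest and does not interact with the cluster.

```python
import copy


def move_container_first(pod, container_name):
    """Return a copy of the pod manifest with the named container first."""
    result = copy.deepcopy(pod)
    containers = result["spec"]["containers"]
    for index, container in enumerate(containers):
        if container["name"] == container_name:
            # Move the matching container to position 0 and stop iterating.
            containers.insert(0, containers.pop(index))
            return result
    raise ValueError(f"no container named {container_name!r}")


# Hypothetical two-container pod; dpdk-app is the container that needs the NIC.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "sample-pod"},
    "spec": {
        "containers": [
            {"name": "sidecar", "image": "<image>"},
            {"name": "dpdk-app", "image": "<image>"},
        ]
    },
}

reordered = move_container_first(pod, "dpdk-app")
print([c["name"] for c in reordered["spec"]["containers"]])
```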

Prerequisites

  • Install the OpenShift CLI (oc).

  • Log in to the cluster.

  • Install the SR-IOV Operator.

  • Create either an SriovNetwork object or an SriovIBNetwork object to attach the pod to.

Procedure

  1. Add an annotation to the Pod object. Only one of the following annotation formats can be used:

    1. To attach an additional network without any customization, add an annotation with the following format. Replace <network> with the name of the additional network to associate with the pod:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] (1)
      (1) To specify more than one additional network, separate each network with a comma. Do not include whitespace around the commas. If you specify the same additional network multiple times, the pod has multiple network interfaces attached to that network.
    2. To attach an additional network with customizations, add an annotation with the following format:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: |-
            [
              {
                "name": "<network>", (1)
                "namespace": "<namespace>", (2)
                "default-route": ["<default-route>"] (3)
              }
            ]
      (1) Specify the name of the additional network defined by a NetworkAttachmentDefinition object.
      (2) Specify the namespace where the NetworkAttachmentDefinition object is defined.
      (3) Optional: Specify an override for the default route, such as 192.168.17.1.
  2. To create the pod, enter the following command. Replace <name> with the name of the pod.

    $ oc create -f <name>.yaml
  3. Optional: To confirm that the annotation exists in the Pod CR, enter the following command, replacing <name> with the name of the pod.

    $ oc get pod <name> -o yaml

    In the following example, the example-pod pod is attached to the macvlan-bridge additional network through the net1 interface:

    $ oc get pod example-pod -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-bridge
        k8s.v1.cni.cncf.io/network-status: |- (1)
          [{
              "name": "openshift-sdn",
              "interface": "eth0",
              "ips": [
                  "10.128.2.14"
              ],
              "default": true,
              "dns": {}
          },{
              "name": "macvlan-bridge",
              "interface": "net1",
              "ips": [
                  "20.2.2.100"
              ],
              "mac": "22:2f:60:a5:f8:00",
              "dns": {}
          }]
      name: example-pod
      namespace: default
    spec:
      ...
    status:
      ...
    (1) The k8s.v1.cni.cncf.io/network-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the pod. The annotation value is stored as a plain text value.
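
Because both annotations are stored as plain text, a script can compare the requested networks with the reported status. The following Python sketch does this for the values shown in the example-pod output. The function names are hypothetical, and the sketch makes two simplifying assumptions: the networks annotation holds either the JSON form or a plain comma-separated list of names, and the status entries report network names in the same form as the request.

```python
import json


def requested_networks(annotation_value):
    """List network names from a k8s.v1.cni.cncf.io/networks value.

    Simplification: handles the JSON form and a plain comma-separated
    list of names only.
    """
    value = annotation_value.strip()
    if value.startswith("["):
        return [entry["name"] for entry in json.loads(value)]
    return [name for name in value.split(",") if name]


def additional_interfaces(network_status_value):
    """Map network name to interface name for non-default networks."""
    return {
        entry["name"]: entry.get("interface")
        for entry in json.loads(network_status_value)
        if not entry.get("default", False)
    }


# Values copied from the example-pod output above.
networks = "macvlan-bridge"
network_status = """
[{
    "name": "openshift-sdn",
    "interface": "eth0",
    "ips": ["10.128.2.14"],
    "default": true,
    "dns": {}
},{
    "name": "macvlan-bridge",
    "interface": "net1",
    "ips": ["20.2.2.100"],
    "mac": "22:2f:60:a5:f8:00",
    "dns": {}
}]
"""

attached = additional_interfaces(network_status)
missing = [n for n in requested_networks(networks) if n not in attached]
print(attached)  # network name -> interface name
print(missing)   # requested networks with no status entry
```

Because the result is keyed by network name, a network that is attached more than once appears only once in the mapping. A fuller check would keep a list of interfaces for each name.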

Creating a non-uniform memory access (NUMA) aligned SR-IOV pod

You can create a NUMA aligned SR-IOV pod by requiring that the SR-IOV and CPU resources are allocated from the same NUMA node, using the restricted or single-numa-node Topology Manager policies.

Prerequisites

  • You have installed the OpenShift CLI (oc).

  • You have configured the CPU Manager policy to static. For more information on CPU Manager, see the “Additional resources” section.

  • You have configured the Topology Manager policy to single-numa-node.

    When single-numa-node is unable to satisfy the request, you can configure the Topology Manager policy to restricted. For more flexible SR-IOV network resource scheduling, see Excluding SR-IOV network topology during NUMA-aware scheduling in the Additional resources section.

Procedure

  1. Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

    The following example shows an SR-IOV pod spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: <name> (1)
    spec:
      containers:
      - name: sample-container
        image: <image> (2)
        command: ["sleep", "infinity"]
        resources:
          limits:
            memory: "1Gi" (3)
            cpu: "2" (4)
          requests:
            memory: "1Gi"
            cpu: "2"
    (1) Replace <name> with the name of the SR-IOV network attachment definition CR.
    (2) Replace <image> with the name of the sample-pod image.
    (3) To create the SR-IOV pod with guaranteed QoS, set memory limits equal to memory requests.
    (4) To create the SR-IOV pod with guaranteed QoS, set cpu limits equal to cpu requests.
  2. Create the sample SR-IOV pod by running the following command:

    $ oc create -f <filename> (1)
    (1) Replace <filename> with the name of the file you created in the previous step.
  3. Confirm that the sample-pod is configured with guaranteed QoS.

    $ oc describe pod sample-pod
  4. Confirm that the sample-pod is allocated with exclusive CPUs.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
  5. Confirm that the SR-IOV device and CPUs that are allocated for the sample-pod are on the same NUMA node.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
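
The cpuset.cpus file contains a Linux CPU list such as 2-3. To interpret it, you can compare it with the CPU list of each NUMA node on the host. The following Python sketch shows one way to do that comparison. It assumes that you have separately collected each node's CPU list, for example from /sys/devices/system/node/node<N>/cpulist on the worker node, and the NUMA node of the SR-IOV device. All sample values and function names are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes_for(cpus, node_cpu_lists):
    """Return the NUMA node IDs whose CPU lists overlap the given CPUs."""
    return sorted(
        node
        for node, cpu_list in node_cpu_lists.items()
        if cpus & parse_cpu_list(cpu_list)
    )


# Hypothetical inputs: the cpuset.cpus output of the pod, the CPU list of
# each NUMA node on the host, and the NUMA node of the SR-IOV device.
pod_cpus = parse_cpu_list("2-3")
node_cpu_lists = {0: "0-15", 1: "16-31"}
device_node = 0

nodes = numa_nodes_for(pod_cpus, node_cpu_lists)
aligned = nodes == [device_node]
print(nodes, aligned)
```

The pod is NUMA aligned in this sketch only when its CPUs fall on exactly one node and that node matches the node of the device.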

A test pod template for clusters that use SR-IOV on OpenStack

The following testpmd pod demonstrates container creation with huge pages, reserved CPUs, and the SR-IOV port.

An example testpmd pod

  apiVersion: v1
  kind: Pod
  metadata:
    name: testpmd-sriov
    namespace: mynamespace
    annotations:
      cpu-load-balancing.crio.io: "disable"
      cpu-quota.crio.io: "disable"
  # ...
  spec:
    containers:
    - name: testpmd
      command: ["sleep", "99999"]
      image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
      securityContext:
        capabilities:
          add: ["IPC_LOCK","SYS_ADMIN"]
        privileged: true
        runAsUser: 0
      resources:
        requests:
          memory: 1000Mi
          hugepages-1Gi: 1Gi
          cpu: '2'
          openshift.io/sriov1: 1
        limits:
          hugepages-1Gi: 1Gi
          cpu: '2'
          memory: 1000Mi
          openshift.io/sriov1: 1
      volumeMounts:
        - mountPath: /dev/hugepages
          name: hugepage
          readOnly: False
    runtimeClassName: performance-cnf-performanceprofile (1)
    volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages
(1) This example assumes that the name of the performance profile is cnf-performanceprofile.

Additional resources