Using virtual functions (VFs) with DPDK and RDMA modes
You can use Single Root I/O Virtualization (SR-IOV) network hardware with the Data Plane Development Kit (DPDK) and with remote direct memory access (RDMA).
The Data Plane Development Kit (DPDK) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
Using a virtual function in DPDK mode with an Intel NIC
Prerequisites
Install the OpenShift CLI (oc).
Install the SR-IOV Network Operator.
Log in as a user with cluster-admin privileges.
Procedure
Create the following SriovNetworkNodePolicy object, and then save the YAML in the intel-dpdk-node-policy.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk-node-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: intelnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: <priority>
  numVfs: <num>
  nicSelector:
    vendor: "8086"
    deviceID: "158b"
    pfNames: ["<pf_name>", ...]
    rootDevices: ["<pci_bus_id>", "..."]
  deviceType: vfio-pci (1)
1 Specify the driver type for the virtual functions to vfio-pci.

See the “Configuring SR-IOV network devices” section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes and, in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand. After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

Create the SriovNetworkNodePolicy object by running the following command:

$ oc create -f intel-dpdk-node-policy.yaml
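Optionally, verify that the policy has been applied before continuing. The following commands are a suggested check rather than part of the procedure; the second command assumes that the SriovNetworkNodeState object created for each configured node reports a syncStatus field of Succeeded once synchronization completes.

$ oc get pods -n openshift-sriov-network-operator
$ oc get sriovnetworknodestates <node_name> -n openshift-sriov-network-operator -o jsonpath='{.status.syncStatus}'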
Create the following SriovNetwork object, and then save the YAML in the intel-dpdk-network.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: <target_namespace>
  ipam: "{}" (1)
  vlan: <vlan>
  resourceName: intelnics
1 Specify an empty object "{}" for the ipam CNI plug-in. DPDK works in userspace mode and does not require an IP address.

See the “Configuring SR-IOV additional network” section for a detailed explanation of each option in SriovNetwork.

Create the SriovNetwork object by running the following command:

$ oc create -f intel-dpdk-network.yaml
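The SR-IOV Network Operator creates a corresponding NetworkAttachmentDefinition in the target namespace. As an optional check, you can list it:

$ oc get network-attachment-definitions -n <target_namespace>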
Create the following Pod spec, and then save the YAML in the intel-dpdk-pod.yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  namespace: <target_namespace> (1)
  annotations:
    k8s.v1.cni.cncf.io/networks: intel-dpdk-network
spec:
  containers:
  - name: testpmd
    image: <DPDK_image> (2)
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] (3)
    volumeMounts:
    - mountPath: /dev/hugepages (4)
      name: hugepage
    resources:
      limits:
        openshift.io/intelnics: "1" (5)
        memory: "1Gi"
        cpu: "4" (6)
        hugepages-1Gi: "4Gi" (7)
      requests:
        openshift.io/intelnics: "1"
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
1 Specify the same target_namespace where the SriovNetwork object intel-dpdk-network is created. If you would like to create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork object.

2 Specify the DPDK image which includes your application and the DPDK library used by the application.

3 Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.

4 Mount a hugepage volume to the DPDK pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being HugePages.

5 Optional: Specify the number of DPDK devices allocated to the DPDK pod. This resource request and limit, if not explicitly specified, is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.

6 Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a pod with Guaranteed QoS.

7 Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes. For example, adding the kernel arguments default_hugepagesz=1GB, hugepagesz=1G, and hugepages=16 results in 16*1Gi hugepages being allocated during system boot. One way to apply these arguments is sketched after the pod creation command below.

Create the DPDK pod by running the following command:

$ oc create -f intel-dpdk-pod.yaml
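As noted in callout 7, 1Gi hugepages must be enabled through kernel arguments before the pod can request hugepages-1Gi. The following is a minimal sketch of one way to apply the example arguments with a MachineConfig, assuming the target nodes belong to the worker machine config pool; the object name is illustrative, and applying it reboots the affected nodes.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-worker-1g-hugepages   # illustrative name
  labels:
    machineconfiguration.openshift.io/role: worker   # assumes the worker pool
spec:
  kernelArguments:
  - default_hugepagesz=1GB
  - hugepagesz=1G
  - hugepages=16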
Using a virtual function in DPDK mode with a Mellanox NIC
Prerequisites
Install the OpenShift CLI (oc).
Install the SR-IOV Network Operator.
Log in as a user with cluster-admin privileges.
Procedure
Create the following SriovNetworkNodePolicy object, and then save the YAML in the mlx-dpdk-node-policy.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx-dpdk-node-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: mlxnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: <priority>
  numVfs: <num>
  nicSelector:
    vendor: "15b3"
    deviceID: "1015" (1)
    pfNames: ["<pf_name>", ...]
    rootDevices: ["<pci_bus_id>", "..."]
  deviceType: netdevice (2)
  isRdma: true (3)
1 Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.

2 Specify the driver type for the virtual functions to netdevice. A Mellanox SR-IOV VF can work in DPDK mode without using the vfio-pci device type. The VF device appears as a kernel network interface inside a container.

3 Enable RDMA mode. This is required for Mellanox cards to work in DPDK mode.

See the “Configuring SR-IOV network devices” section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes and, in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand. After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

Create the SriovNetworkNodePolicy object by running the following command:

$ oc create -f mlx-dpdk-node-policy.yaml
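Optionally, confirm that the virtual functions are advertised as an allocatable resource on a configured node. This is a suggested check; it assumes the resource name mlxnics from the policy above.

$ oc describe node <node_name> | grep openshift.io/mlxnics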
Create the following SriovNetwork object, and then save the YAML in the mlx-dpdk-network.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: mlx-dpdk-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: <target_namespace>
  ipam: |- (1)
    ...
  vlan: <vlan>
  resourceName: mlxnics
1 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition. One possible configuration is sketched after the command below. See the “Configuring SR-IOV additional network” section for a detailed explanation of each option in SriovNetwork.

Create the SriovNetwork object by running the following command:

$ oc create -f mlx-dpdk-network.yaml
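The following is a minimal sketch of what the ipam block scalar might contain, using the standard host-local CNI IPAM plug-in. The subnet, address range, and gateway values are illustrative only and must be replaced with values that match your network.

ipam: |-
  {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "rangeStart": "10.56.217.171",
    "rangeEnd": "10.56.217.181",
    "routes": [{ "dst": "0.0.0.0/0" }],
    "gateway": "10.56.217.1"
  }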
Create the following Pod spec, and then save the YAML in the mlx-dpdk-pod.yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  namespace: <target_namespace> (1)
  annotations:
    k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
spec:
  containers:
  - name: testpmd
    image: <DPDK_image> (2)
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] (3)
    volumeMounts:
    - mountPath: /dev/hugepages (4)
      name: hugepage
    resources:
      limits:
        openshift.io/mlxnics: "1" (5)
        memory: "1Gi"
        cpu: "4" (6)
        hugepages-1Gi: "4Gi" (7)
      requests:
        openshift.io/mlxnics: "1"
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
1 Specify the same target_namespace where the SriovNetwork object mlx-dpdk-network is created. If you would like to create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork object.

2 Specify the DPDK image which includes your application and the DPDK library used by the application.

3 Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.

4 Mount the hugepage volume to the DPDK pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being HugePages.

5 Optional: Specify the number of DPDK devices allocated to the DPDK pod. This resource request and limit, if not explicitly specified, is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR, as sketched after the pod creation command below.

6 Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a pod with Guaranteed QoS.

7 Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.

Create the DPDK pod by running the following command:

$ oc create -f mlx-dpdk-pod.yaml
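If you prefer to set the device resource requests and limits manually, you can disable the injector mentioned in callout 5. A minimal sketch, assuming the default SriovOperatorConfig CR created by the Operator:

$ oc patch sriovoperatorconfig default \
    -n openshift-sriov-network-operator \
    --type=merge -p '{"spec": {"enableInjector": false}}'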
Using a virtual function in RDMA mode with a Mellanox NIC
RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OKD.
Prerequisites
Install the OpenShift CLI (oc).
Install the SR-IOV Network Operator.
Log in as a user with cluster-admin privileges.
Procedure
Create the following SriovNetworkNodePolicy object, and then save the YAML in the mlx-rdma-node-policy.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx-rdma-node-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: mlxnics
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: <priority>
  numVfs: <num>
  nicSelector:
    vendor: "15b3"
    deviceID: "1015" (1)
    pfNames: ["<pf_name>", ...]
    rootDevices: ["<pci_bus_id>", "..."]
  deviceType: netdevice (2)
  isRdma: true (3)
1 Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are 1015 and 1017.

2 Specify the driver type for the virtual functions to netdevice.

3 Enable RDMA mode.

See the “Configuring SR-IOV network devices” section for a detailed explanation of each option in SriovNetworkNodePolicy.

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes and, in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand. After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

Create the SriovNetworkNodePolicy object by running the following command:

$ oc create -f mlx-rdma-node-policy.yaml
Create the following SriovNetwork object, and then save the YAML in the mlx-rdma-network.yaml file.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: mlx-rdma-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: <target_namespace>
  ipam: |- (1)
    ...
  vlan: <vlan>
  resourceName: mlxnics
1 Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition. See the “Configuring SR-IOV additional network” section for a detailed explanation of each option in SriovNetwork.

Create the SriovNetwork object by running the following command:

$ oc create -f mlx-rdma-network.yaml
Create the following Pod spec, and then save the YAML in the mlx-rdma-pod.yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  namespace: <target_namespace> (1)
  annotations:
    k8s.v1.cni.cncf.io/networks: mlx-rdma-network
spec:
  containers:
  - name: testpmd
    image: <RDMA_image> (2)
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] (3)
    volumeMounts:
    - mountPath: /dev/hugepages (4)
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "4" (5)
        hugepages-1Gi: "4Gi" (6)
      requests:
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
1 Specify the same target_namespace where the SriovNetwork object mlx-rdma-network is created. If you would like to create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork object.

2 Specify the RDMA image which includes your application and the RDMA library used by the application.

3 Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.

4 Mount the hugepage volume to the RDMA pod under /dev/hugepages. The hugepage volume is backed by the emptyDir volume type with the medium being HugePages.

5 Specify the number of CPUs. The RDMA pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting the CPU Manager policy to static and creating a pod with Guaranteed QoS, as sketched after the pod creation command below.

6 Specify the hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the RDMA pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to nodes.

Create the RDMA pod by running the following command:

$ oc create -f mlx-rdma-pod.yaml
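To satisfy the exclusive-CPU requirement in callout 5, the CPU Manager policy must be static on the nodes that run the pod. The following is a minimal sketch of one way to enable it with a KubeletConfig, assuming a machine config pool labeled custom-kubelet: cpumanager-enabled; both the label and the object name are illustrative.

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled   # illustrative name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled   # assumes the pool carries this label
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s

With the static policy in place, the pod above already qualifies for Guaranteed QoS because its CPU and memory requests equal its limits.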