Configuring interface-level network sysctl settings for SR-IOV networks
As a cluster administrator, you can modify interface-level network sysctls for a pod connected to an SR-IOV network device by using the tuning Container Network Interface (CNI) meta plugin.
Labeling nodes with an SR-IOV enabled NIC
If you want to enable SR-IOV only on SR-IOV-capable nodes, you can label those nodes in either of the following ways:
Install the Node Feature Discovery (NFD) Operator. NFD detects the presence of SR-IOV enabled NICs and labels the nodes with node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = true.

Examine the SriovNetworkNodeState CR for each node. The interfaces stanza includes a list of all of the SR-IOV devices that the SR-IOV Network Operator discovered on the worker node. Label each node with feature.node.kubernetes.io/network-sriov.capable: "true" by using the following command:

$ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"

You can label the nodes with any name you want, as long as the nodeSelector field of your SriovNetworkNodePolicy matches that label.
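To check which nodes already carry one of these labels, you can filter the JSON output of oc get nodes -o json. The following is a minimal sketch that assumes the standard Kubernetes NodeList shape (items[].metadata.labels). The helper name nodes_with_label and the sample data are hypothetical.

```python
import json

# Label keys mentioned above; either one can mark a node as SR-IOV capable.
NFD_LABEL = "node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable"
MANUAL_LABEL = "feature.node.kubernetes.io/network-sriov.capable"


def nodes_with_label(node_list_json: str, key: str, value: str = "true") -> list[str]:
    """Return the names of nodes whose label `key` equals `value` (hypothetical helper)."""
    node_list = json.loads(node_list_json)
    names = []
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}
        if labels.get(key) == value:
            names.append(metadata["name"])
    return names


if __name__ == "__main__":
    # Made-up, trimmed sample of `oc get nodes -o json` output.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "worker-0", "labels": {MANUAL_LABEL: "true"}}},
            {"metadata": {"name": "worker-1", "labels": {}}},
        ]
    })
    print(nodes_with_label(sample, MANUAL_LABEL))
```

In practice, oc get nodes -l feature.node.kubernetes.io/network-sriov.capable=true does the same filtering on the server side. The sketch is useful mainly when you already have the JSON saved or want to compare both label keys at once.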
Setting one sysctl flag
You can set interface-level network sysctl settings for a pod connected to an SR-IOV network device.

In this example, net.ipv4.conf.IFNAME.accept_redirects is set to 1 on the created virtual interfaces.
This example uses the sysctl-tuning-test namespace. Use the following command to create it:

$ oc create namespace sysctl-tuning-test
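The IFNAME segment in a sysctl name is a placeholder. The tuning plugin replaces it with the name of the interface inside the pod, which is why the verification step later queries net.ipv4.conf.net1.accept_redirects. The following minimal sketch illustrates that substitution and the matching file under /proc/sys, where each dot in a sysctl name becomes a path separator. The helper names are hypothetical, and the code does not call the plugin.

```python
def expand_ifname(sysctl_key: str, ifname: str) -> str:
    """Replace the IFNAME path segment of a sysctl name with a concrete interface name."""
    # Interface names that contain dots (for example, VLAN devices) would be
    # ambiguous in dotted sysctl notation, so this sketch rejects them.
    if "." in ifname:
        raise ValueError(f"interface name {ifname!r} contains a dot")
    parts = sysctl_key.split(".")
    return ".".join(ifname if part == "IFNAME" else part for part in parts)


def proc_sys_path(sysctl_key: str) -> str:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + sysctl_key.replace(".", "/")


if __name__ == "__main__":
    key = expand_ifname("net.ipv4.conf.IFNAME.accept_redirects", "net1")
    print(key)                 # net.ipv4.conf.net1.accept_redirects
    print(proc_sys_path(key))  # /proc/sys/net/ipv4/conf/net1/accept_redirects
```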
Setting one sysctl flag on nodes with SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OKD. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It can take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).
Procedure
Create a SriovNetworkNodePolicy custom resource (CR). For example, save the following YAML as the file policyoneflag-sriov-node-network.yaml:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policyoneflag (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyoneflag (3)
  nodeSelector: (4)
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 10 (5)
  numVfs: 5 (6)
  nicSelector: (7)
    pfNames: ["ens5"] (8)
  deviceType: "netdevice" (9)
  isRdma: false (10)
1 The name for the custom resource object.
2 The namespace where the SR-IOV Network Operator is installed.
3 The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
4 The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
5 Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
6 The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
7 The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters, but identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, you do not need to specify any other parameter because a network ID is unique.
8 Optional: An array of one or more physical function (PF) names for the device.
9 Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
10 Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.

The vfio-pci driver type is not supported.

Create the SriovNetworkNodePolicy object:

$ oc create -f policyoneflag-sriov-node-network.yaml
After you apply the configuration update, all the pods in the openshift-sriov-network-operator namespace change to the Running status.

To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Example output
Succeeded
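Before you apply a policy, you can check it against the constraints listed in the callouts above. The following minimal sketch takes the policy spec as a Python dict, because parsing YAML would require a third-party library. The function name policy_problems and its is_mellanox parameter are hypothetical. The sketch covers only the rules stated in this section and does not replace the validation that the SR-IOV Network Operator performs.

```python
def policy_problems(spec: dict, is_mellanox: bool = False) -> list[str]:
    """Return problems found in a SriovNetworkNodePolicy spec (hypothetical checker)."""
    problems = []

    # Priority: integer from 0 to 99; a smaller value means higher priority; default 99.
    priority = spec.get("priority", 99)
    if isinstance(priority, bool) or not isinstance(priority, int) or not 0 <= priority <= 99:
        problems.append("priority must be an integer between 0 and 99")

    # nodeSelector must be a mapping of label keys to string values such as "true".
    # A flattened line like key="true" would parse as a single string instead.
    selector = spec.get("nodeSelector", {})
    if not isinstance(selector, dict) or not all(isinstance(v, str) for v in selector.values()):
        problems.append('nodeSelector must map label keys to string values such as "true"')

    num_vfs = spec.get("numVfs")
    if isinstance(num_vfs, bool) or not isinstance(num_vfs, int):
        problems.append("numVfs must be an integer")
    elif is_mellanox and num_vfs > 128:
        problems.append("Mellanox NICs support at most 128 VFs")

    # The callouts in this section allow only the netdevice driver type.
    if spec.get("deviceType", "netdevice") != "netdevice":
        problems.append("deviceType must be netdevice")

    # rootDevices must be combined with vendor, deviceID, or pfNames.
    nic = spec.get("nicSelector", {})
    if "rootDevices" in nic and not any(k in nic for k in ("vendor", "deviceID", "pfNames")):
        problems.append("rootDevices requires vendor, deviceID, or pfNames")

    return problems


if __name__ == "__main__":
    policy = {
        "resourceName": "policyoneflag",
        "nodeSelector": {"feature.node.kubernetes.io/network-sriov.capable": "true"},
        "priority": 10,
        "numVfs": 5,
        "nicSelector": {"pfNames": ["ens5"]},
        "deviceType": "netdevice",
        "isRdma": False,
    }
    print(policy_problems(policy))  # []
```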
Configuring sysctl on a SR-IOV network
You can set interface-specific sysctl settings on virtual interfaces created by SR-IOV by adding the tuning configuration to the optional metaPlugins parameter of the SriovNetwork resource.
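The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition custom resource (CR) automatically.

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.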
To change the interface-level net.ipv4.conf.IFNAME.accept_redirects sysctl setting, create an additional SR-IOV network with the Container Network Interface (CNI) tuning plugin.
Prerequisites
Install the OKD CLI (oc).
Log in to the OKD cluster as a user with cluster-admin privileges.
Procedure
Create the SriovNetwork custom resource (CR) for the additional SR-IOV network attachment and insert the metaPlugins configuration, as in the following example CR. Save the YAML as the file sriov-network-interface-sysctl.yaml:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: onevalidflag (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyoneflag (3)
  networkNamespace: sysctl-tuning-test (4)
  ipam: '{ "type": "static" }' (5)
  capabilities: '{ "mac": true, "ips": true }' (6)
  metaPlugins: | (7)
    {
      "type": "tuning",
      "capabilities":{
        "mac":true
      },
      "sysctl":{
        "net.ipv4.conf.IFNAME.accept_redirects": "1"
      }
    }
1 A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
2 The namespace where the SR-IOV Network Operator is installed.
3 The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
4 The target namespace in which the Operator creates the NetworkAttachmentDefinition object. Only pods in the target namespace can attach to the additional network.
5 A configuration object for the IPAM CNI plugin, specified as a JSON string. The plugin manages IP address assignment for the attachment definition.
6 Optional: Set capabilities for the additional network. You can specify { "ips": true } to enable IP address support or { "mac": true } to enable MAC address support.
7 Optional: The metaPlugins parameter adds capabilities to the device. In this use case, set the type field to tuning. Specify the interface-level network sysctl that you want to set in the sysctl field.

Create the SriovNetwork resource:

$ oc create -f sriov-network-interface-sysctl.yaml
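The metaPlugins value is JSON text embedded in YAML, so quoting mistakes are easy to make by hand. The following minimal sketch generates that JSON with the standard json module and rejects non-string sysctl values, because the tuning plugin expects values such as "1" rather than 1. The function name tuning_meta_plugin is hypothetical.

```python
import json


def tuning_meta_plugin(sysctls: dict[str, str], mac_capability: bool = True) -> str:
    """Build JSON text for the metaPlugins field of a SriovNetwork CR (hypothetical helper)."""
    for key, value in sysctls.items():
        # Sysctl values are passed as strings, for example "1" rather than 1.
        if not isinstance(value, str):
            raise TypeError(f"sysctl {key!r} must have a string value, got {value!r}")
    config: dict = {"type": "tuning"}
    if mac_capability:
        config["capabilities"] = {"mac": True}
    config["sysctl"] = dict(sysctls)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(tuning_meta_plugin({"net.ipv4.conf.IFNAME.accept_redirects": "1"}))
```

To use the output, indent each line so that it sits under metaPlugins: | in the SriovNetwork YAML.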
Verifying that the NetworkAttachmentDefinition CR is successfully created
Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

$ oc get network-attachment-definitions -n <namespace> (1)

1 Replace <namespace> with the value for networkNamespace that you specified in the SriovNetwork object. For example, sysctl-tuning-test.

Example output
NAME AGE
onevalidflag 14m
There might be a delay before the SR-IOV Network Operator creates the CR.
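Because of this delay, a script might need to wait for the CR before it continues. The following minimal sketch polls oc get network-attachment-definitions <name> -n <namespace> -o name until the command succeeds or a timeout expires. It assumes that oc is on your PATH and already logged in. The function names, the timeout, and the interval values are hypothetical. The run parameter lets you substitute a fake command runner for testing.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def _run_oc(args: Sequence[str]) -> int:
    """Run a command and return its exit code; output is captured and discarded."""
    return subprocess.run(list(args), capture_output=True, text=True).returncode


def wait_for_nad(name: str, namespace: str, timeout_s: float = 300.0,
                 interval_s: float = 10.0,
                 run: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """Poll until the NetworkAttachmentDefinition exists or the timeout expires."""
    run = run or _run_oc
    cmd = ["oc", "get", "network-attachment-definitions", name,
           "-n", namespace, "-o", "name"]
    deadline = time.monotonic() + timeout_s
    while True:
        if run(cmd) == 0:  # oc exits with status 0 when the object is found
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print(wait_for_nad("onevalidflag", "sysctl-tuning-test"))
```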
Verifying that the additional SR-IOV network attachment is successful
To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create a Pod CR. Save the following YAML as the file examplepod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: tunepod
  namespace: sysctl-tuning-test
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {
          "name": "onevalidflag", (1)
          "mac": "0a:56:0a:83:04:0c", (2)
          "ips": ["10.100.100.200/24"] (3)
        }
      ]
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
1 The name of the SR-IOV network attachment definition CR.
2 Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you must also specify { "mac": true } in the SriovNetwork object.
3 Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you must also specify { "ips": true } in the SriovNetwork object.

Create the Pod CR:

$ oc apply -f examplepod.yaml
Verify that the pod is created by running the following command:
$ oc get pod -n sysctl-tuning-test
Example output
NAME READY STATUS RESTARTS AGE
tunepod 1/1 Running 0 47s
Log in to the pod by running the following command:
$ oc rsh -n sysctl-tuning-test tunepod
Verify the value of the configured sysctl flag. Find the value of net.ipv4.conf.IFNAME.accept_redirects by running the following command:

$ sysctl net.ipv4.conf.net1.accept_redirects
Example output
net.ipv4.conf.net1.accept_redirects = 1
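If you automate this check, you can capture the same output without an interactive shell, for example with oc exec -n sysctl-tuning-test tunepod -- sysctl net.ipv4.conf.net1.accept_redirects, and compare it to the values that you expect. The following minimal sketch parses output lines of the form name = value. The function names are hypothetical.

```python
def parse_sysctl_output(text: str) -> dict[str, str]:
    """Parse `sysctl` output lines of the form 'name = value' into a dict."""
    values = {}
    for line in text.splitlines():
        if " = " in line:
            key, value = line.split(" = ", 1)
            values[key.strip()] = value.strip()
    return values


def check_sysctls(text: str, expected: dict[str, str]) -> list[str]:
    """Return the sysctl names whose observed value differs from the expected value."""
    observed = parse_sysctl_output(text)
    return [key for key, value in expected.items() if observed.get(key) != value]


if __name__ == "__main__":
    output = "net.ipv4.conf.net1.accept_redirects = 1\n"
    print(check_sysctls(output, {"net.ipv4.conf.net1.accept_redirects": "1"}))  # []
```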
Configuring sysctl settings for pods associated with a bonded SR-IOV interface
You can set interface-level network sysctl settings for a pod connected to a bonded SR-IOV network device.

In this example, all of the interface-level network sysctl settings that can be configured are set on the bonded interface.
This example uses the sysctl-tuning-test namespace. Use the following command to create it. If the namespace already exists from the previous example, you can skip this step:

$ oc create namespace sysctl-tuning-test
Setting all sysctl flags on nodes with bonded SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OKD. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).
Procedure
Create a SriovNetworkNodePolicy custom resource (CR). Save the following YAML as the file policyallflags-sriov-node-network.yaml. Replace policyallflags with the name for the configuration.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policyallflags (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyallflags (3)
  nodeSelector: (4)
    node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable: "true"
  priority: 10 (5)
  numVfs: 5 (6)
  nicSelector: (7)
    pfNames: ["ens1f0"] (8)
  deviceType: "netdevice" (9)
  isRdma: false (10)
1 The name for the custom resource object.
2 The namespace where the SR-IOV Network Operator is installed.
3 The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
4 The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only. The label in the selector must match the label on your nodes, for example, the label that NFD applies.
5 Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
6 The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
7 The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters, but identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, you do not need to specify any other parameter because a network ID is unique.
8 Optional: An array of one or more physical function (PF) names for the device.
9 Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
10 Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.

The vfio-pci driver type is not supported.

Create the SriovNetworkNodePolicy object:
$ oc create -f policyallflags-sriov-node-network.yaml
After you apply the configuration update, all the pods in the openshift-sriov-network-operator namespace change to the Running status.

To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Example output
Succeeded
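Because the Operator might drain or reboot nodes, syncStatus might not read Succeeded right away. The following minimal sketch reruns the verification command from this step until it reports Succeeded or a timeout expires, and then returns the last value that it observed. It assumes that oc is on your PATH and logged in. The function names and the default timeout and interval are hypothetical. The run parameter lets you substitute a fake command runner for testing.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def _oc_output(args: Sequence[str]) -> str:
    """Run a command and return its standard output, or an empty string on failure."""
    result = subprocess.run(list(args), capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def wait_for_sync(node_name: str, timeout_s: float = 900.0, interval_s: float = 30.0,
                  run: Optional[Callable[[Sequence[str]], str]] = None) -> str:
    """Poll the node state until syncStatus is Succeeded or the timeout expires.

    Returns the last observed syncStatus value.
    """
    run = run or _oc_output
    # Same command as in the procedure; no shell quoting is needed in an argument list.
    cmd = ["oc", "get", "sriovnetworknodestates", "-n", "openshift-sriov-network-operator",
           node_name, "-o", "jsonpath={.status.syncStatus}"]
    deadline = time.monotonic() + timeout_s
    while True:
        status = run(cmd).strip()
        if status == "Succeeded" or time.monotonic() >= deadline:
            return status
        time.sleep(interval_s)


if __name__ == "__main__":
    print(wait_for_sync("<node_name>"))
```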
Configuring sysctl on a bonded SR-IOV network
You can set interface-specific sysctl settings on a bonded interface created from two SR-IOV interfaces. To do this, add the tuning configuration to the optional plugins parameter of the bond network attachment definition.
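Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.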
To change specific interface-level network sysctl settings, create the SriovNetwork custom resource (CR) and a bond network attachment definition that includes the Container Network Interface (CNI) tuning plugin by using the following procedure.
Prerequisites
Install the OKD CLI (oc).
Log in to the OKD cluster as a user with cluster-admin privileges.
Procedure
Create the SriovNetwork custom resource (CR) for the bonded interface as in the following example CR. Save the YAML as the file sriov-network-attachment.yaml:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: allvalidflags (1)
  namespace: openshift-sriov-network-operator (2)
spec:
  resourceName: policyallflags (3)
  networkNamespace: sysctl-tuning-test (4)
  capabilities: '{ "mac": true, "ips": true }' (5)
1 A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
2 The namespace where the SR-IOV Network Operator is installed.
3 The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
4 The target namespace in which the Operator creates the NetworkAttachmentDefinition object. Only pods in the target namespace can attach to the additional network.
5 Optional: The capabilities to configure for this additional network. You can specify { "ips": true } to enable IP address support or { "mac": true } to enable MAC address support.

Create the SriovNetwork resource:

$ oc create -f sriov-network-attachment.yaml
Create a bond network attachment definition as in the following example CR. Save the YAML as the file sriov-bond-network-interface.yaml:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bond-sysctl-network
  namespace: sysctl-tuning-test
spec:
  config: '{
    "cniVersion":"0.4.0",
    "name":"bound-net",
    "plugins":[
      {
        "type":"bond", (1)
        "ifname":"bond0", (2)
        "mode": "active-backup", (3)
        "failOverMac": 1, (4)
        "linksInContainer": true, (5)
        "miimon": "100",
        "links": [ (6)
          {"name": "net1"},
          {"name": "net2"}
        ],
        "ipam":{ (7)
          "type":"static"
        }
      },
      {
        "type":"tuning", (8)
        "capabilities":{
          "mac":true
        },
        "sysctl":{
          "net.ipv4.conf.IFNAME.accept_redirects": "0",
          "net.ipv4.conf.IFNAME.accept_source_route": "0",
          "net.ipv4.conf.IFNAME.disable_policy": "1",
          "net.ipv4.conf.IFNAME.secure_redirects": "0",
          "net.ipv4.conf.IFNAME.send_redirects": "0",
          "net.ipv6.conf.IFNAME.accept_redirects": "0",
          "net.ipv6.conf.IFNAME.accept_source_route": "1",
          "net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000",
          "net.ipv6.neigh.IFNAME.retrans_time_ms": "2000"
        }
      }
    ]
  }'
1 The type is bond.
2 The ifname attribute specifies the name of the bond interface.
3 The mode attribute specifies the bonding mode. The supported bonding modes are:
balance-rr - 0
active-backup - 1
balance-xor - 2
For balance-rr or balance-xor modes, you must set the trust mode to on for the SR-IOV virtual function.
4 The failOverMac attribute is mandatory for active-backup mode.
5 The linksInContainer=true flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, Bond CNI looks for these interfaces on the host, which does not work for integration with SR-IOV and Multus.
6 The links section defines which interfaces are used to create the bond. By default, Multus names the attached interfaces "net" plus a consecutive number, starting with one, for example, net1 and net2.
7 A configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition. In this pod example, IP addresses are configured manually, so ipam is set to static.
8 Add additional capabilities to the device. For example, set the type field to tuning. Specify the interface-level network sysctl that you want to set in the sysctl field. This example sets all interface-level network sysctl settings that can be set.

Create the bond network attachment resource:
$ oc create -f sriov-bond-network-interface.yaml
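Because spec.config is JSON inside a single-quoted YAML string, generating it can help you avoid syntax errors. The following minimal sketch builds the same two-plugin chain (bond, then tuning) with the json module. It also applies the rules from the callouts: only the three listed modes are accepted, failOverMac is added for active-backup, and the links default to net1 and net2. The function name bond_nad_config and its defaults are hypothetical. The trust setting that balance-rr and balance-xor require is configured on the VFs and cannot be expressed in this config.

```python
import json

# Bonding modes that this section lists as supported, with their numeric IDs.
SUPPORTED_MODES = {"balance-rr": 0, "active-backup": 1, "balance-xor": 2}


def bond_nad_config(sysctls: dict[str, str], mode: str = "active-backup",
                    links: tuple[str, ...] = ("net1", "net2"),
                    ifname: str = "bond0", name: str = "bound-net") -> str:
    """Build spec.config JSON for a bond NAD chained with the tuning plugin."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported bonding mode {mode!r}")
    if not all(isinstance(v, str) for v in sysctls.values()):
        raise TypeError("sysctl values must be strings")
    bond = {
        "type": "bond",
        "ifname": ifname,
        "mode": mode,
        "linksInContainer": True,  # the VF interfaces are looked up inside the pod
        "miimon": "100",
        "links": [{"name": link} for link in links],
        "ipam": {"type": "static"},
    }
    if mode == "active-backup":
        bond["failOverMac"] = 1  # mandatory for active-backup mode
    tuning = {"type": "tuning", "capabilities": {"mac": True}, "sysctl": dict(sysctls)}
    return json.dumps({"cniVersion": "0.4.0", "name": name, "plugins": [bond, tuning]},
                      indent=2)


if __name__ == "__main__":
    print(bond_nad_config({"net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000"}))
```

If you paste the output into the single-quoted config value, remember that YAML requires any single quote inside it to be doubled.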
Verifying that the NetworkAttachmentDefinition CR is successfully created
Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

$ oc get network-attachment-definitions -n <namespace> (1)

1 Replace <namespace> with the networkNamespace that you specified when configuring the network attachment, for example, sysctl-tuning-test.

Example output
NAME AGE
bond-sysctl-network 22m
allvalidflags 47m
There might be a delay before the SR-IOV Network Operator creates the CR.
Verifying that the additional SR-IOV network attachment is successful
To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create a Pod CR. For example, save the following YAML as the file examplepod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: tunepod
  namespace: sysctl-tuning-test
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {"name": "allvalidflags"}, (1)
        {"name": "allvalidflags"},
        {
          "name": "bond-sysctl-network",
          "interface": "bond0",
          "mac": "0a:56:0a:83:04:0c", (2)
          "ips": ["10.100.100.200/24"] (3)
        }
      ]
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
1 The name of the SR-IOV network attachment definition CR. The allvalidflags network is listed twice so that the pod receives two VF interfaces, net1 and net2, which the bond-sysctl-network attachment uses as its links.
2 Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you must also specify { "mac": true } in the SriovNetwork object.
3 Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you must also specify { "ips": true } in the SriovNetwork object.

Apply the YAML:
$ oc apply -f examplepod.yaml
Verify that the pod is created by running the following command:
$ oc get pod -n sysctl-tuning-test
Example output
NAME READY STATUS RESTARTS AGE
tunepod 1/1 Running 0 47s
Log in to the pod by running the following command:
$ oc rsh -n sysctl-tuning-test tunepod
Verify the value of the configured sysctl flag. Find the value of net.ipv6.neigh.IFNAME.base_reachable_time_ms by running the following command:

$ sysctl net.ipv6.neigh.bond0.base_reachable_time_ms
Example output
net.ipv6.neigh.bond0.base_reachable_time_ms = 20000
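The pod annotation has to stay consistent with the bond definition. The VF attachments must come first in the list so that Multus names them net1 and net2, the interface names that the bond links reference. The following minimal sketch generates the annotation value and the default link names. The function names and parameters are hypothetical, and the naming rule follows the description in this section ("net" plus a consecutive number, starting with one).

```python
import json


def bonded_pod_networks(vf_network: str, bond_network: str, mac: str, ips: list[str],
                        vf_count: int = 2, bond_ifname: str = "bond0") -> str:
    """Build the k8s.v1.cni.cncf.io/networks annotation value for a bonded pod.

    The VF attachments are listed first so that they receive the default names
    net1, net2, ...; the bond attachment gets an explicit interface name.
    """
    entries: list[dict] = [{"name": vf_network} for _ in range(vf_count)]
    entries.append({"name": bond_network, "interface": bond_ifname, "mac": mac, "ips": ips})
    return json.dumps(entries, indent=2)


def default_link_names(vf_count: int) -> list[str]:
    """Interface names that the bond links should reference: net1, net2, ..."""
    return [f"net{i}" for i in range(1, vf_count + 1)]


if __name__ == "__main__":
    print(bonded_pod_networks("allvalidflags", "bond-sysctl-network",
                              "0a:56:0a:83:04:0c", ["10.100.100.200/24"]))
    print(default_link_names(2))  # ['net1', 'net2']
```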