- Performance Addon Operator for low latency nodes
- Understanding low latency
- Installing the Performance Addon Operator
- Upgrading Performance Addon Operator
- Provisioning real-time and low latency workloads
- Known limitations for real-time
- Provisioning a worker with real-time capabilities
- Verifying the real-time kernel installation
- Creating a workload that works in real-time
- Creating a pod with a QoS class of
Guaranteed
- Optional: Disabling CPU load balancing for DPDK
- Assigning a proper node selector
- Scheduling a workload onto a worker with real-time capabilities
- Managing device interrupt processing for guaranteed pod isolated CPUs
- Upgrading the performance profile to use device interrupt processing
- Configuring a node for IRQ dynamic load balancing
- Configuring hyperthreading for a cluster
- Configuring huge pages
- Allocating multiple huge page sizes
- Tuning nodes for low latency with the performance profile
- Performing end-to-end tests for platform verification
- Debugging low latency CNF tuning status
- Collecting low latency tuning debugging data for Red Hat Support
Performance Addon Operator for low latency nodes
Understanding low latency
The emergence of Edge computing in the area of Telco / 5G plays a key role in reducing latency and congestion problems and improving application performance.
Simply put, latency determines how fast data (packets) moves from the sender to receiver and returns to the sender after processing by the receiver. Obviously, maintaining a network architecture with the lowest possible delay of latency speeds is key for meeting the network performance requirements of 5G. Compared to 4G technology, with an average latency of 50ms, 5G is targeted to reach latency numbers of 1ms or less. This reduction in latency boosts wireless throughput by a factor of 10.
Many of the deployed applications in the Telco space require low latency that can only tolerate zero packet loss. Tuning for zero packet loss helps mitigate the inherent issues that degrade network performance. For more information, see Tuning for Zero Packet Loss in Red Hat OpenStack Platform (RHOSP).
The Edge computing initiative also comes in to play for reducing latency rates. Think of it as literally being on the edge of the cloud and closer to the user. This greatly reduces the distance between the user and distant data centers, resulting in reduced application response times and performance latency.
Administrators must be able to manage their many Edge sites and local services in a centralized way so that all of the deployments can run at the lowest possible management cost. They also need an easy way to deploy and configure certain nodes of their cluster for real-time low latency and high-performance purposes. Low latency nodes are useful for applications such as Cloud-native Network Functions (CNF) and Data Plane Development Kit (DPDK).
OKD currently provides mechanisms to tune software on an OKD cluster for real-time running and low latency (around <20 microseconds reaction time). This includes tuning the kernel and OKD set values, installing a kernel, and reconfiguring the machine. But this method requires setting up four different Operators and performing many configurations that, when done manually, is complex and could be prone to mistakes.
OKD provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance for OpenShift applications. The cluster administrator uses this performance profile configuration that makes it easier to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt, the CPUs that will be reserved for housekeeping, and the CPUs that will be used for running the workloads.
About hyperthreading for low latency and real-time applications
Hyperthreading is an Intel processor technology that allows a physical CPU processor core to function as two logical cores, executing two independent threads simultaneously. Hyperthreading allows for better system throughput for certain workload types where parallel processing is beneficial. The default OKD configuration expects hyperthreading to be enabled by default.
For telecommunications applications, it is important to design your application infrastructure to minimize latency as much as possible. Hyperthreading can slow performance times and negatively affect throughput for compute intensive workloads that require low latency. Disabling hyperthreading ensures predictable performance and can decrease processing times for these workloads.
Hyperthreading implementation and configuration differs depending on the hardware you are running OKD on. Consult the relevant host hardware tuning information for more details of the hyperthreading implementation specific to that hardware. Disabling hyperthreading can increase the cost per core of the cluster. |
Additional resources
Installing the Performance Addon Operator
Performance Addon Operator provides the ability to enable advanced node performance tunings on a set of nodes. As a cluster administrator, you can install Performance Addon Operator using the OKD CLI or the web console.
Installing the Operator using the CLI
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
A cluster installed on bare-metal hardware.
Install the OpenShift CLI (
oc
).Log in as a user with
cluster-admin
privileges.
Procedure
Create a namespace for the Performance Addon Operator by completing the following actions:
Create the following Namespace Custom Resource (CR) that defines the
openshift-performance-addon-operator
namespace, and then save the YAML in thepao-namespace.yaml
file:apiVersion: v1
kind: Namespace
metadata:
name: openshift-performance-addon-operator
Create the namespace by running the following command:
$ oc create -f pao-namespace.yaml
Install the Performance Addon Operator in the namespace you created in the previous step by creating the following objects:
Create the following
OperatorGroup
CR and save the YAML in thepao-operatorgroup.yaml
file:apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: openshift-performance-addon-operator
namespace: openshift-performance-addon-operator
Create the
OperatorGroup
CR by running the following command:$ oc create -f pao-operatorgroup.yaml
Run the following command to get the
channel
value required for the next step.$ oc get packagemanifest performance-addon-operator -n openshift-marketplace -o jsonpath='{.status.defaultChannel}'
Example output
4.7
Create the following Subscription CR and save the YAML in the
pao-sub.yaml
file:Example Subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: openshift-performance-addon-operator-subscription
namespace: openshift-performance-addon-operator
spec:
channel: "<channel>" (1)
name: performance-addon-operator
source: redhat-operators (2)
sourceNamespace: openshift-marketplace
1 Specify the value from you obtained in the previous step for the .status.defaultChannel
parameter.2 You must specify the redhat-operators
value.Create the Subscription object by running the following command:
$ oc create -f pao-sub.yaml
Change to the
openshift-performance-addon-operator
project:$ oc project openshift-performance-addon-operator
Installing the Performance Addon Operator using the web console
As a cluster administrator, you can install the Performance Addon Operator using the web console.
You must create the |
Procedure
Install the Performance Addon Operator using the OKD web console:
In the OKD web console, click Operators → OperatorHub.
Choose Performance Addon Operator from the list of available Operators, and then click Install.
On the Install Operator page, select All namespaces on the cluster. Then, click Install.
Optional: Verify that the performance-addon-operator installed successfully:
Switch to the Operators → Installed Operators page.
Ensure that Performance Addon Operator is listed in the openshift-performance-addon-operator project with a Status of InstallSucceeded.
During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, to troubleshoot further:
Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
Go to the Workloads → Pods page and check the logs for pods in the
performance-addon-operator
project.
Upgrading Performance Addon Operator
You can manually upgrade to the next minor version of Performance Addon Operator and monitor the status of an update by using the web console.
About upgrading Performance Addon Operator
You can upgrade to the next minor version of Performance Addon Operator by using the OKD web console to change the channel of your Operator subscription.
You can enable automatic z-stream updates during Performance Addon Operator installation.
Updates are delivered via the Marketplace Operator, which is deployed during OKD installation.The Marketplace Operator makes external Operators available to your cluster.
The amount of time an update takes to complete depends on your network connection. Most automatic updates complete within fifteen minutes.
How Performance Addon Operator upgrades affect your cluster
Neither the low latency tuning nor huge pages are affected.
Updating the Operator should not cause any unexpected reboots.
Upgrading Performance Addon Operator to the next minor version
You can manually upgrade Performance Addon Operator to the next minor version by using the OKD web console to change the channel of your Operator subscription.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
Procedure
Access the OpenShift web console and navigate to Operators → Installed Operators.
Click Performance Addon Operator to open the Operator Details page.
Click the Subscription tab to open the Subscription Overview page.
In the Channel pane, click the pencil icon on the right side of the version number to open the Change Subscription Update Channel window.
Select the next minor version. For example, if you want to upgrade to Performance Addon Operator 4.7, select 4.7.
Click Save.
Check the status of the upgrade by navigating to Operators → Installed Operators. You can also check the status by running the following
oc
command:$ oc get csv -n openshift-performance-addon-operator
Upgrading Performance Addon Operator when previously installed to a specific namespace
If you previously installed the Performance Addon Operator to a specific namespace on the cluster, for example openshift-performance-addon-operator
, modify the OperatorGroup
object to remove the targetNamespaces
entry before upgrading.
Prerequisites
Install the OKD CLI (oc).
Log in to the OpenShift cluster as a user with cluster-admin privileges.
Procedure
Edit the Performance Addon Operator
OperatorGroup
CR and remove thespec
element that contains thetargetNamespaces
entry by running the following command:$ oc patch operatorgroup -n openshift-performance-addon-operator openshift-performance-addon-operator --type json -p '[{ "op": "remove", "path": "/spec" }]'
Wait until the Operator Lifecycle Manager (OLM) processes the change.
Verify that the OperatorGroup CR change has been successfully applied. Check that the
OperatorGroup
CRspec
element has been removed:$ oc describe -n openshift-performance-addon-operator og openshift-performance-addon-operator
Proceed with the Performance Addon Operator upgrade.
Monitoring upgrade status
The best way to monitor Performance Addon Operator upgrade status is to watch the ClusterServiceVersion
(CSV) PHASE
. You can also monitor the CSV conditions in the web console or by running the oc get csv
command.
The |
Prerequisites
Access to the cluster as a user with the
cluster-admin
role.Install the OpenShift CLI (
oc
).
Procedure
Run the following command:
$ oc get csv
Review the output, checking the
PHASE
field. For example:VERSION REPLACES PHASE
4.7.0 performance-addon-operator.v4.6.0 Installing
4.6.0 Replacing
Run
get csv
again to verify the output:# oc get csv
Example output
NAME DISPLAY VERSION REPLACES PHASE
performance-addon-operator.v4.7.0 Performance Addon Operator 4.7.0 performance-addon-operator.v4.6.0 Succeeded
Provisioning real-time and low latency workloads
Many industries and organizations need extremely high performance computing and might require low and predictable latency, especially in the financial and telecommunications industries. For these industries, with their unique requirements, OKD provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance and consistent response time for OKD applications.
The cluster administrator uses this performance profile configuration that makes it easier to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt (real-time), the CPUs that will be reserved for housekeeping, and the CPUs that are used for running the workloads.
Known limitations for real-time
The RT kernel is only supported on worker nodes. |
To fully utilize the real-time mode, the containers must run with elevated privileges. See Set capabilities for a Container for information on granting privileges.
OKD restricts the allowed capabilities, so you might need to create a SecurityContext
as well.
This procedure is fully supported with bare metal installations using Fedora CoreOS (FCOS) systems. |
Establishing the right performance expectations refers to the fact that the real-time kernel is not a panacea. Its objective is consistent, low-latency determinism offering predictable response times. There is some additional kernel overhead associated with the real-time kernel. This is due primarily to handling hardware interruptions in separately scheduled threads. The increased overhead in some workloads results in some degradation in overall throughput. The exact amount of degradation is very workload dependent, ranging from 0% to 30%. However, it is the cost of determinism.
Provisioning a worker with real-time capabilities
Install Performance Addon Operator to the cluster.
Optional: Add a node to the OKD cluster. See Setting BIOS parameters.
Optional: Create a new machine config pool for real-time nodes.
Add the node to the proper machine config pool, using node role labels.
You must decide which nodes will be configured with real-time workloads. It could be all of the nodes in the cluster or a subset of the nodes. The Performance Addon Operator expects all of the nodes are part of a dedicated machine config pool. If you use all of the nodes, you just point the Performance Addon Operator to the worker node role label. If you use a subset, you must group the nodes into a new machine config pool.
Create the
PerformanceProfile
with the proper set of housekeeping cores andrealTimeKernel: enabled: true
.Specify a node selector in the
PerformanceProfile
, as shown here:apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: example-performanceprofile
spec:
...
realTimeKernel:
enabled: true
nodeSelector:
node-role.kubernetes.io/worker-rt: ""
Verify that a matching machine config pool exists with a label:
machineconfiguration.openshift.io/role=worker-rt
OKD will start configuring the nodes, which might involve multiple reboots. Wait for the nodes to settle. This can take a long time depending on the specific hardware you use, but 20 minutes per node is expected.
Verify everything is working as expected.
Verifying the real-time kernel installation
Use this command to verify that the real-time kernel is installed:
$ oc get node -o wide
Note the worker with the role worker-rt
that contains the string 4.18.0-211.rt5.23.el8.x86_64
:
NAME STATUS ROLES AGE VERSION INTERNAL-IP
EXTERNAL-IP OS-IMAGE KERNEL-VERSION
CONTAINER-RUNTIME
cnf-worker-0.example.com Ready worker,worker-rt 5d17h v1.20.0
128.66.135.107 <none> Red Hat Enterprise Linux CoreOS 46.82.202008252340-0 (Ootpa)
4.18.0-211.rt5.23.el8.x86_64 cri-o://1.20.0-90.rhaos4.7.git4a0ac05.el8-rc.1
[...]
Creating a workload that works in real-time
Use the following procedures for preparing a workload that will use real-time capabilities.
Procedure
Create a pod with a QoS class of
Guaranteed
.Optional: Disable CPU load balancing for DPDK.
Assign a proper node selector.
When writing your applications, follow the general recommendations described in Application tuning and deployment.
Creating a pod with a QoS class of Guaranteed
Keep the following in mind when you create a pod that is given a QoS class of Guaranteed
:
Every container in the pod must have a memory limit and a memory request, and they must be the same.
Every container in the pod must have a CPU limit and a CPU request, and they must be the same.
The following example shows the configuration file for a pod that has one container. The container has a memory limit and a memory request, both equal to 200 MiB. The container has a CPU limit and a CPU request, both equal to 1 CPU.
apiVersion: v1
kind: Pod
metadata:
name: qos-demo
namespace: qos-example
spec:
containers:
- name: qos-demo-ctr
image: <image-pull-spec>
resources:
limits:
memory: "200Mi"
cpu: "1"
requests:
memory: "200Mi"
cpu: "1"
Create the pod:
$ oc apply -f qos-pod.yaml --namespace=qos-example
View detailed information about the pod:
$ oc get pod qos-demo --namespace=qos-example --output=yaml
Example output
spec:
containers:
...
status:
qosClass: Guaranteed
If a container specifies its own memory limit, but does not specify a memory request, OKD automatically assigns a memory request that matches the limit. Similarly, if a container specifies its own CPU limit, but does not specify a CPU request, OKD automatically assigns a CPU request that matches the limit.
Optional: Disabling CPU load balancing for DPDK
Functionality to disable or enable CPU load balancing is implemented on the CRI-O level. The code under the CRI-O disables or enables CPU load balancing only when the following requirements are met.
The pod must use the
performance-<profile-name>
runtime class. You can get the proper name by looking at the status of the performance profile, as shown here:apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
...
status:
...
runtimeClass: performance-manual
The pod must have the
cpu-load-balancing.crio.io: true
annotation.
The Performance Addon Operator is responsible for the creation of the high-performance runtime handler config snippet under relevant nodes and for creation of the high-performance runtime class under the cluster. It will have the same content as default runtime handler except it enables the CPU load balancing configuration functionality.
To disable the CPU load balancing for the pod, the Pod
specification must include the following fields:
apiVersion: v1
kind: Pod
metadata:
...
annotations:
...
cpu-load-balancing.crio.io: "disable"
...
...
spec:
...
runtimeClassName: performance-<profile_name>
...
Only disable CPU load balancing when the CPU manager static policy is enabled and for pods with guaranteed QoS that use whole CPUs. Otherwise, disabling CPU load balancing can affect the performance of other containers in the cluster. |
Assigning a proper node selector
The preferred way to assign a pod to nodes is to use the same node selector the performance profile used, as shown here:
apiVersion: v1
kind: Pod
metadata:
name: example
spec:
[...]
nodeSelector:
node-role.kubernetes.io/worker-rt: ""
For more information, see Placing pods on specific nodes using node selectors.
Scheduling a workload onto a worker with real-time capabilities
Use label selectors that match the nodes attached to the machine config pool that was configured for low latency by the Performance Addon Operator. For more information, see Assigning pods to nodes.
Managing device interrupt processing for guaranteed pod isolated CPUs
The Performance Addon Operator manages host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, and isolated CPUs for workloads. CPUs that are used for low latency workloads are set as isolated.
Device interrupts are load balanced between all isolated and reserved CPUs to avoid CPUs being overloaded, with the exception of CPUs where there is a guaranteed pod running. Guaranteed pod CPUs are prevented from processing device interrupts when the relevant annotations are set for the pod.
In the performance profile, globallyDisableIrqLoadBalancing
is used to manage whether device interrupts are processed or not. For certain workloads the reserved CPUs are not always sufficient for dealing with device interrupts, and for this reason, device interrupts are not globally disabled on the isolated CPUs. By default, Performance Addon Operator does not disable device interrupts on isolated CPUs.
To achieve low latency for workloads, some (but not all) pods require the CPUs they are running on to not process device interrupts. A pod annotation, irq-load-balancing.crio.io
, is used to define whether device interrupts are processed or not. When configured, CRI-O disables device interrupts only as long as the pod is running.
Disabling global device interrupts handling in Performance Addon Operator
To configure Performance Addon Operator to disable global device interrupts for the isolated CPU set, set the globallyDisableIrqLoadBalancing
field in the performance profile to true
. When true
, conflicting pod annotations are ignored. When false
, IRQ loads are balanced across all CPUs.
A performance profile snippet illustrates this setting:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: manual
spec:
globallyDisableIrqLoadBalancing: true
...
Disabling interrupt processing for individual pods
To disable interrupt processing for individual pods, ensure that globallyDisableIrqLoadBalancing
is set to false
in the performance profile. Then, in the pod specification, set the irq-load-balancing.crio.io
and cpu-load-balancing.crio.io
pod annotations to disable
. An example pod specification snippet that illustrates this is below:
apiVersion: performance.openshift.io/v2
kind: Pod
metadata:
annotations:
irq-load-balancing.crio.io: "disable"
cpu-load-balancing.crio.io: "disable"
spec:
runtimeClassName: performance-<profile_name>
...
Upgrading the performance profile to use device interrupt processing
When you upgrade the Performance Addon Operator performance profile custom resource definition (CRD) from v1 or v1alpha1 to v2, globallyDisableIrqLoadBalancing
is set to true
on existing profiles.
When |
Supported API Versions
The Performance Addon Operator supports v2
, v1
, and v1alpha1
for the performance profile apiVersion
field. The v1 and v1alpha1 APIs are identical. The v2 API includes an optional boolean field globallyDisableIrqLoadBalancing
with a default value of false
.
Upgrading Performance Addon Operator API from v1alpha1 to v1
When upgrading Performance Addon Operator API version from v1alpha1 to v1, the v1alpha1 performance profiles are converted on-the-fly using a “None” Conversion strategy and served to the Performance Addon Operator with API version v1.
Upgrading Performance Addon Operator API from v1alpha1 or v1 to v2
When upgrading from an older Performance Addon Operator API version, the existing v1 and v1alpha1 performance profiles are converted using a conversion webhook that injects the globallyDisableIrqLoadBalancing
field with a value of true
.
Configuring a node for IRQ dynamic load balancing
To configure a cluster node to handle IRQ dynamic load balancing, do the following:
Log in to the OKD cluster as a user with cluster-admin privileges.
Set the performance profile
apiVersion
to useperformance.openshift.io/v2
.Remove the
globallyDisableIrqLoadBalancing
field or set it tofalse
.Set the appropriate isolated and reserved CPUs. The following snippet illustrates a profile that reserves 2 CPUs. IRQ load-balancing is enabled for pods running on the
isolated
CPU set:apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: dynamic-irq-profile
spec:
cpu:
isolated: 2-5
reserved: 0-1
...
Create the pod that uses exclusive CPUs, and set
irq-load-balancing.crio.io
andcpu-quota.crio.io
annotations todisable
. For example:apiVersion: v1
kind: Pod
metadata:
name: dynamic-irq-pod
annotations:
irq-load-balancing.crio.io: "disable"
cpu-quota.crio.io: "disable"
spec:
containers:
- name: dynamic-irq-pod
image: "quay.io/openshift-kni/cnf-tests:4.7"
command: ["sleep", "10h"]
resources:
requests:
cpu: 2
memory: "200M"
limits:
cpu: 2
memory: "200M"
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
runtimeClassName: performance-dynamic-irq-profile
...
Enter the pod
runtimeClassName
in the form performance-<profile_name>, where <profile_name> is thename
from thePerformanceProfile
YAML, in this example,performance-dynamic-irq-profile
.Set the node selector to target a cnf-worker.
Ensure the pod is running correctly. Status should be
running
, and the correct cnf-worker node should be set:$ oc get pod -o wide
Expected output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dynamic-irq-pod 1/1 Running 0 5h33m <ip-address> <node-name> <none> <none>
Get the CPUs that the pod configured for IRQ dynamic load balancing runs on:
$ oc exec -it dynamic-irq-pod -- /bin/bash -c "grep Cpus_allowed_list /proc/self/status | awk '{print $2}'"
Expected output
Cpus_allowed_list: 2-3
Ensure the node configuration is applied correctly. SSH into the node to verify the configuration.
$ oc debug node/<node-name>
Expected output
Starting pod/<node-name>-debug ...
To use host binaries, run `chroot /host`
Pod IP: <ip-address>
If you don't see a command prompt, try pressing enter.
sh-4.4#
Verify that you can use the node file system:
sh-4.4# chroot /host
Expected output
sh-4.4#
Ensure the default system CPU affinity mask does not include the
dynamic-irq-pod
CPUs, for example, CPUs 2 and 3.$ cat /proc/irq/default_smp_affinity
Example output
33
Ensure the system IRQs are not configured to run on the
dynamic-irq-pod
CPUs:find /proc/irq/ -name smp_affinity_list -exec sh -c 'i="$1"; mask=$(cat $i); file=$(echo $i); echo $file: $mask' _ {} \;
Example output
/proc/irq/0/smp_affinity_list: 0-5
/proc/irq/1/smp_affinity_list: 5
/proc/irq/2/smp_affinity_list: 0-5
/proc/irq/3/smp_affinity_list: 0-5
/proc/irq/4/smp_affinity_list: 0
/proc/irq/5/smp_affinity_list: 0-5
/proc/irq/6/smp_affinity_list: 0-5
/proc/irq/7/smp_affinity_list: 0-5
/proc/irq/8/smp_affinity_list: 4
/proc/irq/9/smp_affinity_list: 4
/proc/irq/10/smp_affinity_list: 0-5
/proc/irq/11/smp_affinity_list: 0
/proc/irq/12/smp_affinity_list: 1
/proc/irq/13/smp_affinity_list: 0-5
/proc/irq/14/smp_affinity_list: 1
/proc/irq/15/smp_affinity_list: 0
/proc/irq/24/smp_affinity_list: 1
/proc/irq/25/smp_affinity_list: 1
/proc/irq/26/smp_affinity_list: 1
/proc/irq/27/smp_affinity_list: 5
/proc/irq/28/smp_affinity_list: 1
/proc/irq/29/smp_affinity_list: 0
/proc/irq/30/smp_affinity_list: 0-5
Some IRQ controllers do not support IRQ re-balancing and will always expose all online CPUs as the IRQ mask. These IRQ controllers effectively run on CPU 0. For more information on the host configuration, SSH into the host and run the following, replacing <irq-num>
with the CPU number that you want to query:
$ cat /proc/irq/<irq-num>/effective_affinity
Configuring hyperthreading for a cluster
To configure hyperthreading for an OKD cluster, set the CPU threads in the performance profile to the same cores that are configured for the reserved or isolated CPU pools.
If you configure a performance profile, and subsequently change the hyperthreading configuration for the host, ensure that you update the CPU |
Disabling a previously enabled host hyperthreading configuration can cause the CPU core IDs listed in the |
Prerequisites
Access to the cluster as a user with the
cluster-admin
role.Install the OpenShift CLI (oc).
Procedure
Ascertain which threads are running on what CPUs for the host you want to configure.
You can view which threads are running on the host CPUs by logging in to the cluster and running the following command:
$ lscpu --all --extended
Example output
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 4800.0000 400.0000
1 0 0 1 1:1:1:0 yes 4800.0000 400.0000
2 0 0 2 2:2:2:0 yes 4800.0000 400.0000
3 0 0 3 3:3:3:0 yes 4800.0000 400.0000
4 0 0 0 0:0:0:0 yes 4800.0000 400.0000
5 0 0 1 1:1:1:0 yes 4800.0000 400.0000
6 0 0 2 2:2:2:0 yes 4800.0000 400.0000
7 0 0 3 3:3:3:0 yes 4800.0000 400.0000
In this example, there are eight logical CPU cores running on four physical CPU cores. CPU0 and CPU4 are running on physical Core0, CPU1 and CPU5 are running on physical Core 1, and so on.
Alternatively, to view the threads that are set for a particular physical CPU core (
cpu0
in the example below), open a command prompt and run the following:$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
Example output
0-4
Apply the isolated and reserved CPUs in the
PerformanceProfile
YAML. For example, you could set logical cores CPU0 and CPU4 as isolated, and logical cores CPU1 and CPU5 as reserved:...
cpu:
isolated: 0-4
reserved: 1-5
...
Hyperthreading is enabled by default on most Intel processors. If you enable hyperthreading, all threads processed by a particular core must be isolated or processed on the same core. |
Disabling hyperthreading for low latency applications
When configuring clusters for low latency processing, consider whether you want to disable hyperthreading before you deploy the cluster. To disable hyperthreading, do the following:
Create a performance profile that is appropriate for your hardware and topology.
Set
nosmt
as an additional kernel argument. The following example performance profile illustrates this setting:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: example-performanceprofile
spec:
additionalKernelArgs:
- nmi_watchdog=0
- audit=0
- mce=off
- processor.max_cstate=1
- idle=poll
- intel_idle.max_cstate=0
- nosmt
cpu:
isolated: 2-3
reserved: 0-1
hugepages:
defaultHugepagesSize: 1G
pages:
- count: 2
node: 0
size: 1G
nodeSelector:
node-role.kubernetes.io/performance: ''
realTimeKernel:
enabled: true
Configuring huge pages
Nodes must pre-allocate huge pages used in an OKD cluster. Use the Performance Addon Operator to allocate huge pages on a specific node.
OKD provides a method for creating and allocating huge pages. Performance Addon Operator provides an easier method for doing this using the performance profile.
For example, in the hugepages
pages
section of the performance profile, you can specify multiple blocks of size
, count
, and, optionally, node
:
hugepages:
defaultHugepagesSize: "1G"
pages:
- size: "1G"
count: 4
node: 0 (1)
1 | node is the NUMA node in which the huge pages are allocated. If you omit node , the pages are evenly spread across all NUMA nodes. |
Wait for the relevant machine config pool status that indicates the update is finished. |
These are the only configuration steps you need to do to allocate huge pages.
Verification
To verify the configuration, see the
/proc/meminfo
file on the node:$ oc debug node/ip-10-0-141-105.ec2.internal
# grep -i huge /proc/meminfo
Example output
AnonHugePages: ###### ##
ShmemHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: #### ##
Hugetlb: #### ##
Use
oc describe
to report the new size:$ oc describe node worker-0.ocp4poc.example.com | grep -i huge
Example output
hugepages-1g=true
hugepages-###: ###
hugepages-###: ###
Allocating multiple huge page sizes
You can request huge pages with different sizes under the same container. This allows you to define more complicated pods consisting of containers with different huge page size needs.
For example, you can define sizes 1G
and 2M
and the Performance Addon Operator will configure both sizes on the node, as shown here:
spec:
hugepages:
defaultHugepagesSize: 1G
pages:
- count: 1024
node: 0
size: 2M
- count: 4
node: 1
size: 1G
Tuning nodes for low latency with the performance profile
The performance profile lets you control latency tuning aspects of nodes that belong to a certain machine config pool. After you specify your settings, the PerformanceProfile
object is compiled into multiple objects that perform the actual node level tuning:
A
MachineConfig
file that manipulates the nodes.A
KubeletConfig
file that configures the Topology Manager, the CPU Manager, and the OKD nodes.The Tuned profile that configures the Node Tuning Operator.
Procedure
Prepare a cluster.
Create a machine config pool.
Install the Performance Addon Operator.
Create a performance profile that is appropriate for your hardware and topology. In the performance profile, you can specify whether to update the kernel to kernel-rt, allocation of huge pages, the CPUs that will be reserved for operating system housekeeping processes and CPUs that will be used for running the workloads.
This is a typical performance profile:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: performance
spec:
cpu:
isolated: "5-15"
reserved: "0-4"
hugepages:
defaultHugepagesSize: "1G"
pages:
-size: "1G"
count: 16
node: 0
realTimeKernel:
enabled: true (1)
numa: (2)
topologyPolicy: "best-effort"
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
1 | Valid values are true or false . Setting the true value installs the real-time kernel on the node. |
2 | Use this field to configure the topology manager policy. Valid values are none (default), best-effort , restricted , and single-numa-node . For more information, see Topology Manager Policies. |
Partitioning the CPUs
You can reserve cores, or threads, for operating system housekeeping tasks from a single NUMA node and put your workloads on another NUMA node. The reason for this is that the housekeeping processes might be using the CPUs in a way that would impact latency sensitive processes running on those same CPUs. Keeping your workloads on a separate NUMA node prevents the processes from interfering with each other. Additionally, each NUMA node has its own memory bus that is not shared.
Specify two groups of CPUs in the spec
section:
isolated
- Has the lowest latency. Processes in this group have no interruptions and so can, for example, reach much higher DPDK zero packet loss bandwidth.reserved
- The housekeeping CPUs. Threads in the reserved group tend to be very busy, so latency-sensitive applications should be run in the isolated group. See Create a pod that gets assigned a QoS class ofGuaranteed
.
Performing end-to-end tests for platform verification
The Cloud-native Network Functions (CNF) tests image is a containerized test suite that validates features required to run CNF payloads. You can use this image to validate a CNF-enabled OpenShift cluster where all the components required for running CNF workloads are installed.
The tests run by the image are split into three different phases:
Simple cluster validation
Setup
End to end tests
The validation phase checks that all the features required to be tested are deployed correctly on the cluster.
Validations include:
Targeting a machine config pool that belong to the machines to be tested
Enabling SCTP on the nodes
Enabling xt_u32 kernel module via machine config
Having the Performance Addon Operator installed
Having the SR-IOV Operator installed
Having the PTP Operator installed
Using OVN kubernetes as the SDN
Latency tests, a part of the CNF-test container, also require the same validations. For more information about running a latency test, see the Running the latency tests section.
The tests need to perform an environment configuration every time they are executed. This involves items such as creating SR-IOV node policies, performance profiles, or PTP profiles. Allowing the tests to configure an already configured cluster might affect the functionality of the cluster. Also, changes to configuration items such as SR-IOV node policy might result in the environment being temporarily unavailable until the configuration change is processed.
Prerequisites
The test entrypoint is
/usr/bin/test-run.sh
. It runs both a setup test set and the real conformance test suite. The minimum requirement is to provide it with a kubeconfig file and its related$KUBECONFIG
environment variable, mounted through a volume.The tests assumes that a given feature is already available on the cluster in the form of an Operator, flags enabled on the cluster, or machine configs.
Some tests require a pre-existing machine config pool to append their changes to. This must be created on the cluster before running the tests.
The default worker pool is
worker-cnf
and can be created with the following manifest:apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: worker-cnf
labels:
machineconfiguration.openshift.io/role: worker-cnf
spec:
machineConfigSelector:
matchExpressions:
- {
key: machineconfiguration.openshift.io/role,
operator: In,
values: [worker-cnf, worker],
}
paused: false
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker-cnf: ""
You can use the
ROLE_WORKER_CNF
variable to override the worker pool name:$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e
ROLE_WORKER_CNF=custom-worker-pool registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
Currently, not all tests run selectively on the nodes belonging to the pool.
Running the tests
Assuming the kubeconfig
file is in the current folder, the command for running the test suite is:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
This allows your kubeconfig
file to be consumed from inside the running container.
Running the latency tests
In OKD 4.7, you can also run latency tests from the CNF-test container. The latency test allows you to set a latency limit so that you can determine performance, throughput, and latency.
The latency test runs the oslat
tool, which is an open source program to detect OS level latency. For more information, see the Red Hat Knowledgebase solution How to measure OS and hardware latency on isolated CPUs?.
By default, the latency tests are disabled. To enable the latency test, you must add the LATENCY_TEST_RUN
variable and set its value to true
. For example, LATENCY_TEST_RUN=true
.
Additionally, you can set the following environment variables for latency tests:
LATENCY_TEST_RUNTIME
- Specifies the amount of time (in seconds) that the latency test must run.OSLAT_MAXIMUM_LATENCY
- Specifies the maximum latency (in microseconds) that is expected from all buckets during theoslat
test run.
To perform the latency tests, run the following command:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=600 -e OSLAT_MAXIMUM_LATENCY=20 registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
You must run the latency test in discovery mode. For more information, see the Discovery mode section. |
Excerpt of a sample result of a 10-second latency test using the following command:
[root@cnf12-installer ~]# podman run --rm -v $KUBECONFIG:/kubeconfig:Z -e PERF_TEST_PROFILE=worker-cnf-2 -e KUBECONFIG=/kubeconfig -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=10 -e OSLAT_MAXIMUM_LATENCY=20 -e DISCOVERY_MODE=true registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
-ginkgo.focus="Latency"
running /0_config.test -ginkgo.focus=Latency
Example output
I1106 15:09:08.087085 7 request.go:621] Throttling request took 1.037172581s, request: GET:https://api.cnf12.kni.lab.eng.bos.redhat.com:6443/apis/autoscaling.openshift.io/v1?timeout=32s
Running Suite: Performance Addon Operator configuration
Random Seed: 1604675347
Will run 0 of 1 specs
JUnit report was created: /unit_report_performance_config.xml
Ran 0 of 1 Specs in 0.000 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 1 Skipped
PASS
running /4_latency.test -ginkgo.focus=Latency
I1106 15:09:10.735795 23 request.go:621] Throttling request took 1.037276624s, request: GET:https://api.cnf12.kni.lab.eng.bos.redhat.com:6443/apis/certificates.k8s.io/v1?timeout=32s
Running Suite: Performance Addon Operator latency e2e tests
Random Seed: 1604675349
Will run 1 of 1 specs
I1106 15:10:06.401180 23 nodes.go:86] found mcd machine-config-daemon-r78qc for node cnfdd8.clus2.t5g.lab.eng.bos.redhat.com
I1106 15:10:06.738120 23 utils.go:23] run command 'oc [exec -i -n openshift-machine-config-operator -c machine-config-daemon --request-timeout 30 machine-config-daemon-r78qc -- cat /rootfs/var/log/oslat.log]' (err=<nil>):
stdout=
Version: v0.1.7
Total runtime: 10 seconds
Thread priority: SCHED_FIFO:1
CPU list: 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50
CPU for main thread: 2
Workload: no
Workload mem: 0 (KiB)
Preheat cores: 48
Pre-heat for 1 seconds...
Test starts...
Test completed.
Core: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
CPU Freq: 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2092 2096 2096 2096 2092 2092 2096 2096 2096 2096 2096 2096 2096 2096 2096 2096 2092 2096 2096 2092 2096 2096 2096 2096 2092 2096 2096 2096 2092 2096 2096 2096 2096 2096 2096 (Mhz)
...
Maximum: 3 4 3 3 3 3 3 3 4 3 3 3 3 4 3 3 3 3 3 4 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 3 4 (us)
Image parameters
Depending on the requirements, the tests can use different images. There are two images used by the tests that can be changed using the following environment variables:
CNF_TESTS_IMAGE
DPDK_TESTS_IMAGE
For example, to change the CNF_TESTS_IMAGE
with a custom registry run the following command:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e CNF_TESTS_IMAGE="custom-cnf-tests-image:latests" registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
Ginkgo parameters
The test suite is built upon the ginkgo BDD framework. This means that it accepts parameters for filtering or skipping tests.
You can use the -ginkgo.focus
parameter to filter a set of tests:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh -ginkgo.focus="performance|sctp"
You can run only the latency test using the -ginkgo.focus
parameter.
To run only the latency test, you must provide the -ginkgo.focus
parameter and the PERF_TEST_PROFILE
environment variable that contains the name of the performance profile that needs to be tested. For example:
$ docker run --rm -v $KUBECONFIG:/kubeconfig -e KUBECONFIG=/kubeconfig -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=600 -e OSLAT_MAXIMUM_LATENCY=20 -e PERF_TEST_PROFILE=<performance_profile_name> registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh -ginkgo.focus="\[performance\]\[config\]|\[performance\]\ Latency\ Test"
There is a particular test that requires both SR-IOV and SCTP. Given the selective nature of the |
Available features
The set of available features to filter are:
performance
sriov
ptp
sctp
xt_u32
dpdk
Dry run
Use this command to run in dry-run mode. This is useful for checking what is in the test suite and provides output for all of the tests the image would run.
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh -ginkgo.dryRun -ginkgo.v
Disconnected mode
The CNF tests image support running tests in a disconnected cluster, meaning a cluster that is not able to reach outer registries. This is done in two steps:
Performing the mirroring.
Instructing the tests to consume the images from a custom registry.
Mirroring the images to a custom registry accessible from the cluster
A mirror
executable is shipped in the image to provide the input required by oc
to mirror the images needed to run the tests to a local registry.
Run this command from an intermediate machine that has access both to the cluster and to registry.redhat.io over the Internet:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/mirror -registry my.local.registry:5000/ | oc image mirror -f -
Then, follow the instructions in the following section about overriding the registry used to fetch the images.
Instruct the tests to consume those images from a custom registry
This is done by setting the IMAGE_REGISTRY
environment variable:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e IMAGE_REGISTRY="my.local.registry:5000/" -e CNF_TESTS_IMAGE="custom-cnf-tests-image:latests" registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
Mirroring to the cluster internal registry
OKD provides a built-in container image registry, which runs as a standard workload on the cluster.
Procedure
Gain external access to the registry by exposing it with a route:
$ oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
Fetch the registry endpoint:
REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
Create a namespace for exposing the images:
$ oc create ns cnftests
Make that image stream available to all the namespaces used for tests. This is required to allow the tests namespaces to fetch the images from the
cnftests
image stream.$ oc policy add-role-to-user system:image-puller system:serviceaccount:sctptest:default --namespace=cnftests
$ oc policy add-role-to-user system:image-puller system:serviceaccount:cnf-features-testing:default --namespace=cnftests
$ oc policy add-role-to-user system:image-puller system:serviceaccount:performance-addon-operators-testing:default --namespace=cnftests
$ oc policy add-role-to-user system:image-puller system:serviceaccount:dpdk-testing:default --namespace=cnftests
$ oc policy add-role-to-user system:image-puller system:serviceaccount:sriov-conformance-testing:default --namespace=cnftests
Retrieve the docker secret name and auth token:
SECRET=$(oc -n cnftests get secret | grep builder-docker | awk {'print $1'}
TOKEN=$(oc -n cnftests get secret $SECRET -o jsonpath="{.data['\.dockercfg']}" | base64 --decode | jq '.["image-registry.openshift-image-registry.svc:5000"].auth')
Write a
dockerauth.json
similar to this:echo "{\"auths\": { \"$REGISTRY\": { \"auth\": $TOKEN } }}" > dockerauth.json
Do the mirroring:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/mirror -registry $REGISTRY/cnftests | oc image mirror --insecure=true -a=$(pwd)/dockerauth.json -f -
Run the tests:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig -e IMAGE_REGISTRY=image-registry.openshift-image-registry.svc:5000/cnftests cnf-tests-local:latest /usr/bin/test-run.sh
Mirroring a different set of images
Procedure
The
mirror
command tries to mirror the u/s images by default. This can be overridden by passing a file with the following format to the image:[
{
"registry": "public.registry.io:5000",
"image": "imageforcnftests:4.7"
},
{
"registry": "public.registry.io:5000",
"image": "imagefordpdk:4.7"
}
]
Pass it to the
mirror
command, for example saving it locally asimages.json
. With the following command, the local path is mounted in/kubeconfig
inside the container and that can be passed to the mirror command.$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/mirror --registry "my.local.registry:5000/" --images "/kubeconfig/images.json" | oc image mirror -f -
Discovery mode
Discovery mode allows you to validate the functionality of a cluster without altering its configuration. Existing environment configurations are used for the tests. The tests attempt to find the configuration items needed and use those items to execute the tests. If resources needed to run a specific test are not found, the test is skipped, providing an appropriate message to the user. After the tests are finished, no cleanup of the pre-configured configuration items is done, and the test environment can be immediately used for another test run.
Some configuration items are still created by the tests. These are specific items needed for a test to run; for example, a SR-IOV Network. These configuration items are created in custom namespaces and are cleaned up after the tests are executed.
An additional bonus is a reduction in test run times. As the configuration items are already there, no time is needed for environment configuration and stabilization.
To enable discovery mode, the tests must be instructed by setting the DISCOVERY_MODE
environment variable as follows:
$ docker run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e
DISCOVERY_MODE=true registry.redhat.io/openshift-kni/cnf-tests /usr/bin/test-run.sh
Required environment configuration prerequisites
SR-IOV tests
Most SR-IOV tests require the following resources:
SriovNetworkNodePolicy
.At least one with the resource specified by
SriovNetworkNodePolicy
being allocatable; a resource count of at least 5 is considered sufficient.
Some tests have additional requirements:
An unused device on the node with available policy resource, with link state
DOWN
and not a bridge slave.A
SriovNetworkNodePolicy
with a MTU value of9000
.
DPDK tests
The DPDK related tests require:
A performance profile.
A SR-IOV policy.
A node with resources available for the SR-IOV policy and available with the
PerformanceProfile
node selector.
PTP tests
A slave
PtpConfig
(ptp4lOpts="-s" ,phc2sysOpts="-a -r"
).A node with a label matching the slave
PtpConfig
.
SCTP tests
SriovNetworkNodePolicy
.A node matching both the
SriovNetworkNodePolicy
and aMachineConfig
that enables SCTP.
XT_U32 tests
- A node with a machine config that enables XT_U32.
Performance Operator tests
Various tests have different requirements. Some of them are:
A performance profile.
A performance profile having
profile.Spec.CPU.Isolated = 1
.A performance profile having
profile.Spec.RealTimeKernel.Enabled == true
.A node with no huge pages usage.
Limiting the nodes used during tests
The nodes on which the tests are executed can be limited by specifying a NODES_SELECTOR
environment variable. Any resources created by the test are then limited to the specified nodes.
$ docker run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e
NODES_SELECTOR=node-role.kubernetes.io/worker-cnf registry.redhat.io/openshift-kni/cnf-tests /usr/bin/test-run.sh
Using a single performance profile
The resources needed by the DPDK tests are higher than those required by the performance test suite. To make the execution faster, the performance profile used by tests can be overridden using one that also serves the DPDK test suite.
To do this, a profile like the following one can be mounted inside the container, and the performance tests can be instructed to deploy it.
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: performance
spec:
cpu:
isolated: "4-15"
reserved: "0-3"
hugepages:
defaultHugepagesSize: "1G"
pages:
- size: "1G"
count: 16
node: 0
realTimeKernel:
enabled: true
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
To override the performance profile used, the manifest must be mounted inside the container and the tests must be instructed by setting the PERFORMANCE_PROFILE_MANIFEST_OVERRIDE
parameter as follows:
$ docker run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e
PERFORMANCE_PROFILE_MANIFEST_OVERRIDE=/kubeconfig/manifest.yaml registry.redhat.io/openshift-kni/cnf-tests /usr/bin/test-run.sh
Disabling the performance profile cleanup
When not running in discovery mode, the suite cleans up all the created artifacts and configurations. This includes the performance profile.
When deleting the performance profile, the machine config pool is modified and nodes are rebooted. After a new iteration, a new profile is created. This causes long test cycles between runs.
To speed up this process, set CLEAN_PERFORMANCE_PROFILE="false"
to instruct the tests not to clean the performance profile. In this way, the next iteration will not need to create it and wait for it to be applied.
$ docker run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e
CLEAN_PERFORMANCE_PROFILE="false" registry.redhat.io/openshift-kni/cnf-tests /usr/bin/test-run.sh
Troubleshooting
The cluster must be reached from within the container. You can verify this by running:
$ docker run -v $(pwd)/:/kubeconfig -e KUBECONFIG=/kubeconfig/kubeconfig
registry.redhat.io/openshift-kni/cnf-tests oc get nodes
If this does not work, it could be caused by spanning across DNS, MTU size, or firewall issues.
Test reports
CNF end-to-end tests produce two outputs: a JUnit test output and a test failure report.
JUnit test output
A JUnit-compliant XML is produced by passing the --junit
parameter together with the path where the report is dumped:
$ docker run -v $(pwd)/:/kubeconfig -v $(pwd)/junitdest:/path/to/junit -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh --junit /path/to/junit
Test failure report
A report with information about the cluster state and resources for troubleshooting can be produced by passing the --report
parameter with the path where the report is dumped:
$ docker run -v $(pwd)/:/kubeconfig -v $(pwd)/reportdest:/path/to/report -e KUBECONFIG=/kubeconfig/kubeconfig registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh --report /path/to/report
A note on podman
When executing podman as non root and non privileged, mounting paths can fail with “permission denied” errors. To make it work, append :Z
to the volumes creation; for example, -v $(pwd)/:/kubeconfig:Z
to allow podman to do the proper SELinux relabeling.
Running on OKD 4.4
With the exception of the following, the CNF end-to-end tests are compatible with OKD 4.4:
[test_id:28466][crit:high][vendor:cnf-qe@redhat.com][level:acceptance] Should contain configuration injected through openshift-node-performance profile
[test_id:28467][crit:high][vendor:cnf-qe@redhat.com][level:acceptance] Should contain configuration injected through the openshift-node-performance profile
You can skip these tests by adding the -ginkgo.skip “28466|28467"
parameter.
Using a single performance profile
The DPDK tests require more resources than what is required by the performance test suite. To make the execution faster, you can override the performance profile used by the tests using a profile that also serves the DPDK test suite.
To do this, use a profile like the following one that can be mounted inside the container, and the performance tests can be instructed to deploy it.
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: performance
spec:
cpu:
isolated: "5-15"
reserved: "0-4"
hugepages:
defaultHugepagesSize: "1G"
pages:
-size: "1G"
count: 16
node: 0
realTimeKernel:
enabled: true
numa:
topologyPolicy: "best-effort"
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
To override the performance profile, the manifest must be mounted inside the container and the tests must be instructed by setting the PERFORMANCE_PROFILE_MANIFEST_OVERRIDE
:
$ docker run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e PERFORMANCE_PROFILE_MANIFEST_OVERRIDE=/kubeconfig/manifest.yaml registry.redhat.io/openshift4/cnf-tests-rhel8:v4.7 /usr/bin/test-run.sh
Impacts on the cluster
Depending on the feature, running the test suite could cause different impacts on the cluster. In general, only the SCTP tests do not change the cluster configuration. All of the other features have various impacts on the configuration.
SCTP
SCTP tests just run different pods on different nodes to check connectivity. The impacts on the cluster are related to running simple pods on two nodes.
XT_U32
XT_U32 tests run pods on different nodes to check iptables rule that utilize xt_u32. The impacts on the cluster are related to running simple pods on two nodes.
SR-IOV
SR-IOV tests require changes in the SR-IOV network configuration, where the tests create and destroy different types of configuration.
This might have an impact if existing SR-IOV network configurations are already installed on the cluster, because there may be conflicts depending on the priority of such configurations.
At the same time, the result of the tests might be affected by existing configurations.
PTP
PTP tests apply a PTP configuration to a set of nodes of the cluster. As with SR-IOV, this might conflict with any existing PTP configuration already in place, with unpredictable results.
Performance
Performance tests apply a performance profile to the cluster. The effect of this is changes in the node configuration, reserving CPUs, allocating memory huge pages, and setting the kernel packages to be realtime. If an existing profile named performance
is already available on the cluster, the tests do not deploy it.
DPDK
DPDK relies on both performance and SR-IOV features, so the test suite configures both a performance profile and SR-IOV networks, so the impacts are the same as those described in SR-IOV testing and performance testing.
Cleaning up
After running the test suite, all the dangling resources are cleaned up.
Debugging low latency CNF tuning status
The PerformanceProfile
custom resource (CR) contains status fields for reporting tuning status and debugging latency degradation issues. These fields report on conditions that describe the state of the operator’s reconciliation functionality.
A typical issue can arise when the status of machine config pools that are attached to the performance profile are in a degraded state, causing the PerformanceProfile
status to degrade. In this case, the machine config pool issues a failure message.
The Performance Addon Operator contains the performanceProfile.spec.status.Conditions
status field:
Status:
Conditions:
Last Heartbeat Time: 2020-06-02T10:01:24Z
Last Transition Time: 2020-06-02T10:01:24Z
Status: True
Type: Available
Last Heartbeat Time: 2020-06-02T10:01:24Z
Last Transition Time: 2020-06-02T10:01:24Z
Status: True
Type: Upgradeable
Last Heartbeat Time: 2020-06-02T10:01:24Z
Last Transition Time: 2020-06-02T10:01:24Z
Status: False
Type: Progressing
Last Heartbeat Time: 2020-06-02T10:01:24Z
Last Transition Time: 2020-06-02T10:01:24Z
Status: False
Type: Degraded
The Status
field contains Conditions
that specify Type
values that indicate the status of the performance profile:
Available
All machine configs and Tuned profiles have been created successfully and are available for cluster components are responsible to process them (NTO, MCO, Kubelet).
Upgradeable
Indicates whether the resources maintained by the Operator are in a state that is safe to upgrade.
Progressing
Indicates that the deployment process from the performance profile has started.
Degraded
Indicates an error if:
Validation of the performance profile has failed.
Creation of all relevant components did not complete successfully.
Each of these types contain the following fields:
Status
The state for the specific type (true
or false
).
Timestamp
The transaction timestamp.
Reason string
The machine readable reason.
Message string
The human readable reason describing the state and error details, if any.
Machine config pools
A performance profile and its created products are applied to a node according to an associated machine config pool (MCP). The MCP holds valuable information about the progress of applying the machine configurations created by performance addons that encompass kernel args, kube config, huge pages allocation, and deployment of rt-kernel. The performance addons controller monitors changes in the MCP and updates the performance profile status accordingly.
The only conditions returned by the MCP to the performance profile status is when the MCP is Degraded
, which leads to performaceProfile.status.condition.Degraded = true
.
Example
The following example is for a performance profile with an associated machine config pool (worker-cnf
) that was created for it:
The associated machine config pool is in a degraded state:
# oc get mcp
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-2ee57a93fa6c9181b546ca46e1571d2d True False False 3 3 3 0 2d21h
worker rendered-worker-d6b2bdc07d9f5a59a6b68950acf25e5f True False False 2 2 2 0 2d21h
worker-cnf rendered-worker-cnf-6c838641b8a08fff08dbd8b02fb63f7c False True True 2 1 1 1 2d20h
The
describe
section of the MCP shows the reason:# oc describe mcp worker-cnf
Example output
Message: Node node-worker-cnf is reporting: "prepping update:
machineconfig.machineconfiguration.openshift.io \"rendered-worker-cnf-40b9996919c08e335f3ff230ce1d170\" not
found"
Reason: 1 nodes are reporting degraded status on sync
The degraded state should also appear under the performance profile
status
field marked asdegraded = true
:# oc describe performanceprofiles performance
Example output
Message: Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.
Machine config pool worker-cnf Degraded Message: Node yquinn-q8s5v-w-b-z5lqn.c.openshift-gce-devel.internal is
reporting: "prepping update: machineconfig.machineconfiguration.openshift.io
\"rendered-worker-cnf-40b9996919c08e335f3ff230ce1d170\" not found". Reason: MCPDegraded
Status: True
Type: Degraded
Collecting low latency tuning debugging data for Red Hat Support
When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.
The must-gather
tool enables you to collect diagnostic information about your OKD cluster, including node tuning, NUMA topology, and other information needed to debug issues with low latency setup.
For prompt support, supply diagnostic information for both OKD and low latency tuning.
About the must-gather tool
The oc adm must-gather
CLI command collects the information from your cluster that is most likely needed for debugging issues, such as:
Resource definitions
Audit logs
Service logs
You can specify one or more images when you run the command by including the --image
argument. When you specify an image, the tool collects data related to that feature or product. When you run oc adm must-gather
, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local
. This directory is created in your current working directory.
About collecting low latency tuning data
Use the oc adm must-gather
CLI command to collect information about your cluster, including features and objects associated with low latency tuning, including:
The Performance Addon Operator namespaces and child objects.
MachineConfigPool
and associatedMachineConfig
objects.The Node Tuning Operator and associated Tuned objects.
Linux Kernel command line options.
CPU and NUMA topology
Basic PCI device information and NUMA locality.
To collect container-native virtualization data with must-gather
, you must specify the container-native virtualization image:
--image=registry.redhat.io/openshift4/performance-addon-operator-must-gather-rhel8.
Gathering data about specific features
You can gather debugging information about specific features by using the oc adm must-gather
CLI command with the --image
or --image-stream
argument. The must-gather
tool supports multiple images, so you can gather data about more than one feature by running a single command.
To collect the default |
Prerequisites
Access to the cluster as a user with the
cluster-admin
role.The OKD CLI (oc) installed.
Procedure
Navigate to the directory where you want to store the
must-gather
data.Run the
oc adm must-gather
command with one or more--image
or--image-stream
arguments. For example, the following command gathers both the default cluster data and information specific to container-native virtualization:$ oc adm must-gather \
--image-stream=openshift/must-gather \ (1)
--image=registry.redhat.io/openshift4/performance-addon-operator-must-gather-rhel8 (2)
1 The default OKD must-gather
image.2 The must-gather
image for low latency tuning diagnostics.Create a compressed file from the
must-gather
directory that was created in your working directory. For example, on a computer that uses a Linux operating system, run the following command:$ tar cvaf must-gather.tar.gz must-gather.local.5421342344627712289/ (1)
1 Replace must-gather-local.5421342344627712289/
with the actual directory name.Attach the compressed file to your support case on the Red Hat Customer Portal.
Additional resources
For more information about MachineConfig and KubeletConfig, see Managing nodes.
For more information about the Node Tuning Operator, see Using the Node Tuning Operator.
For more information about the PerformanceProfile, see Configuring huge pages.
For more information about consuming huge pages from your containers, see How huge pages are consumed by apps.