Recommended single-node OpenShift cluster configuration for vDU application workloads

Use the following reference information to understand the single-node OpenShift configurations required to deploy virtual distributed unit (vDU) applications in the cluster. Configurations include cluster optimizations for high-performance workloads, enabling workload partitioning, and minimizing the number of reboots required postinstallation.

Running low latency applications on OKD

OKD enables low latency processing for applications running on commercial off-the-shelf (COTS) hardware by using several technologies and specialized hardware devices:

Real-time kernel for RHCOS

Ensures workloads are handled with a high degree of process determinism.

CPU isolation

Avoids CPU scheduling delays and ensures CPU capacity is available consistently.

NUMA-aware topology management

Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the non-uniform memory access (NUMA) node. Pod resources for all Quality of Service (QoS) classes stay on the same NUMA node. This decreases latency and improves performance of the node.

Huge pages memory management

Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.

Precision timing synchronization using PTP

Allows synchronization between nodes in the network with sub-microsecond accuracy.

Running vDU application workloads requires a bare-metal host with sufficient resources to run OKD services and production workloads.

Table 1. Minimum resource requirements
Profile    vCPU                 Memory         Storage
Minimum    4 to 8 vCPU cores    32GB of RAM    120GB

One vCPU is equivalent to one physical core when simultaneous multithreading (SMT), or Hyper-Threading, is not enabled. When SMT is enabled, use the following formula to calculate the number of vCPUs:

  • (threads per core × cores) × sockets = vCPUs
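
The formula above can be sketched as a small shell helper. This is an illustration only: the function name vcpu_count is hypothetical, and it assumes that "cores" in the formula means cores per socket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (threads per core x cores) x sockets = vCPUs.
# Assumes "cores" means cores per socket.
vcpu_count() {
  local threads_per_core=$1 cores_per_socket=$2 sockets=$3
  echo $(( threads_per_core * cores_per_socket * sockets ))
}

# Example: one socket, 16 cores, SMT enabled (2 threads per core).
vcpu_count 2 16 1   # prints 32
```

With SMT disabled, threads per core is 1, so the result equals the physical core count.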

The server must have a Baseboard Management Controller (BMC) when booting with virtual media.

Configuring host firmware for low latency and high performance

Bare-metal hosts require the firmware to be configured before the host can be provisioned. The firmware configuration is dependent on the specific hardware and the particular requirements of your installation.

Procedure

  1. Set the UEFI/BIOS Boot Mode to UEFI.

  2. In the host boot sequence order, set Hard drive first.

  3. Apply the specific firmware configuration for your hardware. The following table describes a representative firmware configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.

    The exact firmware configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

    Table 2. Sample firmware configuration for an Intel Xeon Skylake or Cascade Lake server
    Firmware setting                    Configuration
    CPU Power and Performance Policy    Performance
    Uncore Frequency Scaling            Disabled
    Performance P-limit                 Disabled
    Enhanced Intel SpeedStep® Tech      Enabled
    Intel Configurable TDP              Enabled
    Configurable TDP Level              Level 2
    Intel® Turbo Boost Technology       Enabled
    Energy Efficient Turbo              Disabled
    Hardware P-States                   Disabled
    Package C-State                     C0/C1 state
    C1E                                 Disabled
    Processor C6                        Disabled

Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.

Connectivity prerequisites for managed cluster networks

Before you can install and provision a managed cluster with the GitOps Zero Touch Provisioning (ZTP) pipeline, the managed cluster host must meet the following networking prerequisites:

  • There must be bi-directional connectivity between the GitOps ZTP container in the hub cluster and the Baseboard Management Controller (BMC) of the target bare-metal host.

  • The managed cluster must be able to resolve and reach the API hostname and *.apps hostname of the hub cluster. The following are examples of the hub cluster API hostname and *.apps hostname:

    • api.hub-cluster.internal.domain.com

    • console-openshift-console.apps.hub-cluster.internal.domain.com

  • The hub cluster must be able to resolve and reach the API hostname and *.apps hostname of the managed cluster. The following are examples of the managed cluster API hostname and *.apps hostname:

    • api.sno-managed-cluster-1.internal.domain.com

    • console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
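
The name resolution part of these prerequisites can be checked with a short script. The following is a minimal sketch, not part of the GitOps ZTP tooling: the function names cluster_hostnames and all_resolve are hypothetical, and it assumes a Linux host where getent is available. It checks resolution only, not reachability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the API hostname and a sample *.apps hostname
# for a cluster, following the naming pattern in the examples above.
cluster_hostnames() {
  local cluster=$1 base_domain=$2
  echo "api.${cluster}.${base_domain}"
  echo "console-openshift-console.apps.${cluster}.${base_domain}"
}

# Hypothetical helper: return 0 only if every hostname on stdin resolves.
# Assumes getent is available (glibc-based Linux systems).
all_resolve() {
  local host status=0
  while IFS= read -r host; do
    if ! getent hosts "$host" >/dev/null; then
      echo "cannot resolve: $host" >&2
      status=1
    fi
  done
  return $status
}

# Print the hub hostnames used in the example above.
cluster_hostnames hub-cluster internal.domain.com
# To test resolution from the managed cluster, pipe the list into all_resolve:
#   cluster_hostnames hub-cluster internal.domain.com | all_resolve
```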

Workload partitioning in single-node OpenShift with GitOps ZTP

Workload partitioning configures OKD services, cluster management workloads, and infrastructure pods to run on a reserved number of host CPUs.

To configure workload partitioning with GitOps Zero Touch Provisioning (ZTP), you configure a cpuPartitioningMode field in the SiteConfig custom resource (CR) that you use to install the cluster and you apply a PerformanceProfile CR that configures the isolated and reserved CPUs on the host.

Configuring the SiteConfig CR enables workload partitioning at cluster installation time, and applying the PerformanceProfile CR configures the specific allocation of CPUs to reserved and isolated sets. These two steps happen at different points during cluster provisioning.

Configuring workload partitioning by using the cpuPartitioningMode field in the SiteConfig CR is a Tech Preview feature in OKD 4.13.

Alternatively, you can specify cluster management CPU resources with the cpuset field of the SiteConfig custom resource (CR) and the reserved field of the group PolicyGenTemplate CR. The GitOps ZTP pipeline uses these values to populate the required fields in the workload partitioning MachineConfig CR (cpuset) and the PerformanceProfile CR (reserved) that configure the single-node OpenShift cluster. This method is a General Availability feature in OKD 4.14.

The workload partitioning configuration pins the OKD infrastructure pods to the reserved CPU set. Platform services such as systemd, CRI-O, and kubelet run on the reserved CPU set. The isolated CPU sets are exclusively allocated to your container workloads. Isolating CPUs ensures that the workload has guaranteed access to the specified CPUs without contention from other applications running on the same node. All CPUs that are not isolated should be reserved.

Ensure that reserved and isolated CPU sets do not overlap with each other.
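
One way to check for overlap before applying a PerformanceProfile CR is to expand both CPU lists and compare them. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the CPU lists use the comma-and-range notation (for example 0-1,52-53), and the sample values are illustrative rather than taken from a real profile.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a CPU list such as "0-1,52-53" into one CPU ID per line.
expand_cpu_list() {
  local list=$1 part start end cpu
  local -a parts
  IFS=',' read -ra parts <<< "$list"
  for part in "${parts[@]}"; do
    if [[ $part == *-* ]]; then
      start=${part%-*}
      end=${part#*-}
    else
      start=$part
      end=$part
    fi
    for (( cpu = start; cpu <= end; cpu++ )); do
      echo "$cpu"
    done
  done
}

# Hypothetical helper: return 0 if the two CPU lists share no CPU, 1 otherwise.
cpu_sets_disjoint() {
  local overlap
  overlap=$(comm -12 <(expand_cpu_list "$1" | sort -u) <(expand_cpu_list "$2" | sort -u))
  [[ -z $overlap ]]
}

# Illustrative values only: reserved "0-1,52-53", isolated "2-51,54-103".
if cpu_sets_disjoint "0-1,52-53" "2-51,54-103"; then
  echo "reserved and isolated sets do not overlap"
else
  echo "reserved and isolated sets overlap"
fi
```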

Additional resources

  • For the recommended single-node OpenShift workload partitioning configuration, see Workload partitioning.

The ZTP pipeline applies the following custom resources (CRs) during cluster installation. These configuration CRs ensure that the cluster meets the feature and performance requirements necessary for running a vDU application.

When using the GitOps ZTP plugin and SiteConfig CRs for cluster deployment, the following MachineConfig CRs are included by default.

Use the SiteConfig extraManifests filter to alter the CRs that are included by default. For more information, see Advanced managed cluster configuration with SiteConfig CRs.

Workload partitioning

Single-node OpenShift clusters that run DU workloads require workload partitioning. This limits the cores allowed to run platform services, maximizing the CPU cores available for application payloads.

Workload partitioning can be enabled during cluster installation only. You cannot disable workload partitioning postinstallation. You can, however, change the set of CPUs assigned to the isolated and reserved sets through the PerformanceProfile CR. Changes to CPU settings cause the node to reboot.

Upgrading from OKD 4.12 to 4.13+

When transitioning to using cpuPartitioningMode for enabling workload partitioning, remove the workload partitioning MachineConfig CRs from the /extra-manifest folder that you use to provision the cluster.

Recommended SiteConfig CR configuration for workload partitioning

  apiVersion: ran.openshift.io/v1
  kind: SiteConfig
  metadata:
    name: "<site_name>"
    namespace: "<site_name>"
  spec:
    baseDomain: "example.com"
    cpuPartitioningMode: AllNodes # (1)

(1) Set the cpuPartitioningMode field to AllNodes to configure workload partitioning for all nodes in the cluster.

Verification

Check that the CPU pinning for the applications and the cluster system is correct. Run the following commands:

  1. Open a remote shell prompt to the managed cluster:

    $ oc debug node/example-sno-1

  2. Check that the user applications CPU pinning is correct:

    sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done

    Example output

    pid 8481's current affinity list: 0-3
    pid 8726's current affinity list: 0-3
    pid 9088's current affinity list: 0-3
    pid 9945's current affinity list: 0-3
    pid 10387's current affinity list: 0-3
    pid 12123's current affinity list: 0-3
    pid 13313's current affinity list: 0-3

  3. Check that the system applications CPU pinning is correct:

    sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done

    Example output

    pid 1's current affinity list: 0-3
    pid 938's current affinity list: 0-3
    pid 962's current affinity list: 0-3
    pid 1197's current affinity list: 0-3
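
Comparing the taskset output by eye becomes tedious with many processes. The following sketch filters that output and reports any process whose affinity list differs from the expected one. The function name check_affinity is hypothetical, and the expected list 0-3 is taken from the example output above rather than from any particular cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `taskset -cp` output on stdin and report every
# line whose affinity list differs from the expected CPU list.
# Returns 1 if any mismatch is found, 0 otherwise.
check_affinity() {
  local expected=$1 line actual status=0
  while IFS= read -r line; do
    [[ -z $line ]] && continue
    # Keep only the text after the last ": ", which is the affinity list.
    actual=${line##*: }
    if [[ $actual != "$expected" ]]; then
      echo "MISMATCH: $line (expected $expected)"
      status=1
    fi
  done
  return $status
}

# Usage inside the debug shell, after defining the function there:
#   pgrep systemd | while read i; do taskset -cp $i; done | check_affinity 0-3
```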

Reduced platform management footprint

To reduce the overall management footprint of the platform, a MachineConfig custom resource (CR) is required that places all Kubernetes-specific mount points in a new namespace separate from the host operating system. The following base64-encoded example MachineConfig CR illustrates this configuration.

Recommended container mount namespace configuration (01-container-mount-ns-and-kubelet-conf-master.yaml)

  # Automatically generated by extra-manifests-builder
  # Do not make changes directly.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: master
    name: container-mount-namespace-and-kubelet-conf-master
  spec:
    config:
      ignition:
        version: 3.2.0
      storage:
        files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo=
            mode: 493
            path: /usr/local/bin/extractExecStart
          - contents:
              source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo=
            mode: 493
            path: /usr/local/bin/nsenterCmns
      systemd:
        units:
          - contents: |
              [Unit]
              Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts
              [Service]
              Type=oneshot
              RemainAfterExit=yes
              RuntimeDirectory=container-mount-namespace
              Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace
              Environment=BIND_POINT=%t/container-mount-namespace/mnt
              ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}"
              ExecStartPre=touch ${BIND_POINT}
              ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared /
              ExecStop=umount -R ${RUNTIME_DIRECTORY}
            name: container-mount-namespace.service
          - dropins:
              - contents: |
                  [Unit]
                  Wants=container-mount-namespace.service
                  After=container-mount-namespace.service
                  [Service]
                  ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
                  EnvironmentFile=-/%t/%N-execstart.env
                  ExecStart=
                  ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
                      ${ORIG_EXECSTART}"
                name: 90-container-mount-namespace.conf
            name: crio.service
          - dropins:
              - contents: |
                  [Unit]
                  Wants=container-mount-namespace.service
                  After=container-mount-namespace.service
                  [Service]
                  ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
                  EnvironmentFile=-/%t/%N-execstart.env
                  ExecStart=
                  ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
                      ${ORIG_EXECSTART} --housekeeping-interval=30s"
                name: 90-container-mount-namespace.conf
              - contents: |
                  [Service]
                  Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
                  Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s"
                name: 30-kubelet-interval-tuning.conf
            name: kubelet.service

SCTP

Stream Control Transmission Protocol (SCTP) is a key protocol used in RAN applications. This MachineConfig object adds the SCTP kernel module to the node to enable this protocol.

Recommended SCTP configuration (03-sctp-machine-config-master.yaml)

  # Automatically generated by extra-manifests-builder
  # Do not make changes directly.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: master
    name: load-sctp-module-master
  spec:
    config:
      ignition:
        version: 2.2.0
      storage:
        files:
          - contents:
              source: data:,
              verification: {}
            filesystem: root
            mode: 420
            path: /etc/modprobe.d/sctp-blacklist.conf
          - contents:
              source: data:text/plain;charset=utf-8,sctp
            filesystem: root
            mode: 420
            path: /etc/modules-load.d/sctp-load.conf
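
After the node applies this configuration and reboots, you can confirm that the module is loaded. The following is a minimal sketch, not part of the ZTP tooling: the function name module_loaded is hypothetical, and it assumes that SCTP is built as a loadable module and therefore appears in /proc/modules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the named module appears in a
# /proc/modules-style listing (module name in the first column).
module_loaded() {
  local module=$1 modules_file=${2:-/proc/modules}
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Run on the node, for example from an `oc debug node/<node_name>` shell.
if module_loaded sctp 2>/dev/null; then
  echo "sctp module is loaded"
else
  echo "sctp module is not loaded"
fi
```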

Accelerated container startup

The following MachineConfig CR configures core OpenShift processes and containers to use all available CPU cores during system startup and shutdown. This accelerates the system recovery during initial boot and reboots.

Recommended accelerated container startup configuration (04-accelerated-container-startup-master.yaml)

  # Automatically generated by extra-manifests-builder
  # Do not make changes directly.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: master
    name: 04-accelerated-container-startup-master
  spec:
    config:
      ignition:
        version: 3.2.0
      storage:
        files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,#!/bin/bash
#
# Temporarily reset the core system processes' CPU affinity to be unrestricted to accelerate startup and shutdown
#
# The defaults below can be overridden via environment variables
#

# The default set of critical processes whose affinity should be temporarily unbound:
CRITICAL_PROCESSES=${CRITICAL_PROCESSES:-"crio kubelet NetworkManager conmon dbus"}

# Default wait time is 600s = 10m:
MAXIMUM_WAIT_TIME=${MAXIMUM_WAIT_TIME:-600}

# Default steady-state threshold = 2%
# Allowed values:
#  4  - absolute pod count (+/-)
#  4% - percent change (+/-)
#  -1 - disable the steady-state check
STEADY_STATE_THRESHOLD=${STEADY_STATE_THRESHOLD:-2%}

# Default steady-state window = 60s
# If the running pod count stays within the given threshold for this time
# period, return CPU utilization to normal before the maximum wait time
# expires
STEADY_STATE_WINDOW=${STEADY_STATE_WINDOW:-60}

# Default steady-state allows any pod count to be "steady state"
# Increasing this will skip any steady-state checks until the count rises above
# this number to avoid false positives if there are some periods where the
# count doesn't increase but we know we can't be at steady-state yet.
STEADY_STATE_MINIMUM=${STEADY_STATE_MINIMUM:-0}

#######################################################

KUBELET_CPU_STATE=/var/lib/kubelet/cpu_manager_state
FULL_CPU_STATE=/sys/fs/cgroup/cpuset/cpuset.cpus
KUBELET_CONF=/etc/kubernetes/kubelet.conf
unrestrictedCpuset() {
  local cpus
  if [[ -e $KUBELET_CPU_STATE ]]; then
    cpus=$(jq -r '.defaultCpuSet' <$KUBELET_CPU_STATE)
    if [[ -n "${cpus}" && -e ${KUBELET_CONF} ]]; then
      reserved_cpus=$(jq -r '.reservedSystemCPUs' </etc/kubernetes/kubelet.conf)
      if [[ -n "${reserved_cpus}" ]]; then
        # Use taskset to merge the two cpusets
        cpus=$(taskset -c "${reserved_cpus},${cpus}" grep -i Cpus_allowed_list /proc/self/status | awk '{print $2}')
      fi
    fi
  fi
  if [[ -z $cpus ]]; then
    # fall back to using all cpus if the kubelet state is not configured yet
    [[ -e $FULL_CPU_STATE ]] || return 1
    cpus=$(<$FULL_CPU_STATE)
  fi
  echo $cpus
}

restrictedCpuset() {
  for arg in $(</proc/cmdline); do
    if [[ $arg =~ ^systemd.cpu_affinity= ]]; then
      echo ${arg#*=}
      return 0
    fi
  done
  return 1
}

resetAffinity() {
  local cpuset="$1"
  local failcount=0
  local successcount=0
  logger "Recovery: Setting CPU affinity for critical processes \"$CRITICAL_PROCESSES\" to $cpuset"
  for proc in $CRITICAL_PROCESSES; do
    local pids="$(pgrep $proc)"
    for pid in $pids; do
      local tasksetOutput
      tasksetOutput="$(taskset -apc "$cpuset" $pid 2>&1)"
      if [[ $? -ne 0 ]]; then
        echo "ERROR: $tasksetOutput"
        ((failcount++))
      else
        ((successcount++))
      fi
    done
  done

  logger "Recovery: Re-affined $successcount pids successfully"
  if [[ $failcount -gt 0 ]]; then
    logger "Recovery: Failed to re-affine $failcount processes"
    return 1
  fi
}

setUnrestricted() {
  logger "Recovery: Setting critical system processes to have unrestricted CPU access"
  resetAffinity "$(unrestrictedCpuset)"
}

setRestricted() {
  logger "Recovery: Resetting critical system processes back to normally restricted access"
  resetAffinity "$(restrictedCpuset)"
}

currentAffinity() {
  local pid="$1"
  taskset -pc $pid | awk -F': ' '{print $2}'
}

within() {
  local last=$1 current=$2 threshold=$3
  local delta=0 pchange
  delta=$(( current - last ))
  if [[ $current -eq $last ]]; then
    pchange=0
  elif [[ $last -eq 0 ]]; then
    pchange=1000000
  else
    pchange=$(( ( $delta * 100) / last ))
  fi
  echo -n "last:$last current:$current delta:$delta pchange:${pchange}%: "
  local absolute limit
  case $threshold in
    *%)
      absolute=${pchange##-} # absolute value
      limit=${threshold%%%}
      ;;
    *)
      absolute=${delta##-} # absolute value
      limit=$threshold
      ;;
  esac
  if [[ $absolute -le $limit ]]; then
    echo "within (+/-)$threshold"
    return 0
  else
    echo "outside (+/-)$threshold"
    return 1
  fi
}

steadystate() {
  local last=$1 current=$2
  if [[ $last -lt $STEADY_STATE_MINIMUM ]]; then
    echo "last:$last current:$current Waiting to reach $STEADY_STATE_MINIMUM before checking for steady-state"
    return 1
  fi
  within $last $current $STEADY_STATE_THRESHOLD
}

waitForReady() {
  logger "Recovery: Waiting ${MAXIMUM_WAIT_TIME}s for the initialization to complete"
  local lastSystemdCpuset="$(currentAffinity 1)"
  local lastDesiredCpuset="$(unrestrictedCpuset)"
  local t=0 s=10
  local lastCcount=0 ccount=0 steadyStateTime=0
  while [[ $t -lt $MAXIMUM_WAIT_TIME ]]; do
    sleep $s
    ((t += s))
    # Re-check the current affinity of systemd, in case some other process has changed it
    local systemdCpuset="$(currentAffinity 1)"
    # Re-check the unrestricted Cpuset, as the allowed set of unreserved cores may change as pods are assigned to cores
    local desiredCpuset="$(unrestrictedCpuset)"
    if [[ $systemdCpuset != $lastSystemdCpuset || $lastDesiredCpuset != $desiredCpuset ]]; then
      resetAffinity "$desiredCpuset"
      lastSystemdCpuset="$(currentAffinity 1)"
      lastDesiredCpuset="$desiredCpuset"
    fi

    # Detect steady-state pod count
    ccount=$(crictl ps | wc -l)
    if steadystate $lastCcount $ccount; then
      ((steadyStateTime += s))
      echo "Steady-state for ${steadyStateTime}s/${STEADY_STATE_WINDOW}s"
      if [[ $steadyStateTime -ge $STEADY_STATE_WINDOW ]]; then
        logger "Recovery: Steady-state (+/- $STEADY_STATE_THRESHOLD) for ${STEADY_STATE_WINDOW}s: Done"
        return 0
      fi
    else
      if [[ $steadyStateTime -gt 0 ]]; then
        echo "Resetting steady-state timer"
        steadyStateTime=0
      fi
    fi
    lastCcount=$ccount
  done
  logger "Recovery: Recovery Complete Timeout"
}

main() {
  if ! unrestrictedCpuset >&/dev/null; then
    logger "Recovery: No unrestricted Cpuset could be detected"
    return 1
  fi

  if ! restrictedCpuset >&/dev/null; then
    logger "Recovery: No restricted Cpuset has been configured.  We are already running unrestricted."
    return 0
  fi

  # Ensure we reset the CPU affinity when we exit this script for any reason
  # This way either after the timer expires or after the process is interrupted
  # via ^C or SIGTERM, we return things back to the way they should be.
  trap setRestricted EXIT

  logger "Recovery: Recovery Mode Starting"
  setUnrestricted
  waitForReady
}

if [[ "${BASH_SOURCE[0]}" = "${0}" ]]; then
  main "${@}"
  exit $?
fi

            mode: 493
            path: /usr/local/bin/accelerated-container-startup.sh
      systemd:
        units:
          - contents: |
              [Unit]
              Description=Unlocks more CPUs for critical system processes during container startup
              [Service]
              Type=simple
              ExecStart=/usr/local/bin/accelerated-container-startup.sh
              # Maximum wait time is 600s = 10m:
              Environment=MAXIMUM_WAIT_TIME=600
              # Steady-state threshold = 2%
              # Allowed values:
              # 4 - absolute pod count (+/-)
              # 4% - percent change (+/-)
              # -1 - disable the steady-state check
              # Note: '%' must be escaped as '%%' in systemd unit files
              Environment=STEADY_STATE_THRESHOLD=2%%
              # Steady-state window = 120s
              # If the running pod count stays within the given threshold for this time
              # period, return CPU utilization to normal before the maximum wait time
              # expires
              Environment=STEADY_STATE_WINDOW=120
              # Steady-state minimum = 40
              # Increasing this will skip any steady-state checks until the count rises above
              # this number to avoid false positives if there are some periods where the
              # count doesn't increase but we know we can't be at steady-state yet.
              Environment=STEADY_STATE_MINIMUM=40
              [Install]
              WantedBy=multi-user.target
            enabled: true
            name: accelerated-container-startup.service
          - contents: |
              [Unit]
              Description=Unlocks more CPUs for critical system processes during container shutdown
              DefaultDependencies=no
              [Service]
              Type=simple
              ExecStart=/usr/local/bin/accelerated-container-startup.sh
              # Maximum wait time is 600s = 10m:
              Environment=MAXIMUM_WAIT_TIME=600
              # Steady-state threshold
              # Allowed values:
              # 4 - absolute pod count (+/-)
              # 4% - percent change (+/-)
              # -1 - disable the steady-state check
              # Note: '%' must be escaped as '%%' in systemd unit files
              Environment=STEADY_STATE_THRESHOLD=-1
              # Steady-state window = 60s
              # If the running pod count stays within the given threshold for this time
              # period, return CPU utilization to normal before the maximum wait time
              # expires
              Environment=STEADY_STATE_WINDOW=60
              [Install]
              WantedBy=shutdown.target reboot.target halt.target
            enabled: true
            name: accelerated-container-shutdown.service

Setting rcu_normal

The following MachineConfig CR configures the system to set rcu_normal to 1 after the system has finished startup. This improves kernel latency for vDU applications.

Recommended configuration for disabling rcu_expedited after the node has finished startup (08-set-rcu-normal-master.yaml)

  # Automatically generated by extra-manifests-builder
  # Do not make changes directly.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: master
    name: 08-set-rcu-normal-master
  spec:
    config:
      ignition:
        version: 3.2.0
      storage:
        files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIERpc2FibGUgcmN1X2V4cGVkaXRlZCBhZnRlciBub2RlIGhhcyBmaW5pc2hlZCBib290aW5nCiMKIyBUaGUgZGVmYXVsdHMgYmVsb3cgY2FuIGJlIG92ZXJyaWRkZW4gdmlhIGVudmlyb25tZW50IHZhcmlhYmxlcwojCgojIERlZmF1bHQgd2FpdCB0aW1lIGlzIDYwMHMgPSAxMG06Ck1BWElNVU1fV0FJVF9USU1FPSR7TUFYSU1VTV9XQUlUX1RJTUU6LTYwMH0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgdGhyZXNob2xkID0gMiUKIyBBbGxvd2VkIHZhbHVlczoKIyAgNCAgLSBhYnNvbHV0ZSBwb2QgY291bnQgKCsvLSkKIyAgNCUgLSBwZXJjZW50IGNoYW5nZSAoKy8tKQojICAtMSAtIGRpc2FibGUgdGhlIHN0ZWFkeS1zdGF0ZSBjaGVjawpTVEVBRFlfU1RBVEVfVEhSRVNIT0xEPSR7U1RFQURZX1NUQVRFX1RIUkVTSE9MRDotMiV9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIHdpbmRvdyA9IDYwcwojIElmIHRoZSBydW5uaW5nIHBvZCBjb3VudCBzdGF5cyB3aXRoaW4gdGhlIGdpdmVuIHRocmVzaG9sZCBmb3IgdGhpcyB0aW1lCiMgcGVyaW9kLCByZXR1cm4gQ1BVIHV0aWxpemF0aW9uIHRvIG5vcm1hbCBiZWZvcmUgdGhlIG1heGltdW0gd2FpdCB0aW1lIGhhcwojIGV4cGlyZXMKU1RFQURZX1NUQVRFX1dJTkRPVz0ke1NURUFEWV9TVEFURV9XSU5ET1c6LTYwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSBhbGxvd3MgYW55IHBvZCBjb3VudCB0byBiZSAic3RlYWR5IHN0YXRlIgojIEluY3JlYXNpbmcgdGhpcyB3aWxsIHNraXAgYW55IHN0ZWFkeS1zdGF0ZSBjaGVja3MgdW50aWwgdGhlIGNvdW50IHJpc2VzIGFib3ZlCiMgdGhpcyBudW1iZXIgdG8gYXZvaWQgZmFsc2UgcG9zaXRpdmVzIGlmIHRoZXJlIGFyZSBzb21lIHBlcmlvZHMgd2hlcmUgdGhlCiMgY291bnQgZG9lc24ndCBpbmNyZWFzZSBidXQgd2Uga25vdyB3ZSBjYW4ndCBiZSBhdCBzdGVhZHktc3RhdGUgeWV0LgpTVEVBRFlfU1RBVEVfTUlOSU1VTT0ke1NURUFEWV9TVEFURV9NSU5JTVVNOi0wfQoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggIiRkZWx0YSIgKiAxMDApIC8gbGFzdCApKQogIGZpCiAgZWNobyAtbiAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IGRlbHRhOiRkZWx0YSBwY2hhbmdlOiR7cGNoYW5nZX0lOiAiCiAgbG9jYWwgYWJzb2x1dGUgbGltaXQKICBjYXNlICR0aHJlc2hvbGQgaW4KICAgIColKQogICAgICBhYnNvbHV0ZT0ke3BjaGFuZ2UjIy19ICMgYWJzb2x1dGUgdmFsdWUKICAgICAgbGltaXQ9JHt0aHJlc2hvbGQlJSV9CiAgICAgIDs7CiAgICAqKQogICAgICBhYnNvbHV0ZT0ke2RlbHRhIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR0aHJlc2hvbGQKICAgICAgOzsKICBlc2FjCiAgaWYgW1sgJGFic29sdXRlIC1sZSAkbGltaXQgXV07IHRoZW4KICAgIGVjaG8gIndpdGhpbiAoKy8tKSR0aHJlc2hvbGQiCiAgICByZXR1cm4gMAogIGVsc2UKICAgIGVjaG8gIm91dHNpZGUgKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDEKICBmaQp9CgpzdGVhZHlzdGF0ZSgpIHsKICBsb2NhbCBsYXN0PSQxIGN1cnJlbnQ9JDIKICBpZiBbWyAkbGFzdCAtbHQgJFNURUFEWV9TVEFURV9NSU5JTVVNIF1dOyB0aGVuCiAgICBlY2hvICJsYXN0OiRsYXN0IGN1cnJlbnQ6JGN1cnJlbnQgV2FpdGluZyB0byByZWFjaCAkU1RFQURZX1NUQVRFX01JTklNVU0gYmVmb3JlIGNoZWNraW5nIGZvciBzdGVhZHktc3RhdGUiCiAgICByZXR1cm4gMQogIGZpCiAgd2l0aGluICIkbGFzdCIgIiRjdXJyZW50IiAiJFNURUFEWV9TVEFURV9USFJFU0hPTEQiCn0KCndhaXRGb3JSZWFkeSgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBXYWl0aW5nICR7TUFYSU1VTV9XQUlUX1RJTUV9cyBmb3IgdGhlIGluaXRpYWxpemF0aW9uIHRvIGNvbXBsZXRlIgogIGxvY2FsIHQ9MCBzPTEwCiAgbG9jYWwgbGFzdENjb3VudD0wIGNjb3VudD0wIHN0ZWFkeVN0YXRlVGltZT0wCiAgd2hpbGUgW1sgJHQgLWx0ICRNQVhJTVVNX1dBSVRfVElNRSBdXTsgZG8KICAgIHNsZWVwICRzCiAgICAoKHQgKz0gcykpCiAgICAjIERldGVjdCBzdGVhZHktc3RhdGUgcG9kIGNvdW50CiAgICBjY291bnQ9JChjcmljdGwgcHMgMj4vZGV2L251bGwgfCB3YyAtbCkKICAgIGlmIFtbICRjY291bnQgLWd0IDAgXV0gJiYgc3RlYWR5c3RhdGUgIiRsYXN0Q2NvdW50IiAiJGNjb3VudCI7IHRoZW4KICAgICAgKChzdGVhZHlTdGF0ZVRpbWUgKz0gcykpCiAgICAgIGVjaG8gIlN0ZWFkeS1zdGF0ZSBmb3IgJHtzdGVhZHlTdGF0ZVRpbWV9cy8ke1NURUFEWV9TVEFURV9XSU5ET1d9cyIKICAgICAgaWYgW1sgJHN0ZWFkeVN0YXRlVGltZSAtZ2UgJFNURUFEWV9TVEFURV9XSU5ET1cgXV07IHRoZW4KICAgICAgICBsb2dnZXIgIlJlY292ZXJ5OiBTdGVhZHktc3RhdGUgKCsvLSAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRCkgZm9yICR7U1RFQURZX1NUQVRFX1dJTkRPV31zOiBEb25lIgogICAgICAgIHJldHVybiAwCiAgICAgIGZpCiAgICBlbHNlCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWd0IDAgXV07IHRoZW4KICAgICAgICBlY2hvICJSZXNldHRpbmcgc3RlYWR5LXN0YXRlIHRpbWVyIgogICAgICAgIHN0ZWFkeVN0YXRlVGltZT0wCiAgICAgIGZpCiAgICBmaQogICAgbGFzdENjb3VudD0kY2NvdW50CiAgZG9uZQogIGxvZ2dlciAiUmVjb3Zlcnk6IFJlY292ZXJ5IENvbXBsZXRlIFRpbWVvdXQiCn0KCnNldFJjdU5vcm1hbCgpIHsKICBlY2hvICJTZXR0aW5nIHJjdV9ub3JtYWwgdG8gMSIKICBlY2hvIDEgPiAvc3lzL2tlcm5lbC9yY3Vfbm9ybWFsCn0KCm1haW4oKSB7CiAgd2FpdEZvclJlYWR5CiAgZWNobyAiV2FpdGluZyBmb3Igc3RlYWR5IHN0YXRlIHRvb2s6ICQoYXdrICd7cHJpbnQgaW50KCQxLzM2MDApImgiLCBpbnQoKCQxJTM2MDApLzYwKSJtIiwgaW50KCQxJTYwKSJzIn0nIC9wcm9jL3VwdGltZSkiCiAgc2V0UmN1Tm9ybWFsCn0KCmlmIFtbICIke0JBU0hfU09VUkNFWzBdfSIgPSAiJHswfSIgXV07IHRoZW4KICBtYWluICIke0B9IgogIGV4aXQgJD8KZmkK
            mode: 493
            path: /usr/local/bin/set-rcu-normal.sh
      systemd:
        units:
          - contents: |
              [Unit]
              Description=Disable rcu_expedited after node has finished booting by setting rcu_normal to 1
              [Service]
              Type=simple
              ExecStart=/usr/local/bin/set-rcu-normal.sh
              # Maximum wait time is 600s = 10m:
              Environment=MAXIMUM_WAIT_TIME=600
              # Steady-state threshold = 2%
              # Allowed values:
              # 4 - absolute pod count (+/-)
              # 4% - percent change (+/-)
              # -1 - disable the steady-state check
              # Note: '%' must be escaped as '%%' in systemd unit files
              Environment=STEADY_STATE_THRESHOLD=2%%
              # Steady-state window = 120s
              # If the running pod count stays within the given threshold for this time
              # period, return CPU utilization to normal before the maximum wait time
              # expires
              Environment=STEADY_STATE_WINDOW=120
              # Steady-state minimum = 40
              # Increasing this will skip any steady-state checks until the count rises above
              # this number to avoid false positives if there are some periods where the
              # count doesn't increase but we know we can't be at steady-state yet.
              Environment=STEADY_STATE_MINIMUM=40
              [Install]
              WantedBy=multi-user.target
            enabled: true
            name: set-rcu-normal.service
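
Once the node has reached steady state, you can confirm that the service has done its work by reading the same kernel file that the embedded script writes to. The following is a minimal sketch: the function name rcu_normal_enabled is hypothetical, and the optional path argument exists only so that the check can be exercised against a test file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if rcu_normal is set to 1.
# The default path is the file that set-rcu-normal.sh writes to.
rcu_normal_enabled() {
  local path=${1:-/sys/kernel/rcu_normal}
  [[ -r $path && $(<"$path") == 1 ]]
}

# Run on the node, for example from an `oc debug node/<node_name>` shell.
if rcu_normal_enabled; then
  echo "rcu_normal is set to 1"
else
  echo "rcu_normal is not set to 1"
fi
```

Because the service waits for a steady pod count or for the maximum wait time, the value can still be 0 for several minutes after boot.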

Automatic kernel crash dumps with kdump

kdump is a Linux kernel feature that creates a kernel crash dump when the kernel crashes. kdump is enabled with the following MachineConfig CRs.

Recommended MachineConfig to remove ice driver (05-kdump-config-master.yaml)

  # Automatically generated by extra-manifests-builder
  # Do not make changes directly.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    labels:
      machineconfiguration.openshift.io/role: master
    name: 05-kdump-config-master
  spec:
    config:
      ignition:
        version: 3.2.0
      systemd:
        units:
          - enabled: true
            name: kdump-remove-ice-module.service
            contents: |
              [Unit]
              Description=Remove ice module when doing kdump
              Before=kdump.service
              [Service]
              Type=oneshot
              RemainAfterExit=true
              ExecStart=/usr/local/bin/kdump-remove-ice-module.sh
              [Install]
              WantedBy=multi-user.target
      storage:
        files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo=
            mode: 448
            path: /usr/local/bin/kdump-remove-ice-module.sh

Recommended kdump configuration (06-kdump-master.yaml)

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 06-kdump-enable-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump.service
  kernelArguments:
    - crashkernel=512M

Disable automatic CRI-O cache wipe

After an uncontrolled host shutdown or cluster reboot, CRI-O automatically deletes the entire CRI-O cache, causing all images to be pulled from the registry when the node reboots. This can result in unacceptably slow recovery times or recovery failures. To prevent this from happening in single-node OpenShift clusters that you install with GitOps ZTP, disable the CRI-O delete cache feature during cluster installation.

Recommended MachineConfig CR to disable CRI-O cache wipe on control plane nodes (99-crio-disable-wipe-master.yaml)

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-crio-disable-wipe-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml

Recommended MachineConfig CR to disable CRI-O cache wipe on worker nodes (99-crio-disable-wipe-worker.yaml)

# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-crio-disable-wipe-worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
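
To see what these MachineConfig CRs actually write to the node, you can decode the base64 data URL in the source field. The following Python sketch does this for the CRI-O drop-in file and also converts the decimal mode values used in these CRs to the more familiar octal form. The helper name decode_data_url is hypothetical.

```python
import base64


def decode_data_url(source):
    """Decode a 'data:<mediatype>;base64,<payload>' URL into text."""
    header, _, payload = source.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(payload).decode("utf-8")


SOURCE = (
    "data:text/plain;charset=utf-8;base64,"
    "W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo="
)

content = decode_data_url(SOURCE)
print(content)

# The mode field is a decimal integer; print the octal equivalents.
print(oct(420))  # mode used for the CRI-O drop-in file
print(oct(448))  # mode used for the kdump helper script
```

The decoded file contains a [crio] section that sets clean_shutdown_file to an empty string. The mode values 420 and 448 correspond to octal 0644 and 0700.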

Configuring crun as the default container runtime

The following ContainerRuntimeConfig custom resources (CRs) configure crun as the default OCI container runtime for control plane and worker nodes. The crun container runtime is fast and lightweight and has a low memory footprint.

For optimal performance, enable crun for control plane and worker nodes in single-node OpenShift, three-node OpenShift, and standard clusters. To avoid the cluster rebooting when the CR is applied, apply the change as a GitOps ZTP additional Day 0 install-time manifest.

Recommended ContainerRuntimeConfig CR for control plane nodes (enable-crun-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-master
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  containerRuntimeConfig:
    defaultRuntime: crun

Recommended ContainerRuntimeConfig CR for worker nodes (enable-crun-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-worker
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    defaultRuntime: crun

When the cluster installation is complete, the ZTP pipeline applies the following custom resources (CRs) that are required to run DU workloads.

In GitOps ZTP v4.10 and earlier, you configure UEFI secure boot with a MachineConfig CR. This is no longer required in GitOps ZTP v4.11 and later. In v4.11, you configure UEFI secure boot for single-node OpenShift clusters by updating the spec.clusters.nodes.bootMode field in the SiteConfig CR that you use to install the cluster. For more information, see Deploying a managed cluster with SiteConfig and GitOps ZTP.

Operator namespaces and Operator groups

Single-node OpenShift clusters that run DU workloads require OperatorGroup and Namespace custom resources (CRs) for the following Operators:

  • Local Storage Operator

  • Logging Operator

  • PTP Operator

  • SR-IOV Network Operator

The following CRs are required:

Recommended Storage Operator Namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-local-storage
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage

Recommended Cluster Logging Operator Namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
    - openshift-logging

Recommended PTP Operator Namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ptp
  annotations:
    workload.openshift.io/allowed: management
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ptp-operators
  namespace: openshift-ptp
spec:
  targetNamespaces:
    - openshift-ptp

Recommended SR-IOV Operator Namespace and OperatorGroup configuration

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
    - openshift-sriov-network-operator

Operator subscriptions

Single-node OpenShift clusters that run DU workloads require the following Subscription CRs. The subscription provides the location to download the following Operators:

  • Local Storage Operator

  • Logging Operator

  • PTP Operator

  • SR-IOV Network Operator

For each Operator subscription, specify the channel to get the Operator from. The recommended channel is stable.

You can specify Manual or Automatic updates. In Automatic mode, the Operator automatically updates to the latest versions in the channel as they become available in the registry. In Manual mode, new Operator versions are installed only when they are explicitly approved.

Use Manual mode for subscriptions. This allows you to control the timing of Operator updates so that they fit within scheduled maintenance windows.

Recommended Local Storage Operator subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: "stable"
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended SR-IOV Operator subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "stable"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended PTP Operator subscription

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ptp-operator-subscription
  namespace: openshift-ptp
spec:
  channel: "stable"
  name: ptp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended Cluster Logging Operator subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Cluster logging and log forwarding

Single-node OpenShift clusters that run DU workloads require logging and log forwarding for debugging. The following ClusterLogging and ClusterLogForwarder custom resources (CRs) are required.

Recommended cluster logging configuration

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: "Managed"
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}

Recommended log forwarding configuration

apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - type: "kafka"
      name: kafka-open
      url: tcp://10.46.55.190:9092/test
  inputs:
    - name: infra-logs
      infrastructure: {}
  pipelines:
    - name: audit-logs
      inputRefs:
        - audit
      outputRefs:
        - kafka-open
    - name: infrastructure-logs
      inputRefs:
        - infrastructure
      outputRefs:
        - kafka-open

Set the spec.outputs.url field to the URL of the Kafka server that the logs are forwarded to.

Performance profile

Single-node OpenShift clusters that run DU workloads require a Node Tuning Operator performance profile to use real-time host capabilities and services.

In earlier versions of OKD, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OKD 4.11 and later, this functionality is part of the Node Tuning Operator.

The following example PerformanceProfile CR illustrates the required single-node OpenShift cluster configuration.

Recommended performance profile configuration

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  additionalKernelArgs:
    - "rcupdate.rcu_normal_after_boot=0"
    - "efi=runtime"
    - "module_blacklist=irdma"
  cpu:
    isolated: 2-51,54-103
    reserved: 0-1,52-53
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 32
        size: 1G
        node: 0
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ''
  numa:
    topologyPolicy: "restricted"
  realTimeKernel:
    enabled: true
  workloadHints:
    realTime: true
    highPowerConsumption: false
    perPodPowerManagement: false

Table 3. PerformanceProfile CR options for single-node OpenShift clusters

PerformanceProfile CR field

Description

metadata.name

Ensure that name matches the following fields set in related GitOps ZTP custom resources (CRs):

  • include=openshift-node-performance-${PerformanceProfile.metadata.name} in TunedPerformancePatch.yaml

  • name: 50-performance-${PerformanceProfile.metadata.name} in validatorCRs/informDuValidator.yaml

spec.additionalKernelArgs

"efi=runtime" configures UEFI secure boot for the cluster host.

spec.cpu.isolated

Set the isolated CPUs. Ensure that both threads of every Hyper-Threading pair are assigned to the same pool.

The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause undefined behavior in the system.

spec.cpu.reserved

Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved.

spec.hugepages.pages

  • Set the number of huge pages (count).

  • Set the huge pages size (size).

  • Set the NUMA node where the huge pages are allocated (node).

spec.realTimeKernel

Set enabled to true to use the real-time kernel.

spec.workloadHints

Use workloadHints to define the set of top-level flags for different types of workloads. The example configuration configures the cluster for low latency and high performance.
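
The rule that the reserved and isolated CPU pools must not overlap and must together span all available cores can be checked before you apply the profile. The following Python sketch is one way to do that. The function names are hypothetical, and the total of 104 logical CPUs is an assumption inferred from the 0-103 range in the example profile, so substitute the value for your host.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0-1,52-53' into a set of CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_cpu_partition(isolated, reserved, total_cpus):
    """Return (overlap, missing) as sorted lists; both are empty when valid."""
    iso = parse_cpu_list(isolated)
    res = parse_cpu_list(reserved)
    overlap = sorted(iso & res)
    missing = sorted(set(range(total_cpus)) - iso - res)
    return overlap, missing


# Values from the example PerformanceProfile; total_cpus=104 is an assumption.
overlap, missing = check_cpu_partition("2-51,54-103", "0-1,52-53", 104)
print("overlapping CPUs:", overlap)
print("unassigned CPUs:", missing)
```

For the example profile, both lists are empty, which means that every CPU belongs to exactly one pool.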

PTP

Single-node OpenShift clusters use Precision Time Protocol (PTP) for network time synchronization. The following example PtpConfig CR illustrates the required PTP slave configuration.

Recommended PTP configuration

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: slave
  namespace: openshift-ptp
spec:
  profile:
    - name: "slave"
      # The interface name is hardware-specific
      interface: ens5f0
      ptp4lOpts: "-2 -s"
      phc2sysOpts: "-a -r -n 24"
      ptpSchedulingPolicy: SCHED_FIFO
      ptpSchedulingPriority: 10
      ptpSettings:
        logReduce: "true"
      ptp4lConf: |
        [global]
        #
        # Default Data Set
        #
        twoStepFlag 1
        slaveOnly 0
        priority1 128
        priority2 128
        domainNumber 24
        #utc_offset 37
        clockClass 255
        clockAccuracy 0xFE
        offsetScaledLogVariance 0xFFFF
        free_running 0
        freq_est_interval 1
        dscp_event 0
        dscp_general 0
        dataset_comparison G.8275.x
        G.8275.defaultDS.localPriority 128
        #
        # Port Data Set
        #
        logAnnounceInterval -3
        logSyncInterval -4
        logMinDelayReqInterval -4
        logMinPdelayReqInterval -4
        announceReceiptTimeout 3
        syncReceiptTimeout 0
        delayAsymmetry 0
        fault_reset_interval 4
        neighborPropDelayThresh 20000000
        masterOnly 0
        G.8275.portDS.localPriority 128
        #
        # Run time options
        #
        assume_two_step 0
        logging_level 6
        path_trace_enabled 0
        follow_up_info 0
        hybrid_e2e 0
        inhibit_multicast_service 0
        net_sync_monitor 0
        tc_spanning_tree 0
        tx_timestamp_timeout 50
        unicast_listen 0
        unicast_master_table 0
        unicast_req_duration 3600
        use_syslog 1
        verbose 0
        summary_interval 0
        kernel_leap 1
        check_fup_sync 0
        #
        # Servo Options
        #
        pi_proportional_const 0.0
        pi_integral_const 0.0
        pi_proportional_scale 0.0
        pi_proportional_exponent -0.3
        pi_proportional_norm_max 0.7
        pi_integral_scale 0.0
        pi_integral_exponent 0.4
        pi_integral_norm_max 0.3
        step_threshold 2.0
        first_step_threshold 0.00002
        max_frequency 900000000
        clock_servo pi
        sanity_freq_limit 200000000
        ntpshm_segment 0
        #
        # Transport options
        #
        transportSpecific 0x0
        ptp_dst_mac 01:1B:19:00:00:00
        p2p_dst_mac 01:80:C2:00:00:0E
        udp_ttl 1
        udp6_scope 0x0E
        uds_address /var/run/ptp4l
        #
        # Default interface options
        #
        clock_type OC
        network_transport L2
        delay_mechanism E2E
        time_stamping hardware
        tsproc_mode filter
        delay_filter moving_median
        delay_filter_length 10
        egressLatency 0
        ingressLatency 0
        boundary_clock_jbod 0
        #
        # Clock description
        #
        productDescription ;;
        revisionData ;;
        manufacturerIdentity 00:00:00
        userDescription ;
        timeSource 0xA0
  recommend:
    - profile: "slave"
      priority: 4
      match:
        - nodeLabel: "node-role.kubernetes.io/master"

Extended Tuned profile

Single-node OpenShift clusters that run DU workloads require additional performance tuning for high-performance workloads. The following example Tuned CR extends the Tuned profile:

Recommended extended Tuned profile configuration

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: performance-patch
      data: |
        [main]
        summary=Configuration changes profile inherited from performance created tuned
        include=openshift-node-performance-openshift-node-performance-profile
        [sysctl]
        kernel.timer_migration=1
        [scheduler]
        group.ice-ptp=0:f:10:*:ice-ptp.*
        group.ice-gnss=0:f:10:*:ice-gnss.*
        [service]
        service.stalld=start,enable
        service.chronyd=stop,disable
  recommend:
    - machineConfigLabels:
        machineconfiguration.openshift.io/role: "master"
      priority: 19
      profile: performance-patch

Table 4. Tuned CR options for single-node OpenShift clusters

Tuned CR field

Description

spec.profile.data

  • The include line that you set in spec.profile.data must match the associated PerformanceProfile CR name. For example, include=openshift-node-performance-${PerformanceProfile.metadata.name}.

  • When using the non-realtime kernel, remove the timer_migration override line from the [sysctl] section.
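
Because the include line and the validator CR name are both derived from the PerformanceProfile CR name, a mismatch is easy to introduce when you rename the profile. The following Python sketch derives the expected values and checks a Tuned profile data block against them. The function names are hypothetical, and the naming patterns are taken from the tables in this section.

```python
def derived_names(profile_name):
    """Names that must track PerformanceProfile.metadata.name."""
    return {
        "tuned_include": "openshift-node-performance-" + profile_name,
        "validator_name": "50-performance-" + profile_name,
    }


def tuned_include_matches(tuned_data, profile_name):
    """True if the first include= line in the Tuned data matches the profile name."""
    expected = derived_names(profile_name)["tuned_include"]
    for line in tuned_data.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "include":
            return value.strip() == expected
    return False


TUNED_DATA = """\
[main]
summary=Configuration changes profile inherited from performance created tuned
include=openshift-node-performance-openshift-node-performance-profile
"""

print(derived_names("openshift-node-performance-profile"))
print(tuned_include_matches(TUNED_DATA, "openshift-node-performance-profile"))
```

The doubled openshift-node-performance prefix in the example include line is expected: the example PerformanceProfile CR is itself named openshift-node-performance-profile.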

SR-IOV

Single root I/O virtualization (SR-IOV) is commonly used to enable fronthaul and midhaul networks. The following YAML example configures SR-IOV for a single-node OpenShift cluster.

The configuration of the SriovNetwork CR varies depending on your specific network and infrastructure requirements.

Recommended SriovOperatorConfig configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    "node-role.kubernetes.io/master": ""
  enableInjector: true
  enableOperatorWebhook: true

Table 5. SriovOperatorConfig CR options for single-node OpenShift clusters

SriovOperatorConfig CR field

Description

spec.enableInjector

Disable Injector pods to reduce the number of management pods. Start with the Injector pods enabled, and only disable them after verifying the user manifests. If the injector is disabled, containers that use SR-IOV resources must explicitly assign them in the requests and limits section of the container spec.

For example:

containers:
- name: my-sriov-workload-container
  resources:
    limits:
      openshift.io/<resource_name>: 1
    requests:
      openshift.io/<resource_name>: 1

spec.enableOperatorWebhook

Disable OperatorWebhook pods to reduce the number of management pods. Start with the OperatorWebhook pods enabled, and only disable them after verifying the user manifests.
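
When the injector is disabled, each workload manifest must carry the SR-IOV resource requests itself. The following Python sketch builds the resources stanza for a container from a resource name, using the same value for requests and limits as in the example above. The helper name is hypothetical, and du_mh is the resource name used in the example SR-IOV CRs in this section.

```python
def sriov_container_resources(resource_name, count=1):
    """Build matching requests and limits for an SR-IOV resource."""
    if count < 1:
        raise ValueError("count must be at least 1")
    key = "openshift.io/" + resource_name
    return {"limits": {key: count}, "requests": {key: count}}


container = {
    "name": "my-sriov-workload-container",
    "resources": sriov_container_resources("du_mh"),
}
print(container)
```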

Recommended SriovNetwork configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: ""
  namespace: openshift-sriov-network-operator
spec:
  resourceName: "du_mh"
  networkNamespace: openshift-sriov-network-operator
  vlan: "150"
  spoofChk: ""
  ipam: ""
  linkState: ""
  maxTxRate: ""
  minTxRate: ""
  vlanQoS: ""
  trust: ""
  capabilities: ""

Table 6. SriovNetwork CR options for single-node OpenShift clusters

SriovNetwork CR field

Description

spec.vlan

Configure vlan with the VLAN for the midhaul network.

Recommended SriovNetworkNodePolicy configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: $name
  namespace: openshift-sriov-network-operator
spec:
  # Attributes for Mellanox/Intel based NICs
  deviceType: netdevice/vfio-pci
  isRdma: true/false
  nicSelector:
    # The exact physical function name must match the hardware used
    pfNames: [ens7f0]
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numVfs: 8
  priority: 10
  resourceName: du_mh

Table 7. SriovNetworkNodePolicy CR options for single-node OpenShift clusters

SriovNetworkNodePolicy CR field

Description

spec.deviceType

Configure deviceType as vfio-pci or netdevice.

spec.nicSelector.pfNames

Specifies the interface connected to the fronthaul network.

spec.numVfs

Specifies the number of VFs for the fronthaul network.

Console Operator

Use the cluster capabilities feature to prevent the Console Operator from being installed. The Console Operator is not needed when the node is centrally managed. Removing the Operator provides additional space and capacity for application workloads.

To disable the Console Operator during the installation of the managed cluster, set the following in the spec.clusters.0.installConfigOverrides field of the SiteConfig custom resource (CR):

installConfigOverrides: "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"
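
The installConfigOverrides value is a JSON document stored as a quoted string, so the inner quotation marks must be escaped. Escaping by hand is error-prone. The following Python sketch generates the line from a dictionary instead. The function name is hypothetical, and the output differs from the example above only in whitespace.

```python
import json


def install_config_overrides(overrides):
    """Return the installConfigOverrides line for a SiteConfig CR."""
    inner = json.dumps(overrides, separators=(",", ":"))
    # json.dumps on a str produces a double-quoted, backslash-escaped literal,
    # which is also a valid YAML double-quoted scalar.
    return "installConfigOverrides: " + json.dumps(inner)


line = install_config_overrides({"capabilities": {"baselineCapabilitySet": "None"}})
print(line)
```

Decoding the quoted value twice, once as a string and once as JSON, returns the original dictionary. You can use this to check an existing SiteConfig value.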

Alertmanager

Single-node OpenShift clusters that run DU workloads must reduce the CPU resources consumed by the OKD monitoring components. The following ConfigMap custom resource (CR) disables Alertmanager.

Recommended cluster monitoring configuration (ReduceMonitoringFootprint.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  annotations:
    ran.openshift.io/ztp-deploy-wave: "1"
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
    prometheusK8s:
      retention: 24h

LVM Storage

You can dynamically provision local storage on single-node OpenShift clusters with Logical volume manager storage (LVM Storage).

The recommended storage solution for single-node OpenShift is the Local Storage Operator. Alternatively, you can use LVM Storage but it requires additional CPU resources to be allocated.

The following YAML example configures the storage of the node to be available to OKD applications.

Recommended LVMCluster configuration (StorageLVMCluster.yaml)

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: odf-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: vg1
        deviceSelector:
          paths:
            - /dev/disk/by-path/pci-0000:11:00.0-nvme-1
        thinPoolConfig:
          name: thin-pool-1
          overprovisionRatio: 10
          sizePercent: 90

Table 8. LVMCluster CR options for single-node OpenShift clusters

LVMCluster CR field

Description

deviceSelector.paths

Configure the disks used for LVM Storage. If no disks are specified, LVM Storage uses all the unused disks in the specified thin pool.

Network diagnostics

Single-node OpenShift clusters that run DU workloads require fewer inter-pod network connectivity checks to reduce the additional load created by these pods. The following custom resource (CR) disables these checks.

Recommended network diagnostics configuration (DisableSnoNetworkDiag.yaml)

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  disableNetworkDiagnostics: true

Additional resources