Deploying distributed units at scale in a disconnected environment

Use zero touch provisioning (ZTP) to provision distributed units at new edge sites in a disconnected environment. The workflow starts when the site is connected to the network and ends with the CNF workload deployed and running on the site nodes.

Provisioning edge sites at scale

Telco edge computing presents extraordinary challenges when managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully automated management solutions with as close to zero human interaction as possible.

Zero touch provisioning (ZTP) allows you to provision new edge sites with declarative configurations of bare-metal equipment at remote sites. Template or overlay configurations install OKD features that are required for CNF workloads. End-to-end functional test suites are used to verify CNF related features. All configurations are declarative in nature.

You start the workflow by creating declarative configurations for ISO images that are delivered to the edge nodes to begin the installation process. The images are used to repeatedly provision large numbers of nodes efficiently and quickly, allowing you to keep up with requirements from the field for far edge nodes.

Service providers are deploying a more distributed mobile network architecture enabled by the modular functional framework defined for 5G. This allows service providers to move from appliance-based radio access networks (RAN) to open cloud RAN architecture, gaining flexibility and agility in delivering services to end users.

The following diagram shows how ZTP works within a far edge framework.

ZTP in a far edge framework

About ZTP and distributed units on OpenShift clusters

You can install a distributed unit (DU) on OKD clusters at scale with Red Hat Advanced Cluster Management (RHACM) using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.

RHACM manages clusters in a hub-and-spoke architecture, where a single hub cluster manages many spoke clusters. RHACM applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running RHACM provision and deploy the spoke clusters by using ZTP and AI. DU installation follows the AI installation of OKD on each cluster.

The AI service handles provisioning of OKD on single node clusters, three-node clusters, or standard clusters running on bare metal. RHACM ships with and deploys AI when the MultiClusterHub custom resource is installed.

With ZTP and AI, you can provision OKD clusters to run your DUs at scale. A high-level overview of ZTP for distributed units in a disconnected environment is as follows:

  • A hub cluster running Red Hat Advanced Cluster Management (RHACM) manages a disconnected internal registry that mirrors the OKD release images. The internal registry is used to provision the spoke clusters.

  • You manage the bare metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.

  • You install the DU bare metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare metal host:

    • Network connectivity - including DNS for your network. Hosts must be reachable from the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install your spoke cluster.

    • Baseboard Management Controller (BMC) details for each host - ZTP uses the BMC URL and credentials to connect to the BMC of each host. ZTP manages the spoke cluster definition CRs, with the exception of the BMCSecret CR, which you create manually; a minimal example Secret is sketched below. These CRs define the relevant elements for the managed clusters.
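
For illustration, the manually created BMCSecret CR is a standard Kubernetes Secret that holds the BMC user name and password for a host. The name, namespace, and encoded values in the following sketch are hypothetical placeholders:

  apiVersion: v1
  kind: Secret
  metadata:
    name: example-bmh-secret      # hypothetical name, referenced later by the BareMetalHost CR
    namespace: example-cluster    # must match the managed cluster namespace
  type: Opaque
  data:
    username: YWRtaW4=            # base64-encoded BMC user name ("admin" in this sketch)
    password: cGFzc3dvcmQ=        # base64-encoded BMC password ("password" in this sketch)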

The GitOps approach

ZTP uses the GitOps set of practices for infrastructure deployment, which allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager (OCM) for multisite deployment.

One of the motivators for a GitOps approach is the requirement for reliability at scale. This is a significant challenge that GitOps helps solve.

GitOps addresses the reliability issue by providing traceability, RBAC, and a single source of truth for the desired state of each site. GitOps addresses scale by providing structure, tooling, and event-driven operations through webhooks.

Zero touch provisioning building blocks

Red Hat Advanced Cluster Management (RHACM) leverages zero touch provisioning (ZTP) to deploy single-node OKD clusters, three-node clusters, and standard clusters. The initial site plan is divided into smaller components and initial configuration data is stored in a Git repository. ZTP uses a declarative GitOps approach to deploy these clusters.

The deployment of the clusters includes:

  • Installing the host operating system (RHCOS) on a blank server.

  • Deploying OKD.

  • Creating cluster policies and site subscriptions.

  • Leveraging a GitOps deployment topology for a develop once, deploy anywhere model.

  • Making the necessary network configurations to the server operating system.

  • Deploying profile Operators and performing any needed software-related configuration, such as performance profile, PTP, and SR-IOV.

  • Downloading images needed to run workloads (CNFs).

How to plan your RAN policies

Zero touch provisioning (ZTP) uses Red Hat Advanced Cluster Management (RHACM) to apply the radio access network (RAN) configuration by using a policy-based governance approach.

The policy generator or PolicyGen is a part of the GitOps ZTP tooling that facilitates creating RHACM policies from a set of predefined custom resources. There are three main items: policy categorization, source CR policy, and the PolicyGenTemplate CR. PolicyGen uses these to generate the policies and their placement bindings and rules.

The following diagram shows how the RAN policy generator interacts with GitOps and RHACM.

RAN policy generator

RAN policies are categorized into three main groups:

Common

A policy that exists in the Common category is applied to all clusters represented by the site plan. Cluster types include single node, three-node, and standard clusters.

Groups

A policy that exists in the Groups category is applied to a group of clusters. Each group of clusters can have its own policies under the Groups category. For example, Groups/group1 can have its own policies that are applied to the clusters belonging to group1. You can also define a group for each cluster type: single node, three-node, and standard clusters.

Sites

A policy that exists in the Sites category is applied to a specific cluster. Any cluster could have its own policies that exist in the Sites category. For example, Sites/cluster1 has its own policies applied to cluster1. You can also define an example site-specific configuration for each cluster type: single node, three-node, and standard clusters.

Low latency for distributed units (DUs)

Low latency is an integral part of the development of 5G networks. Telecommunications networks require as little signal delay as possible to ensure quality of service in a variety of critical use cases.

Low latency processing is essential for any communication with timing constraints that affect functionality and security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data.

Low latency systems are about guarantees with regards to response and processing times. This includes keeping a communication protocol running smoothly, ensuring device security with fast responses to error conditions, or just making sure a system is not lagging behind when receiving a lot of data. Low latency is key for optimal synchronization of radio transmissions.

OKD enables low latency processing for DUs running on COTS hardware by using a number of technologies and specialized hardware devices:

Real-time kernel for RHCOS

Ensures workloads are handled with a high degree of process determinism.

CPU isolation

Avoids CPU scheduling delays and ensures CPU capacity is available consistently.

NUMA awareness

Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the NUMA node. This decreases latency and improves performance of the node.

Huge pages memory management

Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.

Precision timing synchronization using PTP

Allows synchronization between nodes in the network with sub-microsecond accuracy.

Preparing the disconnected environment

Before you can provision distributed units (DU) at scale, you must install Red Hat Advanced Cluster Management (RHACM), which handles the provisioning of the DUs.

RHACM is deployed as an Operator on the OKD hub cluster. It controls clusters and applications from a single console with built-in security policies. RHACM provisions and manages your DU hosts. To install RHACM in a disconnected environment, you create a mirror registry that mirrors the Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster.
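
For example, assuming a disconnected registry at registry.example.com:5000 and registry credentials in pull-secret.json (both placeholder values), mirroring an Operator catalog generally takes the following form; see "Mirroring an Operator catalog" for the complete procedure:

  $ oc adm catalog mirror \
      registry.redhat.io/redhat/redhat-operator-index:v4.10 \
      registry.example.com:5000/olm \
      -a pull-secret.json

The command writes ImageContentSourcePolicy and CatalogSource manifests to a results directory; apply them to the hub cluster so that OLM pulls Operator content from the mirror registry.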

You also use a disconnected mirror host to serve the FCOS ISO and RootFS disk images that provision the DU bare-metal host operating system.

Additional resources

Adding FCOS ISO and RootFS images to the disconnected mirror host

Before you install a cluster on infrastructure that you provision, you must create Fedora CoreOS (FCOS) machines for it to use. Use a disconnected mirror to host the FCOS images you require to provision your distributed unit (DU) bare-metal hosts.

Prerequisites

  • Deploy and configure an HTTP server to host the FCOS image resources on the network. You must be able to access the HTTP server from your computer, and from the machines that you create.

The FCOS images might not change with every release of OKD. You must download images with the highest version that is less than or equal to the OKD version that you install. Use the image versions that match your OKD version if they are available. You require ISO and RootFS images to install FCOS on the DU hosts. FCOS qcow2 images are not supported for this installation type.

Procedure

  1. Log in to the mirror host.

  2. Obtain the FCOS ISO and RootFS images from mirror.openshift.com, for example:

    1. Export the required image names and OKD version as environment variables:

      $ export ISO_IMAGE_NAME=<iso_image_name> (1)
      $ export ROOTFS_IMAGE_NAME=<rootfs_image_name> (2)
      $ export OCP_VERSION=<ocp_version> (3)
      (1) ISO image name, for example, rhcos-4.11.0-fc.1-x86_64-live.x86_64.iso
      (2) RootFS image name, for example, rhcos-4.11.0-fc.1-x86_64-live-rootfs.x86_64.img
      (3) OKD version, for example, latest-4.11
    2. Download the required images:

      1. $ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
      1. $ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}

Verification steps

  • Verify that the images downloaded successfully and are being served on the disconnected mirror host, for example:

    1. $ wget http://$(hostname)/${ISO_IMAGE_NAME}

    Expected output

    1. ...
    2. Saving to: rhcos-4.11.0-fc.1-x86_64-live.x86_64.iso
    3. rhcos-4.11.0-fc.1-x86_64- 11%[====> ] 10.01M 4.71MB/s
    4. ...

Installing Red Hat Advanced Cluster Management in a disconnected environment

You use Red Hat Advanced Cluster Management (RHACM) on a hub cluster in the disconnected environment to manage the deployment of distributed unit (DU) profiles on multiple managed spoke clusters.

Prerequisites

  • Install the OKD CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Configure a disconnected mirror registry for use in the cluster.

    If you want to deploy Operators to the spoke clusters, you must also add them to this registry. See Mirroring an Operator catalog for more information.

Procedure

Enabling assisted installer service on bare metal

The Assisted Installer Service (AIS) deploys OKD clusters. Red Hat Advanced Cluster Management (RHACM) ships with AIS. AIS is deployed when you enable the MultiClusterHub Operator on the RHACM hub cluster.

For distributed units (DUs), RHACM supports OKD deployments that run on a single bare-metal host, three-node clusters, or standard clusters. In the case of single node clusters or three-node clusters, all nodes act as both control plane and worker nodes.

Prerequisites

  • Install OKD 4.11 on a hub cluster.

  • Install RHACM and create the MultiClusterHub resource.

  • Create persistent volume custom resources (CR) for database and file system storage.

  • Install the OpenShift CLI (oc).

Create a persistent volume resource for image storage. Failure to specify persistent volume storage for images can affect cluster performance.
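
If the hub cluster does not have a default storage class to satisfy these claims, one option is to back them with statically provisioned volumes. The following is a minimal, hypothetical hostPath sketch for a single volume; the name, size, and path are placeholders, and the capacity must cover the storage request you set in the AgentServiceConfig CR:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: assisted-service-pv-1           # hypothetical name
  spec:
    capacity:
      storage: 20Gi                       # match or exceed the corresponding storage request
    accessModes:
      - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain
    hostPath:
      path: /var/lib/assisted-service     # placeholder path on the node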

Procedure

  1. Modify the Provisioning resource to allow the Bare Metal Operator to watch all namespaces:

    1. $ oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'
  2. Create the AgentServiceConfig CR.

    1. Save the following YAML in the agent_service_config.yaml file:

      apiVersion: agent-install.openshift.io/v1beta1
      kind: AgentServiceConfig
      metadata:
        name: agent
      spec:
        databaseStorage:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: <database_volume_size> (1)
        filesystemStorage:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: <file_storage_volume_size> (2)
        imageStorage:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: <image_storage_volume_size> (3)
        osImages: (4)
          - openshiftVersion: "<ocp_version>" (5)
            version: "<ocp_release_version>" (6)
            url: "<iso_url>" (7)
            rootFSUrl: "<root_fs_url>" (8)
            cpuArchitecture: "x86_64"
      (1) Volume size for the databaseStorage field, for example 10Gi.
      (2) Volume size for the filesystemStorage field, for example 20Gi.
      (3) Volume size for the imageStorage field, for example 2Gi.
      (4) List of OS image details, for example a single OKD OS version.
      (5) OKD version to install, in either "x.y" (major.minor) or "x.y.z" (major.minor.patch) formats.
      (6) Specific install version, for example, 47.83.202103251640-0.
      (7) ISO URL, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-4.7.7-x86_64-live.x86_64.iso.
      (8) Root FS image URL, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-live-rootfs.x86_64.img.
    2. Create the AgentServiceConfig CR by running the following command:

      1. $ oc create -f agent_service_config.yaml

      Example output

      1. agentserviceconfig.agent-install.openshift.io/agent created

ZTP custom resources

Zero touch provisioning (ZTP) uses custom resource (CR) objects to extend the Kubernetes API or introduce your own API into a project or a cluster. These CRs contain the site-specific data required to install and configure a cluster for RAN applications.

A custom resource definition (CRD) file defines your own object kinds. Deploying a CRD into the managed cluster causes the Kubernetes API server to begin serving the specified CR for the entire lifecycle.

For each CR in the <site>.yaml file on the managed cluster, ZTP uses the data to create installation CRs in a directory named for the cluster.

ZTP provides two ways for defining and installing CRs on managed clusters: a manual approach when you are provisioning a single cluster and an automated approach when provisioning multiple clusters.

Manual CR creation for single clusters

Use this method when you are creating CRs for a single cluster. This is a good way to test your CRs before deploying on a larger scale.

Automated CR creation for multiple managed clusters

Use the automated SiteConfig method when you are installing multiple managed clusters, for example, in batches of up to 100 clusters. SiteConfig uses ArgoCD as the engine for the GitOps method of site deployment. After completing a site plan that contains all of the required parameters for deployment, a policy generator creates the manifests and applies them to the hub cluster.

Both methods create the CRs shown in the following table. On the cluster site, an automated Discovery image ISO file creates a directory with the site name and a file with the cluster name. Every cluster has its own namespace, and all of the CRs are under that namespace. The namespace and the CR names match the cluster name.

ResourceDescriptionUsage

BareMetalHost

Contains the connection information for the Baseboard Management Controller (BMC) of the target bare-metal host.

Provides access to the BMC in order to load and boot the Discovery image ISO on the target server by using the Redfish protocol.

InfraEnv

Contains information for pulling OKD onto the target bare-metal host.

Used with ClusterDeployment to generate the Discovery ISO for the managed cluster.

AgentClusterInstall

Specifies the managed cluster’s configuration such as networking and the number of supervisor (control plane) nodes. Shows the kubeconfig and credentials when the installation is complete.

Specifies the managed cluster configuration information and provides status during the installation of the cluster.

ClusterDeployment

References the AgentClusterInstall to use.

Used with InfraEnv to generate the Discovery ISO for the managed cluster.

NMStateConfig

Provides network configuration information such as MAC to IP mapping, DNS server, default route, and other network settings. This is not needed if DHCP is used.

Sets up a static IP address for the managed cluster’s Kube API server.

Agent

Contains hardware information about the target bare-metal host.

Created automatically on the hub when the target machine’s Discovery image ISO boots.

ManagedCluster

When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface.

The hub uses this resource to manage and show the status of managed clusters.

KlusterletAddonConfig

Contains the list of services provided by the hub to be deployed to a ManagedCluster.

Tells the hub which addon services to deploy to a ManagedCluster.

Namespace

Logical space for ManagedCluster resources existing on the hub. Unique per site.

Propagates resources to the ManagedCluster.

Secret

Two custom resources are created: BMC Secret and Image Pull Secret.

  • BMC Secret authenticates into the target bare-metal host using its username and password.

  • Image Pull Secret contains authentication information for the OKD image installed on the target bare-metal host.

ClusterImageSet

Contains OKD image information such as the repository and image name.

Passed into resources to provide OKD images.
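
For illustration, a minimal BareMetalHost CR references the manually created BMC Secret and identifies the host to boot. In ZTP these CRs are normally generated from the SiteConfig data rather than written by hand, and every value in this sketch is a placeholder:

  apiVersion: metal3.io/v1alpha1
  kind: BareMetalHost
  metadata:
    name: example-node1            # hypothetical host name
    namespace: example-cluster     # matches the managed cluster namespace
  spec:
    online: true
    bootMACAddress: "AA:BB:CC:DD:EE:11"                                 # placeholder MAC address
    bmc:
      address: redfish-virtualmedia://192.0.2.10/redfish/v1/Systems/1   # placeholder BMC address
      credentialsName: example-bmh-secret                               # the manually created BMC Secret
      disableCertificateVerification: true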

ZTP support for single node clusters, three-node clusters, and standard clusters requires updates to these CRs, including multiple instantiations of some.

ZTP provides support for deploying single node clusters, three-node clusters, and standard OpenShift clusters. This includes the installation of OpenShift and deployment of the distributed units (DUs) at scale.

The overall flow is identical to the ZTP support for single node clusters, with some differences in configuration depending on the type of cluster:

SiteConfig file:

  • For single node clusters, the SiteConfig file must have exactly one entry in the nodes section.

  • For three-node clusters, the SiteConfig file must have exactly three entries defined in the nodes section.

  • For standard clusters, the SiteConfig file must have exactly three entries in the nodes section with role: master and one or more additional entries with role: worker.
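
For example, a standard cluster nodes section follows the general shape sketched below; the host names are placeholders and the per-node BMC, network, and disk details that a real SiteConfig requires are omitted:

  nodes:
    - hostName: "master-0.example.com"
      role: master
    - hostName: "master-1.example.com"
      role: master
    - hostName: "master-2.example.com"
      role: master
    - hostName: "worker-0.example.com"
      role: worker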

PolicyGenTemplate file:

  • The example common PolicyGenTemplate file is common across all types of clusters.

  • There are example group PolicyGenTemplate files for single node, three-node, and standard clusters.

  • Site-specific PolicyGenTemplate files are still specific to each site.

PolicyGenTemplate CRs for RAN deployments

You use PolicyGenTemplate custom resources (CRs) to customize the configuration applied to the cluster using the GitOps zero touch provisioning (ZTP) pipeline. The baseline configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use PolicyGenTemplate CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.

The baseline PolicyGenTemplate CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate container. See “Preparing the ZTP Git repository” for further details.

The PolicyGenTemplate CRs can be found in the ./out/argocd/example/policygentemplates folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenTemplate CR refers to other CRs that can be found in the ./out/source-crs folder.

The PolicyGenTemplate CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenTemplate CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.

Table 1. PolicyGenTemplate CRs for RAN deployments
PolicyGenTemplate CRDescription

common-ranGen.yaml

Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning.

group-du-3node-ranGen.yaml

Contains the RAN policies for three-node clusters only.

group-du-sno-ranGen.yaml

Contains the RAN policies for single-node clusters only.

group-du-standard-ranGen.yaml

Contains the RAN policies for standard clusters with three control-plane nodes.

Additional resources

About the PolicyGenTemplate

The PolicyGenTemplate.yaml file is a custom resource (CR) that tells the PolicyGen policy generator what CRs to include in the configuration, how to categorize the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.

The following example shows a PolicyGenTemplate.yaml file:

  ---
  apiVersion: ran.openshift.io/v1
  kind: PolicyGenTemplate
  metadata:
    name: "group-du-sno"
    namespace: "group-du-sno-policies"
  spec:
    bindingRules:
      group-du-sno: ""
    mcp: "master"
    sourceFiles:
      - fileName: ConsoleOperatorDisable.yaml
        policyName: "console-policy"
      - fileName: ClusterLogForwarder.yaml
        policyName: "log-forwarder-policy"
        spec:
          outputs:
            - type: "kafka"
              name: kafka-open
              # below url is an example
              url: tcp://10.46.55.190:9092/test
          pipelines:
            - name: audit-logs
              inputRefs:
                - audit
              outputRefs:
                - kafka-open
            - name: infrastructure-logs
              inputRefs:
                - infrastructure
              outputRefs:
                - kafka-open
      - fileName: ClusterLogging.yaml
        policyName: "log-policy"
        spec:
          curation:
            curator:
              schedule: "30 3 * * *"
          collection:
            logs:
              type: "fluentd"
              fluentd: {}
      - fileName: MachineConfigSctp.yaml
        policyName: "mc-sctp-policy"
        metadata:
          labels:
            machineconfiguration.openshift.io/role: master
      - fileName: PtpConfigSlave.yaml
        policyName: "ptp-config-policy"
        metadata:
          name: "du-ptp-slave"
        spec:
          profile:
            - name: "slave"
              interface: "ens5f0"
              ptp4lOpts: "-2 -s --summary_interval -4"
              phc2sysOpts: "-a -r -n 24"
      - fileName: SriovOperatorConfig.yaml
        policyName: "sriov-operconfig-policy"
        spec:
          disableDrain: true
      - fileName: MachineConfigAcceleratedStartup.yaml
        policyName: "mc-accelerated-policy"
        metadata:
          name: 04-accelerated-container-startup-master
          labels:
            machineconfiguration.openshift.io/role: master
      - fileName: DisableSnoNetworkDiag.yaml
        policyName: "disable-network-diag"
        metadata:
          labels:
            machineconfiguration.openshift.io/role: master
The group-du-ranGen.yaml file defines a group of policies under a group named group-du. A Red Hat Advanced Cluster Management (RHACM) policy is generated for every source file that exists in sourceFiles. A single placement binding and placement rule are generated to apply the cluster selection rule to the group-du policies.

Using the source file PtpConfigSlave.yaml as an example, the file defines a PtpConfig custom resource (CR). The generated policy for the PtpConfigSlave example is named group-du-ptp-config-policy. The PtpConfig CR defined in the generated group-du-ptp-config-policy is named du-ptp-slave. The spec defined in PtpConfigSlave.yaml is placed under du-ptp-slave along with the other spec items defined in the source file.

The following example shows the group-du-ptp-config-policy:

  apiVersion: policy.open-cluster-management.io/v1
  kind: Policy
  metadata:
    name: group-du-ptp-config-policy
    namespace: groups-sub
    annotations:
      policy.open-cluster-management.io/categories: CM Configuration Management
      policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
      policy.open-cluster-management.io/standards: NIST SP 800-53
  spec:
    remediationAction: enforce
    disabled: false
    policy-templates:
      - objectDefinition:
          apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          metadata:
            name: group-du-ptp-config-policy-config
          spec:
            remediationAction: enforce
            severity: low
            namespaceselector:
              exclude:
                - kube-*
              include:
                - '*'
            object-templates:
              - complianceType: musthave
                objectDefinition:
                  apiVersion: ptp.openshift.io/v1
                  kind: PtpConfig
                  metadata:
                    name: slave
                    namespace: openshift-ptp
                  spec:
                    recommend:
                      - match:
                          - nodeLabel: node-role.kubernetes.io/worker-du
                        priority: 4
                        profile: slave
                    profile:
                      - interface: ens5f0
                        name: slave
                        phc2sysOpts: -a -r -n 24
                        ptp4lConf: |
                          [global]
                          #
                          # Default Data Set
                          #
                          twoStepFlag 1
                          slaveOnly 0
                          priority1 128
                          priority2 128
                          domainNumber 24
                          .....

Best practices when customizing PolicyGenTemplate CRs

Consider the following best practices when customizing site configuration PolicyGenTemplate CRs:

  • Use as few policies as necessary. Using fewer policies means using fewer resources. Each additional policy creates overhead for the hub cluster and the deployed spoke cluster. CRs are combined into policies based on the policyName field in the PolicyGenTemplate CR. CRs in the same PolicyGenTemplate which have the same value for policyName are managed under a single policy, as shown in the sketch after this list.

  • Use a single catalog source for all Operators. In disconnected environments, configure the registry as a single index containing all Operators. Each additional CatalogSource on the spoke clusters increases CPU usage.

  • MachineConfig CRs should be included as extraManifests in the SiteConfig CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications.

  • PolicyGenTemplates should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades do not update the generated subscription.
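
For example, the following illustrative sourceFiles stanza combines two source CRs into a single generated policy because both entries share the same policyName value:

  sourceFiles:
    - fileName: PtpConfigSlave.yaml
      policyName: "config-policy"
    - fileName: SriovOperatorConfig.yaml
      policyName: "config-policy"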

Additional resources

The number of policies created on the hub cluster affects how well the hub cluster scales to manage large numbers of spoke clusters. Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common/group/site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy.

Creating the PolicyGenTemplate CR

Use this procedure to create the PolicyGenTemplate custom resource (CR) for your site in your local clone of the Git repository.

Procedure

  1. Choose an appropriate example from out/argocd/example/policygentemplates. This directory demonstrates a three-level policy framework that represents a well-supported low-latency profile tuned for the needs of 5G Telco DU deployments:

    • A single common-ranGen.yaml file that should apply to all types of sites.

    • A set of shared group-du-*-ranGen.yaml files, each of which should be common across a set of similar clusters.

    • An example example-*-site.yaml that can be copied and updated for each individual site.

  2. Ensure that the labels defined in your PolicyGenTemplate bindingRules section correspond to the labels that are defined in the SiteConfig files of the clusters you are managing.

  3. Ensure that the content of the overlaid spec files matches your desired end state. As a reference, the out/source-crs directory contains the full list of source-crs available to be included and overlaid by your PolicyGenTemplate templates.

    Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single PerformancePolicy.yaml file that can be shared across a set of clusters only if those clusters have identical hardware configurations.

  4. Define all the policy namespaces in a YAML file similar to the example out/argocd/example/policygentemplates/ns.yaml file.

  5. Add all the PolicyGenTemplate files and ns.yaml file to the kustomization.yaml file, similar to the example out/argocd/example/policygentemplates/kustomization.yaml file.

  6. Commit the PolicyGenTemplate CRs, ns.yaml file, and the associated kustomization.yaml file in the Git repository.

Creating ZTP custom resources for multiple managed clusters

If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and SiteConfig files to manage the processes that create the CRs and that generate and apply the policies, in batches of no more than 100 clusters, using the GitOps approach.

Installing and deploying the clusters is a two stage process, as shown here:

GitOps approach for Installing and deploying the clusters

Using PolicyGenTemplate CRs to override source CRs content

PolicyGenTemplate CRs allow you to overlay additional configuration details on top of the base source CRs provided in the ztp-site-generate container. You can think of PolicyGenTemplate CRs as a logical merge or patch to the base CR. Use PolicyGenTemplate CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.

The following example procedure describes how to update fields in the generated PerformanceProfile CR for the reference configuration based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file. Use the procedure as a basis for modifying other parts of the PolicyGenTemplate based on your requirements.

Prerequisites

  • Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.

Procedure

  1. Review the baseline source CR for existing content. You can review the source CRs listed in the reference PolicyGenTemplate CRs by extracting them from the zero touch provisioning (ZTP) container.

    1. Create an /out folder:

      1. $ mkdir -p ./out
    2. Extract the source CRs:

      1. $ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
  2. Review the baseline PerformanceProfile CR in ./out/source-crs/PerformanceProfile.yaml:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: $name
      annotations:
        ran.openshift.io/ztp-deploy-wave: "10"
    spec:
      additionalKernelArgs:
        - "idle=poll"
        - "rcupdate.rcu_normal_after_boot=0"
      cpu:
        isolated: $isolated
        reserved: $reserved
      hugepages:
        defaultHugepagesSize: $defaultHugepagesSize
        pages:
          - size: $size
            count: $count
            node: $node
      machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/$mcp: ""
      net:
        userLevelNetworking: true
      nodeSelector:
        node-role.kubernetes.io/$mcp: ''
      numa:
        topologyPolicy: "restricted"
      realTimeKernel:
        enabled: true

    Any field in the source CR that contains a value with the $ prefix is removed from the generated CR if it is not provided in the PolicyGenTemplate CR.

  3. Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file. The following example PolicyGenTemplate CR stanza supplies appropriate CPU specifications, sets the hugepages configuration, and adds a new field that sets globallyDisableIrqLoadBalancing to false.

    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      metadata:
        name: openshift-node-performance-profile
      spec:
        cpu:
          # These must be tailored for the specific hardware platform
          isolated: "2-19,22-39"
          reserved: "0-1,20-21"
        hugepages:
          defaultHugepagesSize: 1G
          pages:
            - size: 1G
              count: 10
        globallyDisableIrqLoadBalancing: false
  4. Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Example output

The ZTP application generates an ACM policy that contains the generated PerformanceProfile CR. The contents of that CR are derived by merging the metadata and spec contents from the PerformanceProfile entry in the PolicyGenTemplate onto the source CR. The resulting CR has the following content:

  ---
  apiVersion: performance.openshift.io/v2
  kind: PerformanceProfile
  metadata:
    name: openshift-node-performance-profile
  spec:
    additionalKernelArgs:
      - idle=poll
      - rcupdate.rcu_normal_after_boot=0
    cpu:
      isolated: 2-19,22-39
      reserved: 0-1,20-21
    globallyDisableIrqLoadBalancing: false
    hugepages:
      defaultHugepagesSize: 1G
      pages:
        - count: 10
          size: 1G
    machineConfigPoolSelector:
      pools.operator.machineconfiguration.openshift.io/master: ""
    net:
      userLevelNetworking: true
    nodeSelector:
      node-role.kubernetes.io/master: ""
    numa:
      topologyPolicy: restricted
    realTimeKernel:
      enabled: true

In the /source-crs folder that you extract from the ztp-site-generate container, the $ prefix is not used for template substitution in the way the syntax might imply. Rather, if the policyGen tool sees the $ prefix for a string and you do not specify a value for that field in the related PolicyGenTemplate CR, the field is omitted from the output CR entirely.

An exception to this is the $mcp variable in /source-crs YAML files that is substituted with the specified value for mcp from the PolicyGenTemplate CR. For example, in example/policygentemplates/group-du-standard-ranGen.yaml, the value for mcp is worker:

  spec:
    bindingRules:
      group-du-standard: ""
    mcp: worker

The policyGen tool replaces instances of $mcp with worker in the output CRs.

Filtering custom resources using SiteConfig filters

By using filters, you can easily customize SiteConfig custom resources (CRs) to include or exclude other CRs for use in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline.

You can specify an inclusionDefault value of include or exclude for the SiteConfig CR, along with a list of the specific extraManifest RAN CRs that you want to include or exclude. Setting inclusionDefault to include makes the ZTP pipeline apply all the files in /source-crs/extra-manifest during installation. Setting inclusionDefault to exclude does the opposite.

You can exclude individual CRs from the /source-crs/extra-manifest folder that are otherwise included by default. The following example configures a custom single-node OpenShift SiteConfig CR to exclude the /source-crs/extra-manifest/03-sctp-machine-config-worker.yaml CR at installation time.

Some additional optional filtering scenarios are also described.

Prerequisites

  • You configured the hub cluster for generating the required installation and policy CRs.

  • You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.

Procedure

  1. To prevent the ZTP pipeline from applying the 03-sctp-machine-config-worker.yaml CR file, apply the following YAML in the SiteConfig CR:

    apiVersion: ran.openshift.io/v1
    kind: SiteConfig
    metadata:
      name: "site1-sno-du"
      namespace: "site1-sno-du"
    spec:
      baseDomain: "example.com"
      pullSecretRef:
        name: "assisted-deployment-pull-secret"
      clusterImageSetNameRef: "openshift-4.11"
      sshPublicKey: "<ssh_public_key>"
      clusters:
        - clusterName: "site1-sno-du"
          extraManifests:
            filter:
              exclude:
                - 03-sctp-machine-config-worker.yaml

    The ZTP pipeline skips the 03-sctp-machine-config-worker.yaml CR during installation. All other CRs in /source-crs/extra-manifest are applied.

  2. Save the SiteConfig CR and push the changes to the site configuration repository.

    The ZTP pipeline monitors and adjusts what CRs it applies based on the SiteConfig filter instructions.

  3. Optional: To prevent the ZTP pipeline from applying all the /source-crs/extra-manifest CRs during cluster installation, apply the following YAML in the SiteConfig CR:

    - clusterName: "site1-sno-du"
      extraManifests:
        filter:
          inclusionDefault: exclude
  4. Optional: To exclude all the /source-crs/extra-manifest RAN CRs and instead include a custom CR file during installation, edit the custom SiteConfig CR to set the custom manifests folder and the include file, for example:

    clusters:
      - clusterName: "site1-sno-du"
        extraManifestPath: "<custom_manifest_folder>" (1)
        extraManifests:
          filter:
            inclusionDefault: exclude (2)
            include:
              - custom-sctp-machine-config-worker.yaml
    (1) Replace <custom_manifest_folder> with the name of the folder that contains the custom installation CRs, for example, user-custom-manifest/.
    (2) Set inclusionDefault to exclude to prevent the ZTP pipeline from applying the files in /source-crs/extra-manifest during installation.

    The following example illustrates the custom folder structure:

    siteconfig
    ├── site1-sno-du.yaml
    └── user-custom-manifest
        └── custom-sctp-machine-config-worker.yaml

Configuring PTP fast events using PolicyGenTemplate CRs

You can configure PTP fast events for vRAN clusters that are deployed using the GitOps Zero Touch Provisioning (ZTP) pipeline. Use PolicyGenTemplate custom resources (CRs) as the basis to create a hierarchy of configuration files tailored to your specific site requirements.

Prerequisites

  • Create a Git repository where you manage your custom site configuration data.

Procedure

  1. Add the following YAML into .spec.sourceFiles in the common-ranGen.yaml file to configure the AMQP Operator:

    #AMQ interconnect operator for fast events
    - fileName: AmqSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscription.yaml
      policyName: "subscriptions-policy"
  2. Apply the following PolicyGenTemplate changes to group-du-3node-ranGen.yaml, group-du-sno-ranGen.yaml, or group-du-standard-ranGen.yaml files according to your requirements:

    1. In .sourceFiles, add the PtpOperatorConfig CR file that configures the AMQ transport host to the config-policy:

      - fileName: PtpOperatorConfigForEvent.yaml
        policyName: "config-policy"
    2. Configure the linuxptp and phc2sys for the PTP clock type and interface. For example, add the following stanza into .sourceFiles:

      - fileName: PtpConfigSlave.yaml (1)
        policyName: "config-policy"
        metadata:
          name: "du-ptp-slave"
        spec:
          profile:
            - name: "slave"
              interface: "ens5f1" (2)
              ptp4lOpts: "-2 -s --summary_interval -4" (3)
              phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16"
          ptpClockThreshold: (4)
            holdOverTimeout: 30 #secs
            maxOffsetThreshold: 100 #nano secs
            minOffsetThreshold: -100 #nano secs
      (1) Can be one of PtpConfigMaster.yaml, PtpConfigSlave.yaml, or PtpConfigSlaveCvl.yaml depending on your requirements. PtpConfigSlaveCvl.yaml configures linuxptp services for an Intel E810 Columbiaville NIC. For configurations based on group-du-sno-ranGen.yaml or group-du-3node-ranGen.yaml, use PtpConfigSlave.yaml.
      (2) Device specific interface name.
      (3) You must append the --summary_interval -4 value to ptp4lOpts in .spec.sourceFiles.spec.profile to enable PTP fast events.
      (4) ptpClockThreshold configures how long the clock stays in clock holdover state. Holdover state is the period between local and master clock synchronizations. Offset is the time difference between the local and master clock.
  3. Apply the following PolicyGenTemplate changes to your specific site YAML files, for example, example-sno-site.yaml:

    1. In .sourceFiles, add the Interconnect CR file that configures the AMQ router to the config-policy:

      - fileName: AmqInstance.yaml
        policyName: "config-policy"
  4. Merge any other required changes and files with your custom site repository.

  5. Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.

Configuring UEFI secure boot for clusters using PolicyGenTemplate CRs

You can configure UEFI secure boot for vRAN clusters that are deployed using the GitOps zero touch provisioning (ZTP) pipeline.

Prerequisites

  • Create a Git repository where you manage your custom site configuration data.

Procedure

  1. Create the following MachineConfig resource and save it in the uefi-secure-boot.yaml file:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: uefi-secure-boot
    spec:
      config:
        ignition:
          version: 3.1.0
      kernelArguments:
        - efi=runtime
  2. In your Git repository custom /siteconfig directory, create a /sno-extra-manifest folder and add the uefi-secure-boot.yaml file, for example:

    siteconfig
    ├── site1-sno-du.yaml
    ├── site2-standard-du.yaml
    └── sno-extra-manifest
        └── uefi-secure-boot.yaml
  3. In your cluster SiteConfig CR, specify the required values for extraManifestPath and bootMode:

    1. Enter the directory name in the .spec.clusters.extraManifestPath field, for example:

      clusters:
        - clusterName: "example-cluster"
          extraManifestPath: sno-extra-manifest/
    2. Set the value for .spec.clusters.nodes.bootMode to UEFISecureBoot, for example:

      nodes:
        - hostName: "ran.example.lab"
          bootMode: "UEFISecureBoot"
  4. Deploy the cluster using the GitOps ZTP pipeline.

Verification

  1. Open a remote shell to the deployed cluster, for example:

    1. $ oc debug node/node-1.example.com
  2. Verify that the SecureBoot feature is enabled:

    1. sh-4.4# mokutil --sb-state

    Example output

    1. SecureBoot enabled

Configuring bare-metal event monitoring using PolicyGenTemplate CRs

You can configure bare-metal hardware events for vRAN clusters that are deployed using the GitOps Zero Touch Provisioning (ZTP) pipeline.

Prerequisites

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Create a Git repository where you manage your custom site configuration data.

Multiple HardwareEvent resources are not permitted.

Procedure

  1. To configure the AMQ Interconnect Operator and the Bare Metal Event Relay Operator, add the following YAML to spec.sourceFiles in the common-ranGen.yaml file:

    # AMQ interconnect operator for fast events
    - fileName: AmqSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: AmqSubscription.yaml
      policyName: "subscriptions-policy"
    # Bare Metal Event Relay operator
    - fileName: BareMetalEventRelaySubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: BareMetalEventRelaySubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: BareMetalEventRelaySubscription.yaml
      policyName: "subscriptions-policy"
  2. Add the Interconnect CR to .spec.sourceFiles in the site configuration file, for example, the example-sno-site.yaml file:

    - fileName: AmqInstance.yaml
      policyName: "config-policy"
  3. Add the HardwareEvent CR to spec.sourceFiles in your specific group configuration file, for example, in the group-du-sno-ranGen.yaml file:

    - fileName: HardwareEvent.yaml
      policyName: "config-policy"
      spec:
        nodeSelector: {}
        transportHost: "amqp://<amq_interconnect_name>.<amq_interconnect_namespace>.svc.cluster.local" (1)
        logLevel: "info"
    (1) The transportHost URL is composed of the existing AMQ Interconnect CR name and namespace. For example, in transportHost: "amqp://amq-router.amq-router.svc.cluster.local", the AMQ Interconnect name and namespace are both set to amq-router.
  4. Commit the PolicyGenTemplate change in Git, and then push the changes to your site configuration repository to deploy bare-metal events monitoring to new sites using GitOps ZTP.

  5. Create the Redfish Secret by running the following command:

    1. $ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \
    2. --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \
    3. --from-literal=hostaddr="<bmc_host_ip_addr>"

Additional resources

Installing the GitOps ZTP pipeline

The procedures in this section tell you how to complete the following tasks:

  • Prepare the Git repository you need to host site configuration data.

  • Configure the hub cluster for generating the required installation and policy custom resources (CR).

  • Deploy the managed clusters using zero touch provisioning (ZTP).

Preparing the ZTP Git repository

Create a Git repository for hosting site configuration data. The zero touch provisioning (ZTP) pipeline requires read access to this repository.

Procedure

  1. Create a directory structure with separate paths for the SiteConfig and PolicyGenTemplate custom resources (CR).

  2. Export the argocd directory from the ztp-site-generate container image using the following commands:

    1. $ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
    1. $ mkdir -p ./out
    1. $ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
  3. Check that the out directory contains the following subdirectories:

    • out/extra-manifest contains the source CR files that SiteConfig uses to generate extra manifest configMap.

    • out/source-crs contains the source CR files that PolicyGenTemplate uses to generate the Red Hat Advanced Cluster Management (RHACM) policies.

    • out/argocd/deployment contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure.

    • out/argocd/example contains the examples for SiteConfig and PolicyGenTemplate files that represent the recommended configuration.

The directory structure under out/argocd/example serves as a reference for the structure and content of your Git repository. The example includes SiteConfig and PolicyGenTemplate reference CRs for single-node, three-node, and standard clusters. Remove references to cluster types that you are not using. The following example describes a set of CRs for a network of single-node clusters:

  example/
  ├── policygentemplates
  │   ├── common-ranGen.yaml
  │   ├── example-sno-site.yaml
  │   ├── group-du-sno-ranGen.yaml
  │   ├── group-du-sno-validator-ranGen.yaml
  │   ├── kustomization.yaml
  │   └── ns.yaml
  └── siteconfig
      ├── example-sno.yaml
      ├── KlusterletAddonConfigOverride.yaml
      └── kustomization.yaml

Keep SiteConfig and PolicyGenTemplate CRs in separate directories. Both the SiteConfig and PolicyGenTemplate directories must contain a kustomization.yaml file that explicitly includes the files in that directory.
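
The following sketch shows the general shape of the two kustomization.yaml files, assuming that the PolicyGenTemplate and SiteConfig CRs are listed as generators and plain CRs such as ns.yaml are listed as resources; check the files under out/argocd/example for the authoritative layout:

  # policygentemplates/kustomization.yaml
  generators:
    - common-ranGen.yaml
    - group-du-sno-ranGen.yaml
    - example-sno-site.yaml
  resources:
    - ns.yaml

  # siteconfig/kustomization.yaml
  generators:
    - example-sno.yaml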

This directory structure and the kustomization.yaml files must be committed and pushed to your Git repository. The initial push to Git should include the kustomization.yaml files. The SiteConfig (example-sno.yaml) and PolicyGenTemplate (common-ranGen.yaml, group-du-sno*.yaml, and example-sno-site.yaml) files can be omitted and pushed at a later time as required when deploying a site.

The KlusterletAddonConfigOverride.yaml file is only required if one or more SiteConfig CRs which make reference to it are committed and pushed to Git. See example-sno.yaml for an example of how this is used.

Preparing the hub cluster for ZTP

You can configure your hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CR) for each site based on a zero touch provisioning (ZTP) GitOps flow.

Prerequisites

  • OKD 4.8 or 4.9 deployed as the hub cluster

  • Red Hat Advanced Cluster Management (RHACM) Operator 2.3 or 2.4 installed on the hub cluster

  • Red Hat OpenShift GitOps Operator 1.3 on the hub cluster

Procedure

  1. Install the Topology Aware Lifecycle Manager (TALM), which coordinates with any new sites added by ZTP and manages application of the PolicyGenTemplate-generated policies.

  2. Prepare the ArgoCD pipeline configuration:

    1. Create a Git repository with the directory structure similar to the example directory. For more information, see “Preparing the ZTP Git repository”.

    2. Configure access to the repository using the ArgoCD UI. Under Settings configure the following:

      • Repositories - Add the connection information. The URL must end in .git, for example, https://repo.example.com/repo.git, and supply the repository credentials.

      • Certificates - Add the public certificate for the repository, if needed.

    3. Modify the two ArgoCD Applications, out/argocd/deployment/clusters-app.yaml and out/argocd/deployment/policies-app.yaml, based on your Git repository:

      • Update the URL to point to the Git repository. The URL must end with .git, for example, https://repo.example.com/repo.git.

      • The targetRevision must indicate which Git repository branch to monitor.

      • The path should specify the path to the SiteConfig or PolicyGenTemplate CRs, respectively.
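
      These three values map to fields under spec.source in each Argo CD Application CR, as in the following sketch with placeholder values:

        spec:
          source:
            repoURL: https://repo.example.com/repo.git   # your Git repository, ending in .git
            targetRevision: main                         # the branch to monitor
            path: siteconfig                             # or policygentemplates, for the policies application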

  3. To patch the ArgoCD instance in the hub cluster by using the patch file previously extracted into the out/argocd/deployment/ directory, enter the following command:

    1. $ oc patch argocd openshift-gitops \
    2. -n openshift-gitops --type=merge \
    3. --patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
  4. Apply the pipeline configuration to your hub cluster by using the following command:

    1. $ oc apply -k out/argocd/deployment

Deploying additional changes to clusters

Custom resources (CRs) that are deployed through the GitOps zero touch provisioning (ZTP) pipeline support two goals:

  1. Deploying additional Operators to spoke clusters that are required by typical RAN DU applications running at the network far-edge.

  2. Customizing the OKD installation to provide a high performance platform capable of meeting the strict timing requirements in a minimal CPU budget.

If you require cluster configuration changes outside of the base GitOps ZTP pipeline configuration, there are three options:

Apply the additional configuration after the ZTP pipeline is complete

When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.

Add content to the ZTP library

The base source CRs that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.

Create extra manifests for the cluster installation

Extra manifests are applied during installation and make the installation process more efficient.

Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of OKD.

Additional resources

Adding new content to the GitOps ZTP pipeline

The source CRs in the GitOps ZTP site generator container provide a set of critical features and node tuning settings for RAN Distributed Unit (DU) applications. These are applied to the clusters that you deploy with ZTP. To add or modify existing source CRs in the ztp-site-generate container, rebuild the ztp-site-generate container and make it available to the hub cluster, typically from the disconnected registry associated with the hub cluster. Any valid OKD CR can be added.

Perform the following procedure to add new content to the ZTP pipeline.

Procedure

  1. Create a directory containing a Containerfile and the source CR YAML files that you want to include in the updated ztp-site-generate container, for example:

    1. ztp-update/
    2. ├── example-cr1.yaml
    3. ├── example-cr2.yaml
    4. └── ztp-update.in
  2. Add the following content to the ztp-update.in Containerfile:

    1. FROM registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
    2. ADD example-cr2.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
    3. ADD example-cr1.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
  3. Open a terminal at the ztp-update/ folder and rebuild the container:

    1. $ podman build -t ztp-site-generate-rhel8-custom:v4.10-custom-1 -f ztp-update.in .
  4. Push the built container image to your disconnected registry, for example:

    1. $ podman push localhost/ztp-site-generate-rhel8-custom:v4.10-custom-1 registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1
  5. Patch the Argo CD instance on the hub cluster to point to the newly built container image:

    1. $ oc patch -n openshift-gitops argocd openshift-gitops --type=json -p '[{"op": "replace", "path":"/spec/repo/initContainers/0/image", "value": "registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1"} ]'

    When the Argo CD instance is patched, the openshift-gitops-repo-server pod automatically restarts.

Verification

  1. Verify that the new openshift-gitops-repo-server pod has completed initialization and that the previous repo pod is terminated:

    1. $ oc get pods -n openshift-gitops | grep openshift-gitops-repo-server

    Example output

    1. openshift-gitops-repo-server-7df86f9774-db682 1/1 Running 1 28s

    You must wait until the new openshift-gitops-repo-server pod has completed initialization and the previous pod is terminated before the newly added container image content is available.
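    You can also confirm which image the repo server is configured to use. The following check is a minimal sketch that assumes the default openshift-gitops-repo-server Deployment created by the GitOps Operator; the returned image should match the custom image that you pushed to the disconnected registry:

    1. $ oc get deployment openshift-gitops-repo-server -n openshift-gitops -o jsonpath='{.spec.template.spec.initContainers[0].image}'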

Additional resources

  • Alternatively, you can patch the Argo CD instance as described in Preparing the hub cluster for ZTP by modifying argocd-openshift-gitops-patch.json with an updated initContainer image before applying the patch file.

Customizing extra installation manifests in the ZTP GitOps pipeline

You can define a set of extra manifests for inclusion in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline. These manifests are linked to the SiteConfig custom resources (CRs) and are applied to the cluster during installation. Including MachineConfig CRs at install time makes the installation process more efficient.

Prerequisites

  • Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.

Procedure

  1. Create a set of extra manifest CRs that the ZTP pipeline uses to customize the cluster installs.

  2. In your custom /siteconfig directory, create an /extra-manifest folder for your extra manifests. The following example illustrates a sample /siteconfig with /extra-manifest folder:

    1. siteconfig
    2. ├── site1-sno-du.yaml
    3. ├── site2-standard-du.yaml
    4. └── extra-manifest
    5. └── 01-example-machine-config.yaml
  3. Add your custom extra manifest CRs to the siteconfig/extra-manifest directory.

  4. In your SiteConfig CR, enter the directory name in the extraManifestPath field, for example:

    1. clusters:
    2. - clusterName: "example-sno"
    3. networkType: "OVNKubernetes"
    4. extraManifestPath: extra-manifest
  5. Save the SiteConfig CRs and /extra-manifest CRs and push them to the site configuration repo.

The ZTP pipeline appends the CRs in the /extra-manifest directory to the default set of extra manifests during cluster provisioning.
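For illustration only, a hypothetical 01-example-machine-config.yaml in the /extra-manifest folder might look like the following sketch of a MachineConfig CR that writes a small marker file on master nodes. The file path and contents are placeholders and are not part of the reference DU configuration:

  1. apiVersion: machineconfiguration.openshift.io/v1
  2. kind: MachineConfig
  3. metadata:
  4.   labels:
  5.     machineconfiguration.openshift.io/role: master
  6.   name: 01-example-machine-config
  7. spec:
  8.   config:
  9.     ignition:
  10.       version: 3.2.0
  11.     storage:
  12.       files:
  13.       - path: /etc/example-du-marker
  14.         mode: 420
  15.         overwrite: true
  16.         contents:
  17.           source: data:,example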

Deploying a site

Use the following procedure to prepare the hub cluster for site deployment and initiate zero touch provisioning (ZTP) by pushing custom resources (CRs) to your Git repository.

Procedure

  1. Create the required secrets for the site. These resources must be in a namespace with a name matching the cluster name. In out/argocd/example/siteconfig/example-sno.yaml, the cluster name and namespace are example-sno.

    Create the namespace for the cluster using the following commands:

    1. $ export CLUSTERNS=example-sno
    1. $ oc create namespace $CLUSTERNS
  2. Create a pull secret for the cluster. The pull secret must contain all the credentials necessary for installing OKD and all required Operators. In all of the example SiteConfig CRs, the pull secret is named assisted-deployment-pull-secret, as shown below:

    1. $ oc apply -f - <<EOF
    2. apiVersion: v1
    3. kind: Secret
    4. metadata:
    5. name: assisted-deployment-pull-secret
    6. namespace: $CLUSTERNS
    7. type: kubernetes.io/dockerconfigjson
    8. data:
    9. .dockerconfigjson: $(base64 -w 0 <pull-secret.json)
    10. EOF
  3. Create a BMC authentication secret for each host you are deploying:

    1. $ oc apply -f - <<EOF
    2. apiVersion: v1
    3. kind: Secret
    4. metadata:
    5. name: $(read -p 'Hostname: ' tmp; printf $tmp)-bmc-secret
    6. namespace: $CLUSTERNS
    7. type: Opaque
    8. data:
    9. username: $(read -p 'Username: ' tmp; printf $tmp | base64)
    10. password: $(read -s -p 'Password: ' tmp; printf $tmp | base64)
    11. EOF

    The secrets are referenced from the SiteConfig custom resource (CR) by name. The namespace must match the SiteConfig namespace.

  4. Create a SiteConfig CR for your cluster in your local clone of the Git repository:

    1. Choose the appropriate example for your CR from the out/argocd/example/siteconfig/ folder. The folder includes example files for single node, three-node, and standard clusters:

      • example-sno.yaml

      • example-3node.yaml

      • example-standard.yaml

    2. Change the cluster and host details in the example file to match the type of cluster you want. The following file is a composite of the three files that explains the configuration of each cluster type:

      1. # example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno
      2. ---
      3. apiVersion: ran.openshift.io/v1
      4. kind: SiteConfig
      5. metadata:
      6. name: "example-sno"
      7. namespace: "example-sno"
      8. spec:
      9. baseDomain: "example.com"
      10. pullSecretRef:
      11. name: "assisted-deployment-pull-secret"
      12. clusterImageSetNameRef: "openshift-4.10" (1)
      13. sshPublicKey: "ssh-rsa AAAA..."
      14. clusters:
      15. - clusterName: "example-sno"
      16. networkType: "OVNKubernetes"
      17. clusterLabels: (2)
      18. # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates:
      19. # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true'
      20. common: true
      21. # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""'
      22. group-du-sno: ""
      23. # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"'
      24. # Normally this should match or contain the cluster name so it only applies to a single cluster
      25. sites: "example-sno"
      26. clusterNetwork:
      27. - cidr: 1001:1::/48
      28. hostPrefix: 64
      29. machineNetwork: (3)
      30. - cidr: 1111:2222:3333:4444::/64
      31. # For 3-node and standard clusters with static IPs, the API and Ingress IPs must be configured here
      32. apiVIP: 1111:2222:3333:4444::1:1 (4)
      33. ingressVIP: 1111:2222:3333:4444::1:2 (5)
      34. serviceNetwork:
      35. - 1001:2::/112
      36. additionalNTPSources:
      37. - 1111:2222:3333:4444::2
      38. nodes:
      39. - hostName: "example-node1.example.com" (6)
      40. role: "master"
      41. bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" (7)
      42. bmcCredentialsName:
      43. name: "example-node1-bmh-secret" (8)
      44. bootMACAddress: "AA:BB:CC:DD:EE:11"
      45. bootMode: "UEFI"
      46. rootDeviceHints:
      47. hctl: '0:1:0'
      48. cpuset: "0-1,52-53"
      49. nodeNetwork: (9)
      50. interfaces:
      51. - name: eno1
      52. macAddress: "AA:BB:CC:DD:EE:11"
      53. config:
      54. interfaces:
      55. - name: eno1
      56. type: ethernet
      57. state: up
      58. macAddress: "AA:BB:CC:DD:EE:11"
      59. ipv4:
      60. enabled: false
      61. ipv6:
      62. enabled: true
      63. address:
      64. - ip: 1111:2222:3333:4444::1:1
      65. prefix-length: 64
      66. dns-resolver:
      67. config:
      68. search:
      69. - example.com
      70. server:
      71. - 1111:2222:3333:4444::2
      72. routes:
      73. config:
      74. - destination: ::/0
      75. next-hop-interface: eno1
      76. next-hop-address: 1111:2222:3333:4444::1
      77. table-id: 254
      1Applies to all cluster types. The value must match an image set available on the hub cluster. To see the list of supported versions on your hub, run oc get clusterimagesets.
      2Applies to all cluster types. These values must correspond to the PolicyGenTemplate labels that you define in a later step.
      3Applies to single node clusters. The value defines the machine network CIDR for a single node deployment.
      4Applies to three-node and standard clusters. The value defines the API VIP address when static IP addresses are used.
      5Applies to three-node and standard clusters. The value defines the Ingress VIP address when static IP addresses are used.
      6Applies to all cluster types. For single node deployments, define one host. For three-node deployments, define three hosts. For standard deployments, define three hosts with role: master and two or more hosts defined with role: worker.
      7Applies to all cluster types. Specifies the BMC address.
      8Applies to all cluster types. Specifies the BMC credentials.
      9Applies to all cluster types. Specifies the network settings for the node.
    3. You can inspect the default set of extra-manifest MachineConfig CRs in out/argocd/extra-manifest. These CRs are automatically applied to the cluster when it is installed.

      Optional: To provision additional install-time manifests on the provisioned cluster, create a directory in your Git repository, for example, sno-extra-manifest/, and add your custom manifest CRs to this directory. If your SiteConfig.yaml refers to this directory in the extraManifestPath field, any CRs in this referenced directory are appended to the default set of extra manifests.

  5. Add the SiteConfig CR to the kustomization.yaml file in the generators section, similar to the example shown in out/argocd/example/siteconfig/kustomization.yaml.

  6. Commit your SiteConfig CR and associated kustomization.yaml in your Git repository.

  7. Push your changes to the Git repository. The ArgoCD pipeline detects the changes and begins the site deployment. You can push the changes to the SiteConfig CR and the PolicyGenTemplate CR simultaneously.

    The SiteConfig CR creates the following CRs on the hub cluster:

    • Namespace - Unique per site

    • AgentClusterInstall

    • BareMetalHost - One per node

    • ClusterDeployment

    • InfraEnv

    • NMStateConfig - One per node

    • ExtraManifestsConfigMap - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, SCTP enablement, and more.

    • ManagedCluster

    • KlusterletAddonConfig
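    After the ArgoCD application synchronizes, you can spot-check that these CRs exist on the hub cluster. The following command assumes the example-sno cluster name and namespace from the sample SiteConfig CR:

    1. $ oc get AgentClusterInstall,ClusterDeployment,InfraEnv,BareMetalHost,NMStateConfig -n example-sno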

GitOps ZTP and Topology Aware Lifecycle Manager

GitOps zero touch provisioning (ZTP) generates installation and configuration CRs from manifests stored in Git. These artifacts are applied to a centralized hub cluster where Red Hat Advanced Cluster Management (RHACM), assisted installer service, and the Topology Aware Lifecycle Manager (TALM) use the CRs to install and configure the spoke cluster. The configuration phase of the ZTP pipeline uses the TALM to orchestrate the application of the configuration CRs to the cluster. There are several key integration points between GitOps ZTP and the TALM.

Inform policies

By default, GitOps ZTP creates all policies with a remediation action of inform. These policies cause RHACM to report on the compliance status of clusters relevant to the policies but do not apply the desired configuration. During the ZTP installation, the TALM steps through the created inform policies, creates a copy for the target spoke cluster(s), and changes the remediation action of the copy to enforce. This pushes the configuration to the spoke cluster. Outside of the ZTP phase of the cluster lifecycle, this setup allows changes to be made to policies without the risk of immediately rolling those changes out to all affected spoke clusters in the network. You can control the timing and the set of clusters that are remediated by using TALM.

Automatic creation of ClusterGroupUpgrade CRs

The TALM monitors the state of all ManagedCluster CRs on the hub cluster. Any ManagedCluster CR which does not have a ztp-done label applied, including newly created ManagedCluster CRs, causes the TALM to automatically create a ClusterGroupUpgrade CR with the following characteristics:

  • The ClusterGroupUpgrade CR is created and enabled in the ztp-install namespace.

  • The ClusterGroupUpgrade CR has the same name as the ManagedCluster CR.

  • The cluster selector includes only the cluster associated with that ManagedCluster CR.

  • The set of managed policies includes all policies that RHACM has bound to the cluster at the time the ClusterGroupUpgrade is created.

  • Pre-caching is disabled.

  • Timeout set to 4 hours (240 minutes).

    The automatic creation of an enabled ClusterGroupUpgrade ensures that initial zero-touch deployment of clusters proceeds without the need for user intervention. Additionally, the automatic creation of a ClusterGroupUpgrade CR for any ManagedCluster without the ztp-done label allows a failed ZTP installation to be restarted by simply deleting the ClusterGroupUpgrade CR for the cluster.
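    For reference, an automatically created ClusterGroupUpgrade CR has a shape similar to the following sketch. The cluster name spoke1 and the policy names are hypothetical placeholders:

    1. apiVersion: ran.openshift.io/v1alpha1
    2. kind: ClusterGroupUpgrade
    3. metadata:
    4.   name: spoke1
    5.   namespace: ztp-install
    6. spec:
    7.   clusters:
    8.   - spoke1
    9.   enable: true
    10.   managedPolicies:
    11.   - common-config-policy
    12.   - common-subscriptions-policy
    13.   - group-du-sno-config-policy
    14.   preCaching: false
    15.   remediationStrategy:
    16.     maxConcurrency: 1
    17.     timeout: 240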

Waves

Each policy generated from a PolicyGenTemplate CR includes a ztp-deploy-wave annotation. This annotation is based on the same annotation from each CR which is included in that policy. The wave annotation is used to order the policies in the auto-generated ClusterGroupUpgrade CR.

All CRs in the same policy must have the same setting for the ztp-deploy-wave annotation. The default value of this annotation for each CR can be overridden in the PolicyGenTemplate. The wave annotation in the source CR is used for determining and setting the policy wave annotation. This annotation is removed from each built CR which is included in the generated policy at runtime.

The TALM applies the configuration policies in the order specified by the wave annotations. The TALM waits for each policy to be compliant before moving to the next policy. It is important to ensure that the wave annotation for each CR takes into account any prerequisites for those CRs to be applied to the cluster. For example, an Operator must be installed before or concurrently with the configuration for the Operator. Similarly, the CatalogSource for an Operator must be installed in a wave before or concurrently with the Operator Subscription. The default wave value for each CR takes these prerequisites into account.

Multiple CRs and policies can share the same wave number. Having fewer policies can result in faster deployments and lower CPU usage. It is a best practice to group many CRs into relatively few waves.

To check the default wave value in each source CR, run the following command against the out/source-crs directory that is extracted from the ztp-site-generate container image:

  1. $ grep -r "ztp-deploy-wave" out/source-crs

Phase labels

The ClusterGroupUpgrade CR is automatically created and includes directives to label the ManagedCluster CR at the start and end of the ZTP process.

When the post-installation ZTP configuration commences, the ManagedCluster CR has the ztp-running label applied. When all policies are remediated to the cluster and are fully compliant, these directives cause the TALM to remove the ztp-running label and apply the ztp-done label.

For deployments which make use of the informDuValidator policy, the ztp-done label is applied when the cluster is fully ready for deployment of applications. This includes all reconciliation and resulting effects of the ZTP applied configuration CRs.
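You can check which phase a cluster is in by inspecting these labels on the ManagedCluster CR. The following query is a minimal sketch that assumes jq is installed, as in the other examples in this document:

  1. $ oc get managedcluster <cluster_name> -o jsonpath='{.metadata.labels}' | jq 'with_entries(select(.key | startswith("ztp")))'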

Linked CRs

The automatically created ClusterGroupUpgrade CR has the owner reference set as the ManagedCluster from which it was derived. This reference ensures that deleting the ManagedCluster CR causes the instance of the ClusterGroupUpgrade to be deleted along with any supporting resources.

Monitoring deployment progress

The ArgoCD pipeline uses the SiteConfig and PolicyGenTemplate CRs in Git to generate the cluster configuration CRs and RHACM policies and then sync them to the hub. You can monitor the progress of this synchronization in the ArgoCD dashboard.

Procedure

When the synchronization is complete, the installation generally proceeds as follows:

  1. The Assisted Service Operator installs OKD on the cluster. You can monitor the progress of cluster installation from the RHACM dashboard or from the command line:

    1. $ export CLUSTER=<clusterName>
    1. $ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
    1. $ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
  2. The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.

    After the cluster installation is complete and the cluster becomes Ready, a ClusterGroupUpgrade CR corresponding to this cluster, with a list of ordered policies defined by the ran.openshift.io/ztp-deploy-wave annotations, is automatically created by the TALM. The cluster’s policies are applied in the order listed in the ClusterGroupUpgrade CR. You can monitor the high-level progress of configuration policy reconciliation using the following commands:

    1. $ export CLUSTER=<clusterName>
    1. $ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
  3. You can monitor the detailed policy compliant status using the RHACM dashboard or the command line:

    1. $ oc get policies -n $CLUSTER

The final policy that becomes compliant is the one defined in the *-du-validator-policy policies. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration is complete.

After all policies become compliant, the ztp-done label is added to the cluster, indicating that the entire ZTP pipeline is complete for the cluster.

Indication of done for ZTP installations

Zero touch provisioning (ZTP) simplifies the process of checking the ZTP installation status for a cluster. The ZTP status moves through three phases: cluster installation, cluster configuration, and ZTP done.

Cluster installation phase

The cluster installation phase is shown by the ManagedCluster CR ManagedClusterJoined condition. If the ManagedCluster CR does not have this condition, or the condition is set to False, the cluster is still in the installation phase. Additional details about installation are available from the AgentClusterInstall and ClusterDeployment CRs. For more information, see “Troubleshooting GitOps ZTP”.

Cluster configuration phase

The cluster configuration phase is shown by a ztp-running label applied to the ManagedCluster CR for the cluster.

ZTP done

Cluster installation and configuration is complete in the ZTP done phase. This is shown by the removal of the ztp-running label and addition of the ztp-done label to the ManagedCluster CR. The ztp-done label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.

The transition to the ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) static validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when ZTP provisioning of the spoke cluster is complete.

The validator inform policy ensures the configuration of the distributed unit (DU) cluster is fully applied and Operators have completed their initialization. The policy validates the following:

  • The target MachineConfigPool contains the expected entries and has finished updating. All nodes are available and not degraded.

  • The SR-IOV Operator has completed initialization as indicated by at least one SriovNetworkNodeState with syncStatus: Succeeded.

  • The PTP Operator daemon set exists.


    The validator inform policy is included in the reference group PolicyGenTemplate CRs. For reliable indication of the ZTP done state, this validator inform policy must be included in the ZTP pipeline.
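If you want to manually spot-check the SR-IOV criterion on a spoke cluster, you can query the node states directly by using the spoke cluster kubeconfig. The following sketch assumes the default openshift-sriov-network-operator namespace:

  1. $ oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.syncStatus}{"\n"}{end}'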

Creating a validator inform policy

Use the following procedure to create a validator inform policy that provides an indication of when the zero touch provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy can be used for deployments of single node clusters, three-node clusters, and standard clusters.

Procedure

  1. Create a stand-alone PolicyGenTemplate custom resource (CR) that contains the source file validatorCRs/informDuValidator.yaml. You only need one stand-alone PolicyGenTemplate CR for each cluster type.

    Single node clusters

    1. group-du-sno-validator-ranGen.yaml
    2. apiVersion: ran.openshift.io/v1
    3. kind: PolicyGenTemplate
    4. metadata:
    5. name: "group-du-sno-validator" (1)
    6. namespace: "ztp-group" (2)
    7. spec:
    8. bindingRules:
    9. group-du-sno: "" (3)
    10. bindingExcludedRules:
    11. ztp-done: "" (4)
    12. mcp: "master" (5)
    13. sourceFiles:
    14. - fileName: validatorCRs/informDuValidator.yaml
    15. remediationAction: inform (6)
    16. policyName: "du-policy" (7)

    Three-node clusters

    1. group-du-3node-validator-ranGen.yaml
    2. apiVersion: ran.openshift.io/v1
    3. kind: PolicyGenTemplate
    4. metadata:
    5. name: "group-du-3node-validator" (1)
    6. namespace: "ztp-group" (2)
    7. spec:
    8. bindingRules:
    9. group-du-3node: "" (3)
    10. bindingExcludedRules:
    11. ztp-done: "" (4)
    12. mcp: "master" (5)
    13. sourceFiles:
    14. - fileName: validatorCRs/informDuValidator.yaml
    15. remediationAction: inform (6)
    16. policyName: "du-policy" (7)

    Standard clusters

    1. group-du-standard-validator-ranGen.yaml
    2. apiVersion: ran.openshift.io/v1
    3. kind: PolicyGenTemplate
    4. metadata:
    5. name: "group-du-standard-validator" (1)
    6. namespace: "ztp-group" (2)
    7. spec:
    8. bindingRules:
    9. group-du-standard: "" (3)
    10. bindingExcludedRules:
    11. ztp-done: "" (4)
    12. mcp: "worker" (5)
    13. sourceFiles:
    14. - fileName: validatorCRs/informDuValidator.yaml
    15. remediationAction: inform (6)
    16. policyName: "du-policy" (7)
    1The name of the PolicyGenTemplate object. This name is also used as part of the names for the placementBinding, placementRule, and policy that are created in the requested namespace.
    2This value should match the namespace used in the group PolicyGenTemplates.
    3The group-du-* label defined in bindingRules must exist in the SiteConfig files.
    4The label defined in bindingExcludedRules must be ztp-done. The ztp-done label is used in coordination with the Topology Aware Lifecycle Manager.
    5mcp defines the MachineConfigPool object that is used in the source file validatorCRs/informDuValidator.yaml. It should be master for single node and three-node cluster deployments and worker for standard cluster deployments.
    6Optional. The default value is inform.
    7This value is used as part of the name for the generated RHACM policy. The generated validator policy for the single node example is named group-du-sno-validator-du-policy.
  2. Push the files to the ZTP Git repository.

Querying the policy compliance status for each cluster

After you have created the validator inform policies for your clusters and pushed them to the zero touch provisioning (ZTP) Git repository, you can check the status of each cluster for policy compliance.

Procedure

  1. To query the status of the spoke clusters, use either the Red Hat Advanced Cluster Management (RHACM) web console or the CLI:

    • To query status from the RHACM web console, perform the following actions:

      1. Click Governance → Find policies.

      2. Search for du-validator-policy.

      3. Click into the policy.

    • To query status using the CLI, run the following command:

      1. $ oc get policies du-validator-policy -n <namespace_for_common> -o jsonpath={'.status.status'} | jq

      When all of the policies including the validator inform policy applied to the cluster become compliant, ZTP installation and configuration for this cluster is complete.

  2. To query the cluster violation/compliant status from the ACM web console, click Governance → Cluster violations.

  3. Check the validator policy compliant status for a cluster using the following commands:

    1. Export the cluster name:

      1. $ export CLUSTER=<cluster_name>
    2. Get the policy:

      1. $ oc get policies -n $CLUSTER | grep <validator_policy_name>

    Alternatively, you can use the following command:

    1. $ oc get policies -n <namespace-for-group> <validatorPolicyName> -o jsonpath="{.status.status[?(@.clustername=='$CLUSTER')]}" | jq

    After the *-validator-du-policy RHACM policy becomes compliant for the cluster, the validator policy is unbound for this cluster and the ztp-done label is added to the cluster. This acts as a persistent indicator that the whole ZTP pipeline has completed for the cluster.

Node Tuning Operator

The Node Tuning Operator provides the ability to enable advanced node performance tunings on a set of nodes.

OKD provides a Node Tuning Operator to implement automatic tuning to achieve low latency performance for OKD applications. The cluster administrator uses a performance profile configuration to make these changes in a more reliable way.

The administrator can specify updating the kernel to the realtime kernel (rt-kernel), reserving CPUs for management workloads, and dedicating CPUs for running the workloads.

In earlier versions of OKD, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OKD 4.11, these functions are part of the Node Tuning Operator.
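As an illustration of this kind of tuning, a minimal PerformanceProfile sketch is shown below. The profile name and CPU ranges are placeholders that must be adapted to your hardware; for production DU deployments, use the reference configuration provided with the ZTP source CRs:

  1. apiVersion: performance.openshift.io/v2
  2. kind: PerformanceProfile
  3. metadata:
  4.   name: example-performanceprofile
  5. spec:
  6.   cpu:
  7.     isolated: "2-51,54-103"   # CPUs dedicated to application workloads
  8.     reserved: "0-1,52-53"     # CPUs reserved for management and housekeeping
  9.   realTimeKernel:
  10.     enabled: true             # switch the nodes to the realtime kernel
  11.   nodeSelector:
  12.     node-role.kubernetes.io/master: ""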

Troubleshooting GitOps ZTP

The ArgoCD pipeline uses the SiteConfig and PolicyGenTemplate custom resources (CRs) from Git to generate the cluster configuration CRs and Red Hat Advanced Cluster Management (RHACM) policies. Use the following steps to troubleshoot issues that might occur during this process.


Validating the generation of installation CRs

The GitOps zero touch provisioning (ZTP) infrastructure generates a set of installation CRs on the hub cluster in response to a SiteConfig CR pushed to your Git repository. You can check that the installation CRs were created by using the following command:

  1. $ oc get AgentClusterInstall -n <cluster_name>

If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from SiteConfig files to the installation CRs.

Procedure

  1. Verify that the ManagedCluster CR was generated from the SiteConfig CR on the hub cluster:

    1. $ oc get managedcluster
  2. If the ManagedCluster CR is missing, check whether the clusters application failed to synchronize the files from the Git repository to the hub cluster:

    1. $ oc describe -n openshift-gitops application clusters
  3. Check for Status: Conditions: to view the error logs. For example, setting an invalid value for extraManifestPath: in the SiteConfig file raises an error as shown below:

    1. Status:
    2. Conditions:
    3. Last Transition Time: 2021-11-26T17:21:39Z
    4. Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory
    5. 2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory
    6. Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1
    7. Type: ComparisonError
  4. Check for Status: Sync:. If there are log errors, Status: Sync: could indicate an Unknown error:

    1. Status:
    2. Sync:
    3. Compared To:
    4. Destination:
    5. Namespace: clusters-sub
    6. Server: https://kubernetes.default.svc
    7. Source:
    8. Path: sites-config
    9. Repo URL: https://git.com/ran-sites/siteconfigs/.git
    10. Target Revision: master
    11. Status: Unknown
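At any point in this flow, you can get a quick summary of the synchronization state of both Argo CD applications. The following command assumes the default application names clusters and policies:

  1. $ oc get applications.argoproj.io -n openshift-gitops clusters policies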

Validating the generation of configuration policy CRs

Policy custom resources (CRs) are generated in the same namespace as the PolicyGenTemplate from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenTemplate regardless of whether they are ztp-common, ztp-group, or ztp-site based, as shown using the following commands:

  1. $ export NS=<namespace>
  1. $ oc get policy -n $NS

The expected set of policy-wrapped CRs should be displayed.

If the policies failed synchronization, use the following troubleshooting steps.

Procedure

  1. To display detailed information about the policies, run the following command:

    1. $ oc describe -n openshift-gitops application policies
  2. Check for Status: Conditions: to show the error logs. For example, setting an invalid sourceFile→fileName: generates the error shown below:

    1. Status:
    2. Conditions:
    3. Last Transition Time: 2021-11-26T17:21:39Z
    4. Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory
    5. Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1
    6. Type: ComparisonError
  3. Check for Status: Sync:. If there are log errors at Status: Conditions:, the Status: Sync: shows Unknown or Error:

    1. Status:
    2. Sync:
    3. Compared To:
    4. Destination:
    5. Namespace: policies-sub
    6. Server: https://kubernetes.default.svc
    7. Source:
    8. Path: policies
    9. Repo URL: https://git.com/ran-sites/policies/.git
    10. Target Revision: master
    11. Status: Error
  4. When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a ManagedCluster object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:

    1. $ oc get policy -n $CLUSTER

    Example output:

    1. NAME REMEDIATION ACTION COMPLIANCE STATE AGE
    2. ztp-common.common-config-policy inform Compliant 13d
    3. ztp-common.common-subscriptions-policy inform Compliant 13d
    4. ztp-group.group-du-sno-config-policy inform Compliant 13d
    5. ztp-group.group-du-sno-validator-du-policy inform Compliant 13d
    6. ztp-site.example-sno-config-policy inform Compliant 13d

    RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format: <policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName>.

  5. Check the placement rule for any policies not copied to the cluster namespace. The matchSelector in the PlacementRule for those policies should match labels on the ManagedCluster object:

    1. $ oc get placementrule -n $NS
  6. Note the PlacementRule name appropriate for the missing policy, common, group, or site, using the following command:

    1. $ oc get placementrule -n $NS <placementRuleName> -o yaml
    • The status.decisions field should include your cluster name.

    • The key-value pair of the matchSelector in the spec must match the labels on your managed cluster.

  7. Check the labels on the ManagedCluster object using the following command:

    1. $ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
  8. Check to see which policies are compliant using the following command:

    1. $ oc get policy -n $CLUSTER

    If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the spoke cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.
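    To confirm whether the Operators were installed, you can check the installed ClusterServiceVersions directly on the spoke cluster. This check assumes that you have a kubeconfig for the spoke cluster:

    1. $ oc --kubeconfig <spoke_kubeconfig> get csv -A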

Restarting policies reconciliation

Use the following procedure to restart policies reconciliation in the event of unexpected compliance issues. This procedure is required when the ClusterGroupUpgrade CR has timed out.

Procedure

  1. A ClusterGroupUpgrade CR is generated in the namespace ztp-install by the Topology Aware Lifecycle Manager after the managed spoke cluster becomes Ready:

    1. $ export CLUSTER=<clusterName>
    1. $ oc get clustergroupupgrades -n ztp-install $CLUSTER
  2. If there are unexpected issues and the policies fail to become compliant within the configured timeout (the default is 4 hours), the status of the ClusterGroupUpgrade CR shows UpgradeTimedOut:

    1. $ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
  3. A ClusterGroupUpgrade CR in the UpgradeTimedOut state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existing ClusterGroupUpgrade CR. This triggers the automatic creation of a new ClusterGroupUpgrade CR that begins reconciling the policies immediately:

    1. $ oc delete clustergroupupgrades -n ztp-install $CLUSTER

Note that when the ClusterGroupUpgrade CR completes with status UpgradeCompleted and the managed spoke cluster has the label ztp-done applied, you can make additional configuration changes using PolicyGenTemplate. Deleting the existing ClusterGroupUpgrade CR will not make the TALM generate a new CR.

At this point, ZTP has completed its interaction with the cluster and any further interactions should be treated as an upgrade.

Additional resources

Site cleanup

Remove a site and the associated installation and configuration policy CRs by removing the SiteConfig and PolicyGenTemplate file names from the kustomization.yaml file. When you run the ZTP pipeline again, the generated CRs are removed. If you want to permanently remove a site, you should also remove the SiteConfig and site-specific PolicyGenTemplate files from the Git repository. If you want to remove a site temporarily, for example when redeploying a site, you can leave the SiteConfig and site-specific PolicyGenTemplate CRs in the Git repository.

After removing the SiteConfig file, if the corresponding clusters remain in the detach process, check Red Hat Advanced Cluster Management (RHACM) for information about cleaning up the detached managed cluster.

Additional resources

Removing obsolete content

If a change to the PolicyGenTemplate file configuration results in obsolete policies, for example, policies are renamed, use the following procedure to remove those policies in an automated way.

Procedure

  1. Remove the affected PolicyGenTemplate files from the Git repository, commit and push to the remote repository.

  2. Wait for the changes to synchronize through the application and the affected policies to be removed from the hub cluster.

  3. Add the updated PolicyGenTemplate files back to the Git repository, and then commit and push to the remote repository.

Note that removing the zero touch provisioning (ZTP) distributed unit (DU) profile policies from the Git repository, and as a result also removing them from the hub cluster, does not affect any configuration of the managed spoke clusters. Removing a policy from the hub cluster does not delete it from the spoke cluster, nor the CRs managed by that policy.

As an alternative, after making changes to PolicyGenTemplate files that result in obsolete policies, you can remove these policies from the hub cluster manually. You can delete policies from the RHACM console using the Governance tab or by using the following command:

  1. $ oc delete policy -n <namespace> <policyName>

Tearing down the pipeline

If you need to remove the ArgoCD pipeline and all generated artifacts, follow this procedure:

Procedure

  1. Detach all clusters from RHACM.

  2. Delete the kustomization.yaml file in the deployment directory using the following command:

    1. $ oc delete -k out/argocd/deployment

Upgrading GitOps ZTP

You can upgrade the GitOps zero touch provisioning (ZTP) infrastructure independently from the underlying cluster, Red Hat Advanced Cluster Management (RHACM), and OKD version running on the spoke clusters. This procedure guides you through the upgrade process to avoid impact on the spoke clusters. However, any changes to the content or settings of policies, including adding recommended content, result in changes that must be rolled out and reconciled to the spoke clusters.

Prerequisites

  • This procedure assumes that you have a fully operational hub cluster running the earlier version of the GitOps ZTP infrastructure.

Procedure

At a high level, the strategy for upgrading the GitOps ZTP infrastructure is:

  1. Label all existing clusters with the ztp-done label.

  2. Stop the ArgoCD applications.

  3. Install the new tooling.

  4. Update required content and optional changes in the Git repository.

  5. Update and restart the application configuration.

Preparing for the upgrade

Use the following procedure to prepare your site for the GitOps zero touch provisioning (ZTP) upgrade.

Procedure

  1. Obtain the latest version of the GitOps ZTP container from which you can extract a set of custom resources (CRs) used to configure the GitOps operator on the hub cluster for use in the GitOps ZTP solution.

  2. Extract the argocd/deployment directory using the following commands:

    1. $ mkdir -p ./out
    1. $ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out

    The /out directory contains the following subdirectories:

    • out/extra-manifest: contains the source CR files that the SiteConfig CR uses to generate the extra manifest configMap.

    • out/source-crs: contains the source CR files that the PolicyGenTemplate CR uses to generate the Red Hat Advanced Cluster Management (RHACM) policies.

    • out/argocd/deployment: contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure.

    • out/argocd/example: contains example SiteConfig and PolicyGenTemplate files that represent the recommended configuration.

  3. Update the clusters-app.yaml and policies-app.yaml files to reflect the name of your applications and the URL, branch, and path for your Git repository.

If the upgrade includes changes to policies that may result in obsolete policies, these policies should be removed prior to performing the upgrade.

Labeling the existing clusters

To ensure that existing clusters remain untouched by the tooling updates, all existing managed clusters must be labeled with the ztp-done label.

Procedure

  1. Find a label selector that lists the managed clusters that were deployed with zero touch provisioning (ZTP), such as local-cluster!=true:

    1. $ oc get managedcluster -l 'local-cluster!=true'
  2. Ensure that the resulting list contains all the managed clusters that were deployed with ZTP, and then use that selector to add the ztp-done label:

    1. $ oc label managedcluster -l 'local-cluster!=true' ztp-done=
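    To verify that the label was applied, you can display it as a column for the same set of clusters:

    1. $ oc get managedcluster -l 'local-cluster!=true' -L ztp-done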

Stopping the existing GitOps ZTP applications

Removing the existing applications ensures that any changes to existing content in the Git repository are not rolled out until the new version of the tooling is available.

Use the application files from the deployment directory. If you used custom names for the applications, update the names in these files first.

Procedure

  1. Perform a non-cascaded delete on the clusters application to leave all generated resources in place:

    1. $ oc delete -f out/argocd/deployment/clusters-app.yaml
  2. Perform a cascaded delete on the policies application to remove all previous policies:

    1. $ oc patch -f out/argocd/deployment/policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge
    1. $ oc delete -f out/argocd/deployment/policies-app.yaml
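Before installing the new tooling, you can confirm that both applications were removed by listing the Application resources in the openshift-gitops namespace; the clusters and policies applications should no longer appear:

  1. $ oc get applications.argoproj.io -n openshift-gitops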

Topology Aware Lifecycle Manager

Install the Topology Aware Lifecycle Manager (TALM) on the hub cluster.

Additional resources

Required changes to the Git repository

When upgrading from an earlier release to OKD 4.10, additional requirements are placed on the contents of the Git repository. Existing content in the repository must be updated to reflect these changes.

  • Changes to PolicyGenTemplate files:

    All PolicyGenTemplate files must be created in a Namespace prefixed with ztp. This ensures that the GitOps zero touch provisioning (ZTP) application is able to manage the policy CRs generated by GitOps ZTP without conflicting with the way Red Hat Advanced Cluster Management (RHACM) manages the policies internally.

  • Remove the pre-sync.yaml and post-sync.yaml files:

    This step is optional but recommended. When the kustomization.yaml files are added, the pre-sync.yaml and post-sync.yaml files are no longer used. Remove them to avoid confusion; if they are left in place, they can cause errors if the kustomization.yaml files are inadvertently removed. Note that there is a set of pre-sync.yaml and post-sync.yaml files under both the SiteConfig and PolicyGenTemplate trees.

  • Add the kustomization.yaml file to the repository:

    All SiteConfig and PolicyGenTemplate CRs must be included in a kustomization.yaml file under their respective directory trees. For example:

    1. ├── policygentemplates
    2. ├── site1-ns.yaml
    3. ├── site1.yaml
    4. ├── site2-ns.yaml
    5. ├── site2.yaml
    6. ├── common-ns.yaml
    7. ├── common-ranGen.yaml
    8. ├── group-du-sno-ranGen-ns.yaml
    9. ├── group-du-sno-ranGen.yaml
    10. └── kustomization.yaml
    11. └── siteconfig
    12. ├── site1.yaml
    13. ├── site2.yaml
    14. └── kustomization.yaml

    The files listed in the generator sections must contain either SiteConfig or PolicyGenTemplate CRs only. If your existing YAML files contain other CRs, for example, Namespace, these other CRs must be pulled out into separate files and listed in the resources section.

    The PolicyGenTemplate kustomization file must contain all PolicyGenTemplate YAML files in the generator section and Namespace CRs in the resources section. For example:

    1. apiVersion: kustomize.config.k8s.io/v1beta1
    2. kind: Kustomization
    3. generators:
    4. - common-ranGen.yaml
    5. - group-du-sno-ranGen.yaml
    6. - site1.yaml
    7. - site2.yaml
    8. resources:
    9. - common-ns.yaml
    10. - group-du-sno-ranGen-ns.yaml
    11. - site1-ns.yaml
    12. - site2-ns.yaml

    The SiteConfig kustomization file must contain all SiteConfig YAML files in the generator section and any other CRs in the resources section. For example:

    1. apiVersion: kustomize.config.k8s.io/v1beta1
    2. kind: Kustomization
    3. generators:
    4. - site1.yaml
    5. - site2.yaml
  • Review and incorporate recommended changes

    Each release may include additional recommended changes to the configuration applied to deployed clusters. Typically these changes result in lower CPU use by the OpenShift platform, additional features, or improved tuning of the platform.

    Review the reference SiteConfig and PolicyGenTemplate CRs applicable to the types of cluster in your network. These examples can be found in the argocd/example directory extracted from the GitOps ZTP container.

Installing the new GitOps ZTP applications

Using the extracted argocd/deployment directory, and after ensuring that the applications point to your Git repository, apply the full contents of the deployment directory. Applying the full contents of the directory ensures that all necessary resources for the applications are correctly configured.

Procedure

  1. To patch the ArgoCD instance in the hub cluster by using the patch file previously extracted into the out/argocd/deployment/ directory, enter the following command:

    1. $ oc patch argocd openshift-gitops \
    2. -n openshift-gitops --type=merge \
    3. --patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
  2. To apply the contents of the argocd/deployment directory, enter the following command:

    1. $ oc apply -k out/argocd/deployment

Rolling out the configuration changes

If any configuration changes were included in the upgrade due to implementing recommended changes, the upgrade process results in a set of policy CRs on the hub cluster in the Non-Compliant state. As of the OKD 4.10 release, these policies are set to inform mode and are not pushed to the spoke clusters without an additional step by the user. This ensures that potentially disruptive changes to the clusters can be managed in terms of when the changes are made, for example, during a maintenance window, and how many clusters are updated concurrently.

To roll out the changes, create one or more ClusterGroupUpgrade CRs as detailed in the TALM documentation. The CR must contain the list of Non-Compliant policies that you want to push out to the spoke clusters as well as a list or selector of which clusters should be included in the update.
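A minimal sketch of such a ClusterGroupUpgrade CR is shown below. The name, namespace, cluster names, policy names, and concurrency are placeholders; see the TALM documentation for the full set of options:

  1. apiVersion: ran.openshift.io/v1alpha1
  2. kind: ClusterGroupUpgrade
  3. metadata:
  4.   name: config-rollout
  5.   namespace: default
  6. spec:
  7.   clusters:
  8.   - spoke1
  9.   - spoke2
  10.   enable: true
  11.   managedPolicies:
  12.   - common-config-policy
  13.   - group-du-sno-config-policy
  14.   remediationStrategy:
  15.     maxConcurrency: 2
  16.     timeout: 240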

Additional resources

Manually install a single managed cluster

This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the SiteConfig method described in “Creating ZTP custom resources for multiple managed clusters”.

Prerequisites

  • Enable the Assisted Installer service.

  • Ensure network connectivity:

    • The container within the hub must be able to reach the Baseboard Management Controller (BMC) address of the target bare-metal host.

    • The managed cluster must be able to resolve and reach the hub’s API hostname and *.apps hostname. Here is an example of the hub’s API and *.apps hostname:

      1. console-openshift-console.apps.hub-cluster.internal.domain.com
      2. api.hub-cluster.internal.domain.com
    • The hub must be able to resolve and reach the API and *.apps hostname of the managed cluster. Here is an example of the managed cluster’s API and *.apps hostname:

      1. console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
      2. api.sno-managed-cluster-1.internal.domain.com
    • A DNS server that is IP reachable from the target bare-metal host.

  • A target bare-metal host for the managed cluster with the following hardware minimums:

    • 4 CPU or 8 vCPU

    • 32 GiB RAM

    • 120 GiB disk for root file system

  • When working in a disconnected environment, the release image must be mirrored. Use this command to mirror the release image:

    1. $ oc adm release mirror -a <pull_secret.json>
    2. --from=quay.io/openshift-release-dev/ocp-release:{{ mirror_version_spoke_release }}
    3. --to={{ provisioner_cluster_registry }}/ocp4 --to-release-image={{
    4. provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }}
  • You mirrored the ISO and rootfs used to generate the spoke cluster ISO to an HTTP server and configured the settings to pull images from there.

    The images must match the version of the ClusterImageSet. To deploy version 4.9.0, the rootfs and ISO must be version 4.9.0.

Procedure

  1. Create a ClusterImageSet for each specific cluster version that needs to be deployed. A ClusterImageSet has the following format:

    1. apiVersion: hive.openshift.io/v1
    2. kind: ClusterImageSet
    3. metadata:
    4. name: openshift-4.9.0-rc.0 (1)
    5. spec:
    6. releaseImage: quay.io/openshift-release-dev/ocp-release:4.9.0-x86_64 (2)
    1The descriptive version that you want to deploy.
    2Specifies the releaseImage to deploy and determines the OS image version. The discovery ISO is based on the same OS image version as the releaseImage, or the latest version if the exact version is unavailable.
  2. Create the Namespace definition for the managed cluster:

    1. apiVersion: v1
    2. kind: Namespace
    3. metadata:
    4. name: <cluster_name> (1)
    5. labels:
    6. name: <cluster_name> (1)
    1The name of the managed cluster to provision.
  3. Create the BMC Secret custom resource:

    1. apiVersion: v1
    2. data:
    3. password: <bmc_password> (1)
    4. username: <bmc_username> (2)
    5. kind: Secret
    6. metadata:
    7. name: <cluster_name>-bmc-secret
    8. namespace: <cluster_name>
    9. type: Opaque
    1The password to the target bare-metal host. Must be base-64 encoded.
    2The username to the target bare-metal host. Must be base-64 encoded.
  4. Create the Image Pull Secret custom resource:

    1. apiVersion: v1
    2. data:
    3. .dockerconfigjson: <pull_secret> (1)
    4. kind: Secret
    5. metadata:
    6. name: assisted-deployment-pull-secret
    7. namespace: <cluster_name>
    8. type: kubernetes.io/dockerconfigjson
    1The OKD pull secret. Must be base-64 encoded.
  5. Create the AgentClusterInstall custom resource:

    1. apiVersion: extensions.hive.openshift.io/v1beta1
    2. kind: AgentClusterInstall
    3. metadata:
    4. # Only include the annotation if using OVN, otherwise omit the annotation
    5. annotations:
    6. agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
    7. name: <cluster_name>
    8. namespace: <cluster_name>
    9. spec:
    10. clusterDeploymentRef:
    11. name: <cluster_name>
    12. imageSetRef:
    13. name: <cluster_image_set> (1)
    14. networking:
    15. clusterNetwork:
    16. - cidr: <cluster_network_cidr> (2)
    17. hostPrefix: 23
    18. machineNetwork:
    19. - cidr: <machine_network_cidr> (3)
    20. serviceNetwork:
    21. - <service_network_cidr> (4)
    22. provisionRequirements:
    23. controlPlaneAgents: 1
    24. workerAgents: 0
    25. sshPublicKey: <public_key> (5)
    1The name of the ClusterImageSet custom resource used to install OKD on the bare-metal host.
    2A block of IPv4 or IPv6 addresses in CIDR notation used for communication among cluster nodes.
    3A block of IPv4 or IPv6 addresses in CIDR notation used for the target bare-metal host external communication. Also used to determine the API and Ingress VIP addresses when provisioning DU single-node clusters.
    4A block of IPv4 or IPv6 addresses in CIDR notation used for cluster services internal communication.
    5A plain text string. You can use the public key to SSH into the node after it has finished installing.

    If you want to configure a static IP address for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters.

  6. Create the ClusterDeployment custom resource:

    1. apiVersion: hive.openshift.io/v1
    2. kind: ClusterDeployment
    3. metadata:
    4. name: <cluster_name>
    5. namespace: <cluster_name>
    6. spec:
    7. baseDomain: <base_domain> (1)
    8. clusterInstallRef:
    9. group: extensions.hive.openshift.io
    10. kind: AgentClusterInstall
    11. name: <cluster_name>
    12. version: v1beta1
    13. clusterName: <cluster_name>
    14. platform:
    15. agentBareMetal:
    16. agentSelector:
    17. matchLabels:
    18. cluster-name: <cluster_name>
    19. pullSecretRef:
    20. name: assisted-deployment-pull-secret
    1The managed cluster’s base domain.
  7. Create the KlusterletAddonConfig custom resource:

    1. apiVersion: agent.open-cluster-management.io/v1
    2. kind: KlusterletAddonConfig
    3. metadata:
    4. name: <cluster_name>
    5. namespace: <cluster_name>
    6. spec:
    7. clusterName: <cluster_name>
    8. clusterNamespace: <cluster_name>
    9. clusterLabels:
    10. cloud: auto-detect
    11. vendor: auto-detect
    12. applicationManager:
    13. enabled: true
    14. certPolicyController:
    15. enabled: false
    16. iamPolicyController:
    17. enabled: false
    18. policyController:
    19. enabled: true
    20. searchCollector:
    21. enabled: false (1)
    1Keep searchCollector disabled. Set to true to enable the search collector add-on or false to disable it.
  8. Create the ManagedCluster custom resource:

    1. apiVersion: cluster.open-cluster-management.io/v1
    2. kind: ManagedCluster
    3. metadata:
    4. name: <cluster_name>
    5. spec:
    6. hubAcceptsClient: true
  9. Create the InfraEnv custom resource:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: InfraEnv
    3. metadata:
    4. name: <cluster_name>
    5. namespace: <cluster_name>
    6. spec:
    7. clusterRef:
    8. name: <cluster_name>
    9. namespace: <cluster_name>
    10. sshAuthorizedKey: <public_key> (1)
    11. agentLabelSelector:
    12. matchLabels:
    13. cluster-name: <cluster_name>
    14. pullSecretRef:
    15. name: assisted-deployment-pull-secret
    1Entered as plain text. You can use the public key to SSH into the target bare-metal host when it boots from the ISO.
  10. Create the BareMetalHost custom resource:

    1. apiVersion: metal3.io/v1alpha1
    2. kind: BareMetalHost
    3. metadata:
    4. name: <cluster_name>
    5. namespace: <cluster_name>
    6. annotations:
    7. inspect.metal3.io: disabled
    8. labels:
    9. infraenvs.agent-install.openshift.io: "<cluster_name>"
    10. spec:
    11. bootMode: "UEFI"
    12. bmc:
    13. address: <bmc_address> (1)
    14. disableCertificateVerification: true
    15. credentialsName: <cluster_name>-bmc-secret
    16. bootMACAddress: <mac_address> (2)
    17. automatedCleaningMode: disabled
    18. online: true
    1The baseboard management controller (BMC) address that is used to boot the installation ISO on the target bare-metal host.
    2The MAC address of the target bare-metal host.

    Optionally, you can add bmac.agent-install.openshift.io/hostname: <host-name> as an annotation to set the managed cluster’s hostname. If you do not add the annotation, the hostname defaults to either a hostname from the DHCP server or localhost.

  11. After you have created the custom resources, push the entire directory of generated custom resources to the Git repository you created for storing the custom resources.

Next steps

To provision additional clusters, repeat this procedure for each cluster.

Configuring BIOS for distributed unit bare-metal hosts

Distributed unit (DU) hosts require the BIOS to be configured before the host can be provisioned. The BIOS configuration is dependent on the specific hardware that runs your DUs and the particular requirements of your installation.

Procedure

  1. Set the UEFI/BIOS Boot Mode to UEFI.

  2. In the host boot sequence order, set Hard drive first.

  3. Apply the specific BIOS configuration for your hardware. The following table describes a representative BIOS configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.

    The exact BIOS configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

    Table 2. Sample BIOS configuration for an Intel Xeon Skylake or Cascade Lake server
    BIOS Setting                          Configuration
    CPU Power and Performance Policy      Performance
    Uncore Frequency Scaling              Disabled
    Performance P-limit                   Disabled
    Enhanced Intel SpeedStep ® Tech       Enabled
    Intel Configurable TDP                Enabled
    Configurable TDP Level                Level 2
    Intel® Turbo Boost Technology         Enabled
    Energy Efficient Turbo                Disabled
    Hardware P-States                     Disabled
    Package C-State                       C0/C1 state
    C1E                                   Disabled
    Processor C6                          Disabled

Enable global SR-IOV and VT-d settings in the BIOS for the host. These settings are relevant to bare-metal environments.

Configuring static IP addresses for managed clusters

Optionally, after creating the AgentClusterInstall custom resource, you can configure static IP addresses for the managed clusters.

You must create the NMStateConfig custom resource before creating the ClusterDeployment custom resource.

Prerequisites

  • Deploy and configure the AgentClusterInstall custom resource.

Procedure

  1. Create a NMStateConfig custom resource:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: NMStateConfig
    3. metadata:
    4. name: <cluster_name>
    5. namespace: <cluster_name>
    6. labels:
    7. sno-cluster-<cluster-name>: <cluster_name>
    8. spec:
    9. config:
    10. interfaces:
    11. - name: eth0
    12. type: ethernet
    13. state: up
    14. ipv4:
    15. enabled: true
    16. address:
    17. - ip: <ip_address> (1)
    18. prefix-length: <public_network_prefix> (2)
    19. dhcp: false
    20. dns-resolver:
    21. config:
    22. server:
    23. - <dns_resolver> (3)
    24. routes:
    25. config:
    26. - destination: 0.0.0.0/0
    27. next-hop-address: <gateway> (4)
    28. next-hop-interface: eth0
    29. table-id: 254
    30. interfaces:
    31. - name: "eth0" (5)
    32. macAddress: <mac_address> (6)
    1The static IP address of the target bare-metal host.
    2The static IP address’s subnet prefix for the target bare-metal host.
    3The DNS server for the target bare-metal host.
    4The gateway for the target bare-metal host.
    5Must match the name specified in the interfaces section.
    6The MAC address of the interface.
  2. When you create the BareMetalHost custom resource, ensure that one of its MAC addresses matches a MAC address in the NMStateConfig custom resource for the target bare-metal host, as in the excerpt below.
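
    For example, the following excerpts are illustrative only and show the fields that must carry the same MAC address in the two custom resources:

    # BareMetalHost excerpt
    spec:
      bootMACAddress: <mac_address>

    # NMStateConfig excerpt
    interfaces:
      - name: "eth0"
        macAddress: <mac_address>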

  3. When creating the InfraEnv custom resource, reference the label from the NMStateConfig custom resource in the InfraEnv custom resource:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: InfraEnv
    3. metadata:
    4. name: <cluster_name>
    5. namespace: <cluster_name>
    6. spec:
    7. clusterRef:
    8. name: <cluster_name>
    9. namespace: <cluster_name>
    10. sshAuthorizedKey: <public_key>
    11. agentLabelSelector:
    12. matchLabels:
    13. cluster-name: <cluster_name>
    14. pullSecretRef:
    15. name: assisted-deployment-pull-secret
    16. nmStateConfigLabelSelector:
    17. matchLabels:
    18. sno-cluster-<cluster-name>: <cluster_name> # Match this label

Automated Discovery image ISO process for provisioning clusters

After you create the custom resources, the following actions happen automatically:

  1. A Discovery image ISO file is generated and booted on the target machine.

  2. When the ISO file successfully boots on the target machine it reports the hardware information of the target machine.

  3. After all hosts are discovered, OKD is installed.

  4. When OKD finishes installing, the hub installs the klusterlet service on the target cluster.

  5. The requested add-on services are installed on the target cluster.

The Discovery image ISO process finishes when the Agent custom resource is created on the hub for the managed cluster.

Checking the managed cluster status

Ensure that cluster provisioning was successful by checking the cluster status.

Prerequisites

  • All of the custom resources have been configured and provisioned, and the Agent custom resource is created on the hub for the managed cluster.

Procedure

  1. Check the status of the managed cluster:

    1. $ oc get managedcluster

    True indicates the managed cluster is ready.

  2. Check the agent status:

    1. $ oc get agent -n <cluster_name>
  3. Use the describe command to provide an in-depth description of the agent’s condition. Statuses to be aware of include BackendError, InputError, ValidationsFailing, InstallationFailed, and AgentIsConnected. These statuses are relevant to the Agent and AgentClusterInstall custom resources.

    1. $ oc describe agent -n <cluster_name>
  4. Check the cluster provisioning status:

    1. $ oc get agentclusterinstall -n <cluster_name>
  5. Use the describe command to provide an in-depth description of the cluster provisioning status:

    1. $ oc describe agentclusterinstall -n <cluster_name>
  6. Check the status of the managed cluster’s add-on services:

    1. $ oc get managedclusteraddon -n <cluster_name>
  7. Retrieve the authentication information of the kubeconfig file for the managed cluster:

    1. $ oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster_name>-kubeconfig
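
    As an optional check, you can use the retrieved kubeconfig file to query the managed cluster directly, assuming the same file path as above:

    $ oc --kubeconfig <directory>/<cluster_name>-kubeconfig get nodes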

Configuring a managed cluster for a disconnected environment

After you have completed the preceding procedure, follow these steps to configure the managed cluster for a disconnected environment.

Prerequisites

  • A disconnected installation of Red Hat Advanced Cluster Management (RHACM) 2.3.

  • Host the rootfs and iso images on an HTTPD server.

If you enable TLS for the HTTPD server, you must confirm the root certificate is signed by an authority trusted by the client and verify the trusted certificate chain between your OKD hub and spoke clusters and the HTTPD server. Using a server configured with an untrusted certificate prevents the images from being downloaded to the image creation service. Using untrusted HTTPS servers is not supported.
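
One way to confirm that the certificate chain is trusted is to fetch an image header from the HTTPD server with the CA bundle that the clusters use. The file names and host in this sketch are placeholders:

$ curl -I --cacert <ca_bundle_file> https://<httpd_server>/<rootfs_or_iso_file>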

Procedure

  1. Create a ConfigMap containing the mirror registry config:

    1. apiVersion: v1
    2. kind: ConfigMap
    3. metadata:
    4. name: assisted-installer-mirror-config
    5. namespace: assisted-installer
    6. labels:
    7. app: assisted-service
    8. data:
    9. ca-bundle.crt: <certificate> (1)
    10. registries.conf: | (2)
    11. unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
    12. [[registry]]
    13. location = <mirror_registry_url> (3)
    14. insecure = false
    15. mirror-by-digest-only = true
    1The mirror registry’s certificate used when creating the mirror registry.
    2The configuration for the mirror registry.
    3The URL of the mirror registry.

    This updates mirrorRegistryRef in the AgentServiceConfig custom resource, as shown below:

    Example output

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: AgentServiceConfig
    3. metadata:
    4. name: agent
    5. namespace: assisted-installer
    6. spec:
    7. databaseStorage:
    8. volumeName: <db_pv_name>
    9. accessModes:
    10. - ReadWriteOnce
    11. resources:
    12. requests:
    13. storage: <db_storage_size>
    14. filesystemStorage:
    15. volumeName: <fs_pv_name>
    16. accessModes:
    17. - ReadWriteOnce
    18. resources:
    19. requests:
    20. storage: <fs_storage_size>
    21. mirrorRegistryRef:
    22. name: 'assisted-installer-mirror-config'
    23. osImages:
    24. - openshiftVersion: <ocp_version>
    25. rootfs: <rootfs_url> (1)
    26. url: <iso_url> (1)
    1Must match the URLs of the HTTPD server.
  2. For disconnected installations, you must deploy an NTP clock that is reachable through the disconnected network. You can do this by configuring chrony to act as an NTP server: edit the /etc/chrony.conf file and add the following settings, including the allowed IPv6 client range:

    1. # Allow NTP client access from local network.
    2. #allow 192.168.0.0/16
    3. local stratum 10
    4. bindcmdaddress ::
    5. allow 2620:52:0:1310::/64
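
    If you edited the file directly on the host, you can restart chronyd and spot-check that it is serving clients. This is an optional verification sketch:

    $ sudo systemctl restart chronyd
    $ sudo chronyc serverstats
    $ sudo chronyc clients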

Configuring IPv6 addresses for a disconnected environment

Optionally, when you create the AgentClusterInstall custom resource, you can configure IPv6 addresses for the managed clusters.

Procedure

  1. In the AgentClusterInstall custom resource, modify the IP addresses in clusterNetwork and serviceNetwork for IPv6 addresses:

    1. apiVersion: extensions.hive.openshift.io/v1beta1
    2. kind: AgentClusterInstall
    3. metadata:
    4. # Only include the annotation if using OVN, otherwise omit the annotation
    5. annotations:
    6. agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
    7. name: <cluster_name>
    8. namespace: <cluster_name>
    9. spec:
    10. clusterDeploymentRef:
    11. name: <cluster_name>
    12. imageSetRef:
    13. name: <cluster_image_set>
    14. networking:
    15. clusterNetwork:
    16. - cidr: "fd01::/48"
    17. hostPrefix: 64
    18. machineNetwork:
    19. - cidr: <machine_network_cidr>
    20. serviceNetwork:
    21. - "fd02::/112"
    22. provisionRequirements:
    23. controlPlaneAgents: 1
    24. workerAgents: 0
    25. sshPublicKey: <public_key>
  2. Update the NMStateConfig custom resource with the IPv6 addresses that you defined, as in the following sketch.
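
    The following is a minimal sketch of the corresponding NMStateConfig interface settings with IPv6 enabled. The addresses are placeholders, and the surrounding fields are assumed to match the earlier NMStateConfig example:

    interfaces:
      - name: eth0
        type: ethernet
        state: up
        ipv6:
          enabled: true
          dhcp: false
          autoconf: false
          address:
            - ip: <ipv6_address>
              prefix-length: 64
    dns-resolver:
      config:
        server:
          - <ipv6_dns_server>
    routes:
      config:
        - destination: ::/0
          next-hop-address: <ipv6_gateway>
          next-hop-interface: eth0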

Generating RAN policies

Prerequisites

Procedure

  1. Configure the kustomization.yaml file to reference the policyGenerator.yaml file. The following example shows the PolicyGenerator definition:

    1. apiVersion: policyGenerator/v1
    2. kind: PolicyGenerator
    3. metadata:
    4. name: acm-policy
    5. namespace: acm-policy-generator
    6. # The arguments must be provided in the following order: policyGenTempPath, sourcePath, outPath, stdout, customResources
    7. argsOneLiner: ./ranPolicyGenTempExamples ./sourcePolicies ./out true false

    Where:

    • policyGenTempPath is the path to the policyGenTemp files.

    • sourcePath is the path to the source policies.

    • outPath is the path to save the generated ACM policies.

    • stdout, if true, prints the generated policies to the console.

    • customResources, if true, generates the CRs from the sourcePolicies files without ACM policies.
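
    The kustomization.yaml entry that references this generator might look like the following minimal sketch. The file name policyGenerator.yaml is an assumption based on the step above; adjust it to match your repository layout:

    generators:
      - policyGenerator.yaml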

  2. Test PolicyGen by running the following commands:

    1. $ cd cnf-features-deploy/ztp/ztp-policy-generator/
    1. $ XDG_CONFIG_HOME=./ kustomize build --enable-alpha-plugins

    An out directory is created with the expected policies, as shown in this example:

    1. out
    2. ├── common
    3. │   ├── common-log-sub-ns-policy.yaml
    4. │   ├── common-log-sub-oper-policy.yaml
    5. │   ├── common-log-sub-policy.yaml
    6. │   ├── common-nto-sub-catalog-policy.yaml
    7. │   ├── common-nto-sub-ns-policy.yaml
    8. │   ├── common-nto-sub-oper-policy.yaml
    9. │   ├── common-nto-sub-policy.yaml
    10. │   ├── common-policies-placementbinding.yaml
    11. │   ├── common-policies-placementrule.yaml
    12. │   ├── common-ptp-sub-ns-policy.yaml
    13. │   ├── common-ptp-sub-oper-policy.yaml
    14. │   ├── common-ptp-sub-policy.yaml
    15. │   ├── common-sriov-sub-ns-policy.yaml
    16. │   ├── common-sriov-sub-oper-policy.yaml
    17. │   └── common-sriov-sub-policy.yaml
    18. ├── groups
    19. │   ├── group-du
    20. │   │   ├── group-du-mc-chronyd-policy.yaml
    21. │   │   ├── group-du-mc-mount-ns-policy.yaml
    22. │   │   ├── group-du-mcp-du-policy.yaml
    23. │   │   ├── group-du-mc-sctp-policy.yaml
    24. │   │   ├── group-du-policies-placementbinding.yaml
    25. │   │   ├── group-du-policies-placementrule.yaml
    26. │   │   ├── group-du-ptp-config-policy.yaml
    27. │   │   └── group-du-sriov-operconfig-policy.yaml
    28. │   └── group-sno-du
    29. │       ├── group-du-sno-policies-placementbinding.yaml
    30. │       ├── group-du-sno-policies-placementrule.yaml
    31. │       ├── group-sno-du-console-policy.yaml
    32. │       ├── group-sno-du-log-forwarder-policy.yaml
    33. │       └── group-sno-du-log-policy.yaml
    34. └── sites
    35.     └── site-du-sno-1
    36.         ├── site-du-sno-1-policies-placementbinding.yaml
    37.         ├── site-du-sno-1-policies-placementrule.yaml
    38.         ├── site-du-sno-1-sriov-nn-fh-policy.yaml
    39.         ├── site-du-sno-1-sriov-nnp-mh-policy.yaml
    40.         ├── site-du-sno-1-sriov-nw-fh-policy.yaml
    41.         ├── site-du-sno-1-sriov-nw-mh-policy.yaml
    42.         └── site-du-sno-1-.yaml

    The common policies are flat because they will be applied to all clusters. However, the groups and sites have subdirectories for each group and site as they will be applied to different clusters.

Troubleshooting the managed cluster

Use this procedure to diagnose any installation issues that might occur with the managed clusters.

Procedure

  1. Check the status of the managed cluster:

    1. $ oc get managedcluster

    Example output

    1. NAME          HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
    2. SNO-cluster   true                                  True     True        2d19h

    If the status in the AVAILABLE column is True, the managed cluster is being managed by the hub.

    If the status in the AVAILABLE column is Unknown, the managed cluster is not being managed by the hub. Use the following steps to continue checking to get more information.

  2. Check the AgentClusterInstall install status:

    1. $ oc get clusterdeployment -n <cluster_name>

    Example output

    1. NAME      PLATFORM          REGION   CLUSTERTYPE   INSTALLED   INFRAID   VERSION   POWERSTATE    AGE
    2. Sno0026   agent-baremetal                          false                           Initialized   2d14h

    If the status in the INSTALLED column is false, the installation was unsuccessful.

  3. If the installation failed, enter the following command to review the status of the AgentClusterInstall resource:

    1. $ oc describe agentclusterinstall -n <cluster_name> <cluster_name>
  4. Resolve the errors and reset the cluster:

    1. Remove the cluster’s managed cluster resource:

      1. $ oc delete managedcluster <cluster_name>
    2. Remove the cluster’s namespace:

      1. $ oc delete namespace <cluster_name>

      This deletes all of the namespace-scoped custom resources created for this cluster. You must wait for the ManagedCluster CR deletion to complete before proceeding.
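
      One way to wait for the deletion to finish is shown in the following sketch, assuming your oc client supports the wait command:

      $ oc wait --for=delete managedcluster/<cluster_name> --timeout=300s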

    3. Recreate the custom resources for the managed cluster.

Updating managed policies with the Topology Aware Lifecycle Manager

You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple OpenShift clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.

The Topology Aware Lifecycle Manager is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Additional resources

About the auto-created ClusterGroupUpgrade CR for ZTP

TALM has a controller called ManagedClusterForCGU that monitors the Ready state of the ManagedCluster CRs on the hub cluster and creates the ClusterGroupUpgrade CRs for ZTP (zero touch provisioning).

For any managed cluster in the Ready state without a “ztp-done” label applied, the ManagedClusterForCGU controller automatically creates a ClusterGroupUpgrade CR in the ztp-install namespace with its associated RHACM policies that are created during the ZTP process. TALM then remediates the set of configuration policies that are listed in the auto-created ClusterGroupUpgrade CR to push the configuration CRs to the managed cluster.

If the managed cluster has no bound policies when the cluster becomes Ready, no ClusterGroupUpgrade CR is created.
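
As a quick check, you can verify whether the controller created the CR for a particular cluster, for example:

$ oc get clustergroupupgrades -n ztp-install <cluster_name>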

Example of an auto-created ClusterGroupUpgrade CR for ZTP

  1. apiVersion: ran.openshift.io/v1alpha1
  2. kind: ClusterGroupUpgrade
  3. metadata:
  4. generation: 1
  5. name: spoke1
  6. namespace: ztp-install
  7. ownerReferences:
  8. - apiVersion: cluster.open-cluster-management.io/v1
  9. blockOwnerDeletion: true
  10. controller: true
  11. kind: ManagedCluster
  12. name: spoke1
  13. uid: 98fdb9b2-51ee-4ee7-8f57-a84f7f35b9d5
  14. resourceVersion: "46666836"
  15. uid: b8be9cd2-764f-4a62-87d6-6b767852c7da
  16. spec:
  17. actions:
  18. afterCompletion:
  19. addClusterLabels:
  20. ztp-done: "" (1)
  21. deleteClusterLabels:
  22. ztp-running: ""
  23. deleteObjects: true
  24. beforeEnable:
  25. addClusterLabels:
  26. ztp-running: "" (2)
  27. clusters:
  28. - spoke1
  29. enable: true
  30. managedPolicies:
  31. - common-spoke1-config-policy
  32. - common-spoke1-subscriptions-policy
  33. - group-spoke1-config-policy
  34. - spoke1-config-policy
  35. - group-spoke1-validator-du-policy
  36. preCaching: false
  37. remediationStrategy:
  38. maxConcurrency: 1
  39. timeout: 240
1Applied to the managed cluster when TALM completes the cluster configuration.
2Applied to the managed cluster when TALM starts deploying the configuration policies.

End-to-end procedures for updating clusters in a disconnected environment

If you have deployed spoke clusters with distributed unit (DU) profiles using the GitOps ZTP with the Topology Aware Lifecycle Manager (TALM) pipeline described in “Deploying distributed units at scale in a disconnected environment”, this procedure describes how to upgrade your spoke clusters and Operators.

Preparing for the updates

If both the hub and the spoke clusters are running OKD 4.9, you must first update ZTP from version 4.9 to 4.10. If they are already running OKD 4.10, you can proceed directly to setting up the environment.

Setting up the environment

TALM can perform both platform and Operator updates.

You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use TALM to update your disconnected clusters. Complete the following steps to mirror the images:

  • For platform updates, you must perform the following steps:

    1. Mirror the desired OKD image repository. Ensure that the desired platform image is mirrored by following the “Mirroring the OKD image repository” procedure linked in the Additional Resources. Save the contents of the imageContentSources section in the imageContentSources.yaml file:

      Example output

      1. imageContentSources:
      2. - mirrors:
      3. - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
      4. source: quay.io/openshift-release-dev/ocp-release
      5. - mirrors:
      6. - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
      7. source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    2. Save the image signature of the desired platform image that was mirrored. You must add the image signature to the PolicyGenTemplate CR for platform updates. To get the image signature, perform the following steps:

      1. Specify the desired OKD tag by running the following command:

        1. $ OCP_RELEASE_NUMBER=<release_version>
      2. Specify the architecture of the server by running the following command:

        1. $ ARCHITECTURE=<server_architecture>
      3. Get the release image digest from Quay by running the following command:

        1. $ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
      4. Set the digest algorithm by running the following command:

        1. $ DIGEST_ALGO="${DIGEST%%:*}"
      5. Set the digest signature by running the following command:

        1. $ DIGEST_ENCODED="${DIGEST#*:}"
      6. Get the image signature from the mirror.openshift.com website by running the following command:

        1. $ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
      7. Save the image signature to the checksum-<OCP_RELEASE_NUMBER>.yaml file by running the following commands:

        1. $ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF
        2. ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64}
        3. EOF
    3. Prepare the update graph. You have two options to prepare the update graph:

      1. Use the OpenShift Update Service.

        For more information about how to set up the graph on the hub cluster, see Deploy the operator for OpenShift Update Service and Build the graph data init container.

      2. Make a local copy of the upstream graph. Host the update graph on an http or https server in the disconnected environment that has access to the spoke cluster. To download the update graph, use the following command:

        1. $ curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.10 -o ~/upgrade-graph_stable-4.10
  • For Operator updates, you must perform the following task:

    • Mirror the Operator catalogs. Ensure that the desired operator images are mirrored by following the procedure in the “Mirroring Operator catalogs for use with disconnected clusters” section.
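
      As a rough sketch, mirroring a catalog with oc adm catalog mirror looks like the following. The index image, registry host, and auth file are placeholders; follow the linked procedure for the exact steps and flags:

      $ oc adm catalog mirror \
          <index_image> \
          <mirror_registry>:<port> \
          -a <registry_auth_file>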

Additional resources

Performing a platform update

You can perform a platform update with the TALM.

Prerequisites

  • Install the Topology Aware Lifecycle Manager (TALM).

  • Update ZTP to the latest version.

  • Provision one or more managed clusters with ZTP.

  • Mirror the desired image repository.

  • Log in as a user with cluster-admin privileges.

  • Create RHACM policies in the hub cluster.

Procedure

  1. Create a PolicyGenTemplate CR for the platform update:

    1. Save the following contents of the PolicyGenTemplate CR in the du-upgrade.yaml file.

      Example of PolicyGenTemplate for platform update

      1. apiVersion: ran.openshift.io/v1
      2. kind: PolicyGenTemplate
      3. metadata:
      4. name: "du-upgrade"
      5. namespace: "ztp-group-du-sno"
      6. spec:
      7. bindingRules:
      8. group-du-sno: ""
      9. mcp: "master"
      10. remediationAction: inform
      11. sourceFiles:
      12. - fileName: ImageSignature.yaml (1)
      13. policyName: "platform-upgrade-prep"
      14. binaryData:
      15. ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} (2)
      16. - fileName: DisconnectedICSP.yaml
      17. policyName: "platform-upgrade-prep"
      18. metadata:
      19. name: disconnected-internal-icsp-for-ocp
      20. spec:
      21. repositoryDigestMirrors: (3)
      22. - mirrors:
      23. - quay-intern.example.com/ocp4/openshift-release-dev
      24. source: quay.io/openshift-release-dev/ocp-release
      25. - mirrors:
      26. - quay-intern.example.com/ocp4/openshift-release-dev
      27. source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
      28. - fileName: ClusterVersion.yaml (4)
      29. policyName: "platform-upgrade-prep"
      30. metadata:
      31. name: version
      32. annotations:
      33. ran.openshift.io/ztp-deploy-wave: "1"
      34. spec:
      35. channel: "stable-4.10"
      36. upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10
      37. - fileName: ClusterVersion.yaml (5)
      38. policyName: "platform-upgrade"
      39. metadata:
      40. name: version
      41. spec:
      42. channel: "stable-4.10"
      43. upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10
      44. desiredUpdate:
      45. version: 4.10.4
      46. status:
      47. history:
      48. - version: 4.10.4
      49. state: "Completed"
      1The ConfigMap CR contains the signature of the desired release image to update to.
      2Shows the image signature of the desired OKD release. Get the signature from the checksum-${OCP_RELEASE_NUMBER}.yaml file that you saved when following the procedures in the “Setting up the environment” section.
      3Shows the mirror repository that contains the desired OKD image. Get the mirrors from the imageContentSources.yaml file that you saved when following the procedures in the “Setting up the environment” section.
      4Shows the ClusterVersion CR to update upstream.
      5Shows the ClusterVersion CR to trigger the update. The channel, upstream, and desiredVersion fields are all required for image pre-caching.

      The PolicyGenTemplate CR generates two policies:

      • The du-upgrade-platform-upgrade-prep policy does the preparation work for the platform update. It creates the ConfigMap CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the spoke cluster in the disconnected environment.

      • The du-upgrade-platform-upgrade policy is used to perform the platform upgrade.

    2. Add the du-upgrade.yaml file contents to the kustomization.yaml file located in the ZTP Git repository for the PolicyGenTemplate CRs and push the changes to the Git repository.

      ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.

    3. Check the created policies by running the following command:

      1. $ oc get policies -A | grep platform-upgrade
  2. Apply the required update resources before starting the platform update with the TALM.

    1. Save the content of the platform-upgrade-prep ClusterUpgradeGroup CR with the du-upgrade-platform-upgrade-prep policy and the target spoke clusters to the cgu-platform-upgrade-prep.yml file, as shown in the following example:

      1. apiVersion: ran.openshift.io/v1alpha1
      2. kind: ClusterGroupUpgrade
      3. metadata:
      4. name: cgu-platform-upgrade-prep
      5. namespace: default
      6. spec:
      7. managedPolicies:
      8. - du-upgrade-platform-upgrade-prep
      9. clusters:
      10. - spoke1
      11. remediationStrategy:
      12. maxConcurrency: 1
      13. enable: true
    2. Apply the policy to the hub cluster by running the following command:

      1. $ oc apply -f cgu-platform-upgrade-prep.yml
    3. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:

      1. $ oc get policies --all-namespaces
  3. Create the ClusterGroupUpdate CR for the platform update with the spec.enable field set to false.

    1. Save the content of the platform update ClusterGroupUpdate CR with the du-upgrade-platform-upgrade policy and the target clusters to the cgu-platform-upgrade.yml file, as shown in the following example:

      1. apiVersion: ran.openshift.io/v1alpha1
      2. kind: ClusterGroupUpgrade
      3. metadata:
      4. name: cgu-platform-upgrade
      5. namespace: default
      6. spec:
      7. managedPolicies:
      8. - du-upgrade-platform-upgrade
      9. preCaching: false
      10. clusters:
      11. - spoke1
      12. remediationStrategy:
      13. maxConcurrency: 1
      14. enable: false
    2. Apply the ClusterGroupUpdate CR to the hub cluster by running the following command:

      1. $ oc apply -f cgu-platform-upgrade.yml
  4. Optional: Pre-cache the images for the platform update.

    1. Enable pre-caching in the ClusterGroupUpdate CR by running the following command:

      1. $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
      2. --patch '{"spec":{"preCaching": true}}' --type=merge
    2. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:

      1. $ oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
  5. Start the platform update:

    1. Enable the cgu-platform-upgrade policy and disable pre-caching by running the following command:

      1. $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
      2. --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
    2. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:

      1. $ oc get policies --all-namespaces

Additional resources

Performing an Operator update

You can perform an Operator update with the TALM.

Prerequisites

  • Install the Topology Aware Lifecycle Manager (TALM).

  • Update ZTP to the latest version.

  • Provision one or more managed clusters with ZTP.

  • Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images.

  • Log in as a user with cluster-admin privileges.

  • Create RHACM policies in the hub cluster.

Procedure

  1. Update the PolicyGenTemplate CR for the Operator update.

    1. Update the du-upgrade PolicyGenTemplate CR with the following additional contents in the du-upgrade.yaml file:

      1. apiVersion: ran.openshift.io/v1
      2. kind: PolicyGenTemplate
      3. metadata:
      4. name: "du-upgrade"
      5. namespace: "ztp-group-du-sno"
      6. spec:
      7. bindingRules:
      8. group-du-sno: ""
      9. mcp: "master"
      10. remediationAction: inform
      11. sourceFiles:
      12. - fileName: DefaultCatsrc.yaml
      13. remediationAction: inform
      14. policyName: "operator-catsrc-policy"
      15. metadata:
      16. name: redhat-operators
      17. spec:
      18. displayName: Red Hat Operators Catalog
      19. image: registry.example.com:5000/olm/redhat-operators:v4.10 (1)
      20. updateStrategy: (2)
      21. registryPoll:
      22. interval: 1h
      1The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed.
      2Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the registryPoll.interval field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. You can set registryPoll.interval to a shorter interval to expedite the update; however, shorter intervals increase computational load. To counteract this, restore registryPoll.interval to the default value after the update completes.
    2. This update generates one policy, du-upgrade-operator-catsrc-policy, to update the redhat-operators catalog source with the new index images that contain the desired Operator images.

      If you want to use the image pre-caching for Operators and there are Operators from a different catalog source other than redhat-operators, you must perform the following tasks:

      • Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source.

      • Prepare a separate subscription policy for the desired Operators that are from the different catalog source.

      For example, the desired SRIOV-FEC Operator is available in the certified-operators catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies, du-upgrade-fec-catsrc-policy and du-upgrade-subscriptions-fec-policy:

      1. apiVersion: ran.openshift.io/v1
      2. kind: PolicyGenTemplate
      3. metadata:
      4. name: "du-upgrade"
      5. namespace: "ztp-group-du-sno"
      6. spec:
      7. bindingRules:
      8. group-du-sno: ""
      9. mcp: "master"
      10. remediationAction: inform
      11. sourceFiles:
      12. - fileName: DefaultCatsrc.yaml
      13. remediationAction: inform
      14. policyName: "fec-catsrc-policy"
      15. metadata:
      16. name: certified-operators
      17. spec:
      18. displayName: Intel SRIOV-FEC Operator
      19. image: registry.example.com:5000/olm/far-edge-sriov-fec:v4.10
      20. updateStrategy:
      21. registryPoll:
      22. interval: 10m
      23. - fileName: AcceleratorsSubscription.yaml
      24. policyName: "subscriptions-fec-policy"
      25. spec:
      26. channel: "stable"
      27. source: certified-operators
    3. Remove the specified subscription channels in the common PolicyGenTemplate CR, if they exist. The default subscription channels from the ZTP image are used for the update.

    4. Push the PolicyGenTemplate CRs updates to the ZTP Git repository.

      ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.

    5. Check the created policies by running the following command:

      1. $ oc get policies -A | grep -E "catsrc-policy|subscription"
  2. Apply the required catalog source updates before starting the Operator update.

    1. Save the content of the ClusterGroupUpgrade CR named operator-upgrade-prep with the catalog source policies and the target spoke clusters to the cgu-operator-upgrade-prep.yml file:

      1. apiVersion: ran.openshift.io/v1alpha1
      2. kind: ClusterGroupUpgrade
      3. metadata:
      4. name: cgu-operator-upgrade-prep
      5. namespace: default
      6. spec:
      7. clusters:
      8. - spoke1
      9. enable: true
      10. managedPolicies:
      11. - du-upgrade-operator-catsrc-policy
      12. remediationStrategy:
      13. maxConcurrency: 1
    2. Apply the policy to the hub cluster by running the following command:

      1. $ oc apply -f cgu-operator-upgrade-prep.yml
    3. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:

      1. $ oc get policies -A | grep -E "catsrc-policy"
  3. Create the ClusterGroupUpgrade CR for the Operator update with the spec.enable field set to false.

    1. Save the content of the Operator update ClusterGroupUpgrade CR with the du-upgrade-operator-catsrc-policy policy and the subscription policies created from the common PolicyGenTemplate and the target clusters to the cgu-operator-upgrade.yml file, as shown in the following example:

      1. apiVersion: ran.openshift.io/v1alpha1
      2. kind: ClusterGroupUpgrade
      3. metadata:
      4. name: cgu-operator-upgrade
      5. namespace: default
      6. spec:
      7. managedPolicies:
      8. - du-upgrade-operator-catsrc-policy (1)
      9. - common-subscriptions-policy (2)
      10. preCaching: false
      11. clusters:
      12. - spoke1
      13. remediationStrategy:
      14. maxConcurrency: 1
      15. enable: false
      1The policy is needed by the image pre-caching feature to retrieve the operator images from the catalog source.
      2The policy contains Operator subscriptions. If you have upgraded ZTP from 4.9 to 4.10 by following “Upgrade ZTP from 4.9 to 4.10”, all Operator subscriptions are grouped into the common-subscriptions-policy policy.

      One ClusterGroupUpgrade CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in the ClusterGroupUpgrade CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, another ClusterGroupUpgrade CR must be created with the du-upgrade-fec-catsrc-policy and du-upgrade-subscriptions-fec-policy policies to pre-cache and update the SRIOV-FEC Operator images, as in the sketch below.
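
      The following is an illustrative sketch of that second CR; the name cgu-fec-operator-upgrade is a placeholder:

      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        name: cgu-fec-operator-upgrade
        namespace: default
      spec:
        managedPolicies:
          - du-upgrade-fec-catsrc-policy
          - du-upgrade-subscriptions-fec-policy
        preCaching: true
        clusters:
          - spoke1
        remediationStrategy:
          maxConcurrency: 1
        enable: false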

    2. Apply the ClusterGroupUpgrade CR to the hub cluster by running the following command:

      1. $ oc apply -f cgu-operator-upgrade.yml
  4. Optional: Pre-cache the images for the Operator update.

    1. Before starting image pre-caching, verify the subscription policy is NonCompliant at this point by running the following command:

      1. $ oc get policy common-subscriptions-policy -n <policy_namespace>

      Example output

      1. NAME                          REMEDIATION ACTION   COMPLIANCE STATE   AGE
      2. common-subscriptions-policy   inform               NonCompliant       27d
    2. Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:

      1. $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
      2. --patch '{"spec":{"preCaching": true}}' --type=merge
    3. Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster:

      1. $ oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
    4. Check if the pre-caching is completed before starting the update by running the following command:

      1. $ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq

      Example output

      1. [
      2. {
      3. "lastTransitionTime": "2022-03-08T20:49:08.000Z",
      4. "message": "The ClusterGroupUpgrade CR is not enabled",
      5. "reason": "UpgradeNotStarted",
      6. "status": "False",
      7. "type": "Ready"
      8. },
      9. {
      10. "lastTransitionTime": "2022-03-08T20:55:30.000Z",
      11. "message": "Precaching is completed",
      12. "reason": "PrecachingCompleted",
      13. "status": "True",
      14. "type": "PrecachingDone"
      15. }
      16. ]
  5. Start the Operator update.

    1. Enable the cgu-operator-upgrade ClusterGroupUpgrade CR and disable pre-caching to start the Operator update by running the following command:

      1. $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
      2. --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
    2. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:

      1. $ oc get policies --all-namespaces

Additional resources

Performing a platform and an Operator update together

You can perform a platform and an Operator update at the same time.

Prerequisites

  • Install the Topology Aware Lifecycle Manager (TALM).

  • Update ZTP to the latest version.

  • Provision one or more managed clusters with ZTP.

  • Log in as a user with cluster-admin privileges.

  • Create RHACM policies in the hub cluster.

Procedure

  1. Create the PolicyGenTemplate CR for the updates by following the steps described in the “Performing a platform update” and “Performing an Operator update” sections.

  2. Apply the prep work for the platform and the Operator update.

    1. Save the content of the ClusterGroupUpgrade CR with the policies for platform update preparation work, catalog source updates, and target clusters to the cgu-platform-operator-upgrade-prep.yml file, for example:

      1. apiVersion: ran.openshift.io/v1alpha1
      2. kind: ClusterGroupUpgrade
      3. metadata:
      4. name: cgu-platform-operator-upgrade-prep
      5. namespace: default
      6. spec:
      7. managedPolicies:
      8. - du-upgrade-platform-upgrade-prep
      9. - du-upgrade-operator-catsrc-policy
      10. clusterSelector:
      11. - group-du-sno
      12. remediationStrategy:
      13. maxConcurrency: 10
      14. enable: true
    2. Apply the cgu-platform-operator-upgrade-prep.yml file to the hub cluster by running the following command:

      1. $ oc apply -f cgu-platform-operator-upgrade-prep.yml
    3. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:

      1. $ oc get policies --all-namespaces
  3. Create the ClusterGroupUpdate CR for the platform and the Operator update with the spec.enable field set to false.

    1. Save the contents of the platform and Operator update ClusterGroupUpdate CR with the policies and the target clusters to the cgu-platform-operator-upgrade.yml file, as shown in the following example:

      1. apiVersion: ran.openshift.io/v1alpha1
      2. kind: ClusterGroupUpgrade
      3. metadata:
      4. name: cgu-du-upgrade
      5. namespace: default
      6. spec:
      7. managedPolicies:
      8. - du-upgrade-platform-upgrade (1)
      9. - du-upgrade-operator-catsrc-policy (2)
      10. - common-subscriptions-policy (3)
      11. preCaching: true
      12. clusterSelector:
      13. - group-du-sno
      14. remediationStrategy:
      15. maxConcurrency: 1
      16. enable: false
      1This is the platform update policy.
      2This is the policy containing the catalog source information for the Operators to be updated. It is needed for the pre-caching feature to determine which Operator images to download to the spoke cluster.
      3This is the policy to update the Operators.
    2. Apply the cgu-platform-operator-upgrade.yml file to the hub cluster by running the following command:

      1. $ oc apply -f cgu-platform-operator-upgrade.yml
  4. Optional: Pre-cache the images for the platform and the Operator update.

    1. Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:

      1. $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
      2. --patch '{"spec":{"preCaching": true}}' --type=merge
    2. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster:

      1. $ oc get jobs,pods -n openshift-talm-pre-cache
    3. Check if the pre-caching is completed before starting the update by running the following command:

      1. $ oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}'
  5. Start the platform and Operator update.

    1. Enable the cgu-du-upgrade ClusterGroupUpgrade CR to start the platform and the Operator update by running the following command:

      1. $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
      2. --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
    2. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:

      1. $ oc get policies --all-namespaces

      The CRs for the platform and Operator updates can be created with spec.enable set to true from the beginning. In this case, the update starts immediately after pre-caching completes, and there is no need to manually enable the CR.

      Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster view, to help complete the procedures. Setting the afterCompletion.deleteObjects field to true deletes all these resources after the updates complete.
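
      For reference, a minimal sketch of those two settings in a ClusterGroupUpgrade spec; the values shown only illustrate the fields discussed above:

      spec:
        enable: true
        actions:
          afterCompletion:
            deleteObjects: true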