- Deploying distributed units at scale in a disconnected environment
- Provisioning edge sites at scale
- About ZTP and distributed units on OpenShift clusters
- The GitOps approach
- Zero touch provisioning building blocks
- How to plan your RAN policies
- Low latency for distributed units (DUs)
- Preparing the disconnected environment
- Installing Red Hat Advanced Cluster Management in a disconnected environment
- Enabling assisted installer service on bare metal
- ZTP custom resources
- PolicyGenTemplate CRs for RAN deployments
- About the PolicyGenTemplate
- Best practices when customizing PolicyGenTemplate CRs
- Creating the PolicyGenTemplate CR
- Creating ZTP custom resources for multiple managed clusters
- Using PolicyGenTemplate CRs to override source CRs content
- Filtering custom resources using SiteConfig filters
- Configuring PTP fast events using PolicyGenTemplate CRs
- Configuring UEFI secure boot for clusters using PolicyGenTemplate CRs
- Configuring bare-metal event monitoring using PolicyGenTemplate CRs
- Installing the GitOps ZTP pipeline
- Adding new content to the GitOps ZTP pipeline
- Customizing extra installation manifests in the ZTP GitOps pipeline
- Deploying a site
- GitOps ZTP and Topology Aware Lifecycle Manager
- Monitoring deployment progress
- Indication of done for ZTP installations
- Troubleshooting GitOps ZTP
- Site cleanup
- Upgrading GitOps ZTP
- Manually install a single managed cluster
- Configuring BIOS for distributed unit bare-metal hosts
- Configuring static IP addresses for managed clusters
- Automated Discovery image ISO process for provisioning clusters
- Checking the managed cluster status
- Configuring a managed cluster for a disconnected environment
- Configuring IPv6 addresses for a disconnected environment
- Generating RAN policies
- Updating managed policies with the Topology Aware Lifecycle Manager
- End-to-end procedures for updating clusters in a disconnected environment
Deploying distributed units at scale in a disconnected environment
Use zero touch provisioning (ZTP) to provision distributed units at new edge sites in a disconnected environment. The workflow starts when the site is connected to the network and ends with the CNF workload deployed and running on the site nodes.
Provisioning edge sites at scale
Telco edge computing presents extraordinary challenges with managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully automated management solutions with as little human interaction as possible.
Zero touch provisioning (ZTP) allows you to provision new edge sites with declarative configurations of bare-metal equipment at remote sites. Template or overlay configurations install OKD features that are required for CNF workloads. End-to-end functional test suites are used to verify CNF related features. All configurations are declarative in nature.
You start the workflow by creating declarative configurations for ISO images that are delivered to the edge nodes to begin the installation process. The images are used to repeatedly provision large numbers of nodes efficiently and quickly, allowing you to keep up with requirements from the field for far edge nodes.
Service providers are deploying a more distributed mobile network architecture allowed by the modular functional framework defined for 5G. This allows service providers to move from appliance-based radio access networks (RAN) to open cloud RAN architecture, gaining flexibility and agility in delivering services to end users.
The following diagram shows how ZTP works within a far edge framework.
About ZTP and distributed units on OpenShift clusters
You can install a distributed unit (DU) on OKD clusters at scale with Red Hat Advanced Cluster Management (RHACM) using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.
RHACM manages clusters in a hub-and-spoke architecture, where a single hub cluster manages many spoke clusters. RHACM applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of OKD on each cluster.
The AI service handles provisioning of OKD on single node clusters, three-node clusters, or standard clusters running on bare metal. ACM ships with and deploys the AI when the MultiClusterHub custom resource is installed.
With ZTP and AI, you can provision OKD clusters to run your DUs at scale. A high-level overview of ZTP for distributed units in a disconnected environment is as follows:
A hub cluster running Red Hat Advanced Cluster Management (RHACM) manages a disconnected internal registry that mirrors the OKD release images. The internal registry is used to provision the spoke clusters.
You manage the bare metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.
You install the DU bare metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare metal host:
Network connectivity - including DNS for your network. Hosts should be reachable through the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install the spoke cluster.
Baseboard Management Controller (BMC) details for each host - ZTP uses the BMC URL and credentials to access the BMC. ZTP manages the spoke cluster definition CRs, with the exception of the BMCSecret CR, which you create manually. These CRs define the relevant elements for the managed clusters.
The GitOps approach
ZTP uses the GitOps set of practices for infrastructure deployment that allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by Open Cluster Management (OCM) for multisite deployment.
One of the motivators for a GitOps approach is the requirement for reliability at scale. This is a significant challenge that GitOps helps solve.
GitOps addresses the reliability issue by providing traceability, RBAC, and a single source of truth for the desired state of each site. Scale issues are addressed by GitOps providing structure, tooling, and event driven operations through webhooks.
Zero touch provisioning building blocks
Red Hat Advanced Cluster Management (RHACM) leverages zero touch provisioning (ZTP) to deploy single-node OKD clusters, three-node clusters, and standard clusters. The initial site plan is divided into smaller components and initial configuration data is stored in a Git repository. ZTP uses a declarative GitOps approach to deploy these clusters.
The deployment of the clusters includes:
Installing the host operating system (RHCOS) on a blank server.
Deploying OKD.
Creating cluster policies and site subscriptions.
Leveraging a GitOps deployment topology for a develop once, deploy anywhere model.
Making the necessary network configurations to the server operating system.
Deploying profile Operators and performing any needed software-related configuration, such as performance profile, PTP, and SR-IOV.
Downloading images needed to run workloads (CNFs).
How to plan your RAN policies
Zero touch provisioning (ZTP) uses Red Hat Advanced Cluster Management (RHACM) to apply the radio access network (RAN) configuration with a policy-based governance approach.
The policy generator, or PolicyGen, is a part of the GitOps ZTP tooling that facilitates creating RHACM policies from a set of predefined custom resources. There are three main items: policy categorization, source CR policy, and the PolicyGenTemplate CR. PolicyGen uses these to generate the policies and their placement bindings and rules.
The following diagram shows how the RAN policy generator interacts with GitOps and RHACM.
RAN policies are categorized into three main groups:
Common
A policy in the Common category is applied to all clusters represented by the site plan. Cluster types include single node, three-node, and standard clusters.
Groups
A policy in the Groups category is applied to a group of clusters. Every group of clusters can have its own policies under the Groups category. For example, Groups/group1 can have its own policies that are applied to the clusters belonging to group1. You can also define a group for each cluster type: single node, three-node, and standard clusters.
Sites
A policy in the Sites category is applied to a specific cluster. Any cluster can have its own policies in the Sites category. For example, Sites/cluster1 has its own policies applied to cluster1. You can also define an example site-specific configuration for each cluster type: single node, three-node, and standard clusters.
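As an illustration of how these categories are typically expressed, the following sketch shows a bindingRules stanza for each category. The label names (common, group-du-sno, and sites) are taken from the example CRs shown later in this document; treat the exact values as assumptions to be adapted to your site plan:
# Common policy: applies to every cluster labeled common: "true"
bindingRules:
  common: "true"
# Group policy: applies to every cluster labeled group-du-sno: ""
bindingRules:
  group-du-sno: ""
# Site policy: applies to the single cluster labeled sites: "example-sno"
bindingRules:
  sites: "example-sno"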
Low latency for distributed units (DUs)
Low latency is an integral part of the development of 5G networks. Telecommunications networks require as little signal delay as possible to ensure quality of service in a variety of critical use cases.
Low latency processing is essential for any communication with timing constraints that affect functionality and security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require almost a real-time flow of data.
Low latency systems are about guarantees with regard to response and processing times. This includes keeping a communication protocol running smoothly, ensuring device security with fast responses to error conditions, or just making sure a system is not lagging behind when receiving a lot of data. Low latency is key for optimal synchronization of radio transmissions.
OKD enables low latency processing for DUs running on COTS hardware by using a number of technologies and specialized hardware devices:
Real-time kernel for RHCOS
Ensures workloads are handled with a high degree of process determinism.
CPU isolation
Avoids CPU scheduling delays and ensures CPU capacity is available consistently.
NUMA awareness
Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the NUMA node. This decreases latency and improves performance of the node.
Huge pages memory management
Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.
Precision timing synchronization using PTP
Allows synchronization between nodes in the network with sub-microsecond accuracy.
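Several of these techniques come together in a single PerformanceProfile custom resource, which the DU configuration applies through the GitOps ZTP policies described later in this document. The following minimal sketch shows where CPU isolation, huge pages, the NUMA topology policy, and the real-time kernel are declared; the profile name, CPU ranges, and page counts are placeholder values that must be tailored to the target hardware:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-du-performanceprofile
spec:
  cpu:
    isolated: "2-19,22-39"   # CPUs dedicated to latency-sensitive workloads
    reserved: "0-1,20-21"    # CPUs reserved for housekeeping processes
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - size: 1G
      count: 10
  numa:
    topologyPolicy: "restricted"  # align CPU, memory, and devices per NUMA node
  realTimeKernel:
    enabled: true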
Preparing the disconnected environment
Before you can provision distributed units (DU) at scale, you must install Red Hat Advanced Cluster Management (RHACM), which handles the provisioning of the DUs.
RHACM is deployed as an Operator on the OKD hub cluster. It controls clusters and applications from a single console with built-in security policies. RHACM provisions and manages your DU hosts. To install RHACM in a disconnected environment, you create a mirror registry that mirrors the Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster.
You also use a disconnected mirror host to serve the FCOS ISO and RootFS disk images that provision the DU bare-metal host operating system.
Additional resources
For more information about creating the disconnected mirror registry, see Creating a mirror registry.
For more information about mirroring the OKD images to the disconnected registry, see Mirroring images for a disconnected installation.
Adding FCOS ISO and RootFS images to the disconnected mirror host
Before you install a cluster on infrastructure that you provision, you must create Fedora CoreOS (FCOS) machines for it to use. Use a disconnected mirror to host the FCOS images you require to provision your distributed unit (DU) bare-metal hosts.
Prerequisites
- Deploy and configure an HTTP server to host the FCOS image resources on the network. You must be able to access the HTTP server from your computer, and from the machines that you create.
The FCOS images might not change with every release of OKD. You must download images with the highest version that is less than or equal to the OKD version that you install. Use the image versions that match your OKD version if they are available. You require ISO and RootFS images to install FCOS on the DU hosts. FCOS qcow2 images are not supported for this installation type.
Procedure
Log in to the mirror host.
Obtain the FCOS ISO and RootFS images from mirror.openshift.com, for example:
Export the required image names and OKD version as environment variables:
$ export ISO_IMAGE_NAME=<iso_image_name> (1)
$ export ROOTFS_IMAGE_NAME=<rootfs_image_name> (2)
$ export OCP_VERSION=<ocp_version> (3)
1 ISO image name, for example, rhcos-4.11.0-fc.1-x86_64-live.x86_64.iso
2 RootFS image name, for example, rhcos-4.11.0-fc.1-x86_64-live-rootfs.x86_64.img
3 OKD version, for example, latest-4.11
Download the required images:
$ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
$ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}
Verification steps
Verify that the images downloaded successfully and are being served on the disconnected mirror host, for example:
$ wget http://$(hostname)/${ISO_IMAGE_NAME}
Expected output
...
Saving to: rhcos-4.11.0-fc.1-x86_64-live.x86_64.iso
rhcos-4.11.0-fc.1-x86_64- 11%[====> ] 10.01M 4.71MB/s
...
Installing Red Hat Advanced Cluster Management in a disconnected environment
You use Red Hat Advanced Cluster Management (RHACM) on a hub cluster in the disconnected environment to manage the deployment of distributed unit (DU) profiles on multiple managed spoke clusters.
Prerequisites
- Install the OKD CLI (oc).
- Log in as a user with cluster-admin privileges.
- Configure a disconnected mirror registry for use in the cluster. If you want to deploy Operators to the spoke clusters, you must also add them to this registry. See Mirroring an Operator catalog for more information.
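If you do need to mirror an Operator catalog for the spoke clusters, the general pattern follows the Mirroring an Operator catalog procedure. The following sketch shows one possible invocation; the index image tag, the mirror registry host registry.example.com:5000, and the REG_CREDS pull secret path are placeholders, not values defined by this document:
$ oc adm catalog mirror \
    registry.redhat.io/redhat/redhat-operator-index:v4.10 \
    registry.example.com:5000 \
    -a ${REG_CREDS}   # auth file containing credentials for both registries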
Procedure
- Install RHACM on the hub cluster in the disconnected environment. See Installing RHACM in a disconnected environment.
Enabling assisted installer service on bare metal
The Assisted Installer Service (AIS) deploys OKD clusters. Red Hat Advanced Cluster Management (RHACM) ships with AIS. AIS is deployed when you enable the MultiClusterHub Operator on the RHACM hub cluster.
For distributed units (DUs), RHACM supports OKD deployments that run on a single bare-metal host, three-node clusters, or standard clusters. In the case of single node clusters or three-node clusters, all nodes act as both control plane and worker nodes.
Prerequisites
- Install OKD 4.11 on a hub cluster.
- Install RHACM and create the MultiClusterHub resource.
- Create persistent volume custom resources (CR) for database and file system storage.
- Install the OpenShift CLI (oc).
Create a persistent volume resource for image storage. Failure to specify persistent volume storage for images can affect cluster performance.
Procedure
Modify the Provisioning resource to allow the Bare Metal Operator to watch all namespaces:
$ oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'
Create the AgentServiceConfig CR.
Save the following YAML in the agent_service_config.yaml file:
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
name: agent
spec:
databaseStorage:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: <database_volume_size> (1)
filesystemStorage:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: <file_storage_volume_size> (2)
imageStorage:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: <image_storage_volume_size> (3)
osImages: (4)
- openshiftVersion: "<ocp_version>" (5)
version: "<ocp_release_version>" (6)
url: "<iso_url>" (7)
rootFSUrl: "<root_fs_url>" (8)
cpuArchitecture: "x86_64"
1 Volume size for the databaseStorage field, for example 10Gi.
2 Volume size for the filesystemStorage field, for example 20Gi.
3 Volume size for the imageStorage field, for example 2Gi.
4 List of OS image details, for example a single OKD OS version.
5 OKD version to install, in either "x.y" (major.minor) or "x.y.z" (major.minor.patch) format.
6 Specific install version, for example, 47.83.202103251640-0.
7 ISO URL, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-4.7.7-x86_64-live.x86_64.iso.
8 Root FS image URL, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-live-rootfs.x86_64.img.
Create the AgentServiceConfig CR by running the following command:
$ oc create -f agent_service_config.yaml
Example output
agentserviceconfig.agent-install.openshift.io/agent created
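You can optionally confirm that the assisted installer service has started after the AgentServiceConfig CR is created. The namespace below assumes that RHACM is installed in the open-cluster-management namespace; adjust it to match your environment:
$ oc get pods -n open-cluster-management | grep assisted-service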
ZTP custom resources
Zero touch provisioning (ZTP) uses custom resource (CR) objects to extend the Kubernetes API or introduce your own API into a project or a cluster. These CRs contain the site-specific data required to install and configure a cluster for RAN applications.
A custom resource definition (CRD) file defines your own object kinds. Deploying a CRD into the managed cluster causes the Kubernetes API server to begin serving the specified CR for the entire lifecycle.
For each CR in the <site>.yaml file on the managed cluster, ZTP uses the data to create installation CRs in a directory named for the cluster.
ZTP provides two ways for defining and installing CRs on managed clusters: a manual approach when you are provisioning a single cluster and an automated approach when provisioning multiple clusters.
Manual CR creation for single clusters
Use this method when you are creating CRs for a single cluster. This is a good way to test your CRs before deploying on a larger scale.
Automated CR creation for multiple managed clusters
Use the automated SiteConfig method when you are installing multiple managed clusters, for example, in batches of up to 100 clusters. SiteConfig uses ArgoCD as the engine for the GitOps method of site deployment. After completing a site plan that contains all of the required parameters for deployment, a policy generator creates the manifests and applies them to the hub cluster.
Both methods create the CRs shown in the following table. On the cluster site, an automated Discovery image ISO file creates a directory with the site name and a file with the cluster name. Every cluster has its own namespace, and all of the CRs are under that namespace. The namespace and the CR names match the cluster name.
Resource | Description | Usage
---|---|---
BMC Secret | Contains the connection information for the Baseboard Management Controller (BMC) of the target bare-metal host. | Provides access to the BMC in order to load and boot the Discovery image ISO on the target server by using the Redfish protocol.
Image Pull Secret | Contains information for pulling OKD onto the target bare-metal host. | Used with ClusterDeployment to generate the Discovery ISO for the managed cluster.
AgentClusterInstall | Specifies the managed cluster's configuration such as networking and the number of supervisor (control plane) nodes. Shows the kubeconfig and credentials when installation completes. | Specifies the managed cluster configuration information and provides status during the installation of the cluster.
ClusterDeployment | References the AgentClusterInstall CR to use. | Used with InfraEnv to generate the Discovery ISO.
NMStateConfig | Provides network configuration information such as MAC to IP mapping, DNS server, and default route for the managed cluster. | Sets up a static IP address for the managed cluster's Kube API server.
BareMetalHost | Contains hardware information about the target bare-metal host. | Created automatically on the hub when the target machine's Discovery image ISO boots.
ManagedCluster | When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface. | The hub uses this resource to manage and show the status of managed clusters.
KlusterletAddonConfig | Contains the list of services provided by the hub to be deployed to a ManagedCluster. | Tells the hub which addon services to deploy to a ManagedCluster.
Namespace | Logical space for ManagedCluster resources on the hub. | Propagates resources to the ManagedCluster.
Secret | Two custom resources are created: the BMC Secret and the Image Pull Secret. | See the BMC Secret and Image Pull Secret entries in this table.
ClusterImageSet | Contains OKD image information such as the repository and image name. | Passed into resources to provide OKD images.
ZTP support for single node clusters, three-node clusters, and standard clusters requires updates to these CRs, including multiple instantiations of some.
ZTP provides support for deploying single node clusters, three-node clusters, and standard OpenShift clusters. This includes the installation of OpenShift and deployment of the distributed units (DUs) at scale.
The overall flow is identical to the ZTP support for single node clusters, with some differences in configuration depending on the type of cluster:
SiteConfig file:
- For single node clusters, the SiteConfig file must have exactly one entry in the nodes section.
- For three-node clusters, the SiteConfig file must have exactly three entries defined in the nodes section.
- For standard clusters, the SiteConfig file must have exactly three entries in the nodes section with role: master and one or more additional entries with role: worker. A sketch of the nodes section for a standard cluster is shown after these lists.

PolicyGenTemplate file:
- The example common PolicyGenTemplate file is common across all types of clusters.
- There are example group PolicyGenTemplate files for single node, three-node, and standard clusters.
- Site-specific PolicyGenTemplate files are still specific to each site.
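The following sketch illustrates only the nodes entries of a SiteConfig file for a standard cluster, as described above. The host names are placeholders and all other SiteConfig fields, such as BMC addresses and network configuration, are omitted for brevity:
nodes:
  - hostName: "master-0.example.com"
    role: master
  - hostName: "master-1.example.com"
    role: master
  - hostName: "master-2.example.com"
    role: master
  - hostName: "worker-0.example.com"
    role: worker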
PolicyGenTemplate CRs for RAN deployments
You use PolicyGenTemplate custom resources (CRs) to customize the configuration applied to the cluster using the GitOps zero touch provisioning (ZTP) pipeline. The baseline configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use PolicyGenTemplate CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
The baseline PolicyGenTemplate CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate container. See "Preparing the ZTP Git repository" for further details.
The PolicyGenTemplate CRs can be found in the ./out/argocd/example/policygentemplates folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenTemplate CR refers to other CRs that can be found in the ./out/source-crs folder.

The PolicyGenTemplate CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenTemplate CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.
PolicyGenTemplate CR | Description
---|---
common-ranGen.yaml | Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning.
group-du-3node-ranGen.yaml | Contains the RAN policies for three-node clusters only.
group-du-sno-ranGen.yaml | Contains the RAN policies for single-node clusters only.
group-du-standard-ranGen.yaml | Contains the RAN policies for standard three control-plane clusters.
Additional resources
- For more information about extracting the /argocd directory from the ztp-site-generate container image, see Preparing the ZTP Git repository.
About the PolicyGenTemplate
The PolicyGenTemplate.yaml file is a custom resource (CR) that tells the PolicyGen policy generator what CRs to include in the configuration, how to categorize the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.
The following example shows a PolicyGenTemplate.yaml file:
---
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-sno"
namespace: "group-du-sno-policies"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
sourceFiles:
- fileName: ConsoleOperatorDisable.yaml
policyName: "console-policy"
- fileName: ClusterLogForwarder.yaml
policyName: "log-forwarder-policy"
spec:
outputs:
- type: "kafka"
name: kafka-open
# below url is an example
url: tcp://10.46.55.190:9092/test
pipelines:
- name: audit-logs
inputRefs:
- audit
outputRefs:
- kafka-open
- name: infrastructure-logs
inputRefs:
- infrastructure
outputRefs:
- kafka-open
- fileName: ClusterLogging.yaml
policyName: "log-policy"
spec:
curation:
curator:
schedule: "30 3 * * *"
collection:
logs:
type: "fluentd"
fluentd: {}
- fileName: MachineConfigSctp.yaml
policyName: "mc-sctp-policy"
metadata:
labels:
machineconfiguration.openshift.io/role: master
- fileName: PtpConfigSlave.yaml
policyName: "ptp-config-policy"
metadata:
name: "du-ptp-slave"
spec:
profile:
- name: "slave"
interface: "ens5f0"
ptp4lOpts: "-2 -s --summary_interval -4"
phc2sysOpts: "-a -r -n 24"
- fileName: SriovOperatorConfig.yaml
policyName: "sriov-operconfig-policy"
spec:
disableDrain: true
- fileName: MachineConfigAcceleratedStartup.yaml
policyName: "mc-accelerated-policy"
metadata:
name: 04-accelerated-container-startup-master
labels:
machineconfiguration.openshift.io/role: master
- fileName: DisableSnoNetworkDiag.yaml
policyName: "disable-network-diag"
metadata:
labels:
machineconfiguration.openshift.io/role: master
The group-du-ranGen.yaml file defines a group of policies under a group named group-du. A Red Hat Advanced Cluster Management (RHACM) policy is generated for every source file that exists in sourceFiles, and a single placement binding and placement rule is generated to apply the cluster selection rule for the group-du policies.
Using the source file PtpConfigSlave.yaml as an example, the file has a definition of a PtpConfig custom resource (CR). The generated policy for the PtpConfigSlave example is named group-du-ptp-config-policy. The PtpConfig CR defined in the generated group-du-ptp-config-policy is named du-ptp-slave. The spec defined in PtpConfigSlave.yaml is placed under du-ptp-slave along with the other spec items defined under the source file.

The following example shows the group-du-ptp-config-policy:
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
name: group-du-ptp-config-policy
namespace: groups-sub
annotations:
policy.open-cluster-management.io/categories: CM Configuration Management
policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
remediationAction: enforce
disabled: false
policy-templates:
- objectDefinition:
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: group-du-ptp-config-policy-config
spec:
remediationAction: enforce
severity: low
namespaceselector:
exclude:
- kube-*
include:
- '*'
object-templates:
- complianceType: musthave
objectDefinition:
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
name: slave
namespace: openshift-ptp
spec:
recommend:
- match:
- nodeLabel: node-role.kubernetes.io/worker-du
priority: 4
profile: slave
profile:
- interface: ens5f0
name: slave
phc2sysOpts: -a -r -n 24
ptp4lConf: |
[global]
#
# Default Data Set
#
twoStepFlag 1
slaveOnly 0
priority1 128
priority2 128
domainNumber 24
.....
Best practices when customizing PolicyGenTemplate CRs
Consider the following best practices when customizing site configuration PolicyGenTemplate CRs:

- Use as few policies as necessary. Using fewer policies means using fewer resources. Each additional policy creates overhead for the hub cluster and the deployed spoke cluster. CRs are combined into policies based on the policyName field in the PolicyGenTemplate CR. CRs in the same PolicyGenTemplate which have the same value for policyName are managed under a single policy.
- Use a single catalog source for all Operators. In disconnected environments, configure the registry as a single index containing all Operators. Each additional CatalogSource on the spoke clusters increases CPU usage.
- MachineConfig CRs should be included as extraManifests in the SiteConfig CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications.
- PolicyGenTemplates should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades do not update the generated subscription. A sketch of such an override is shown after this list.
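The following sketch shows one way to pin a Subscription channel from a PolicyGenTemplate sourceFiles entry. The file name SriovSubscription.yaml and the channel value "stable" are assumptions used for illustration; substitute the source CR and channel that apply to your deployment:
- fileName: SriovSubscription.yaml
  policyName: "subscriptions-policy"
  spec:
    channel: "stable"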
Additional resources
- For details about best practice for scaling clusters with Red Hat Advanced Cluster Management (RHACM), see ACM performance and scalability considerations.
Scaling the hub cluster to manage large numbers of spoke clusters is affected by the number of policies created on the hub cluster. Grouping multiple configuration CRs into a single policy or a limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common/group/site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy.
Creating the PolicyGenTemplate CR
Use this procedure to create the PolicyGenTemplate custom resource (CR) for your site in your local clone of the Git repository.
Procedure
Choose an appropriate example from out/argocd/example/policygentemplates. This directory demonstrates a three-level policy framework that represents a well-supported low-latency profile tuned for the needs of 5G Telco DU deployments:
- A single common-ranGen.yaml file that should apply to all types of sites.
- A set of shared group-du-*-ranGen.yaml files, each of which should be common across a set of similar clusters.
- An example example-*-site.yaml file that can be copied and updated for each individual site.
Ensure that the labels defined in your PolicyGenTemplate bindingRules section correspond to the labels that are defined in the SiteConfig files of the clusters you are managing.
Ensure that the content of the overlaid spec files matches your desired end state. As a reference, the out/source-crs directory contains the full list of source-crs available to be included and overlaid by your PolicyGenTemplate templates.
Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single PerformancePolicy.yaml file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.
Define all the policy namespaces in a YAML file similar to the example out/argocd/example/policygentemplates/ns.yaml file.
Add all the PolicyGenTemplate files and the ns.yaml file to the kustomization.yaml file, similar to the example out/argocd/example/policygentemplates/kustomization.yaml file. A sketch of such a kustomization.yaml file is shown after this procedure.
Commit the PolicyGenTemplate CRs, the ns.yaml file, and the associated kustomization.yaml file in the Git repository.
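A minimal sketch of the kustomization.yaml arrangement described above is shown below. It assumes the generator-based layout used by the shipped example, and the file list is illustrative; match it to the PolicyGenTemplate files you actually keep in the directory, and confirm the structure against out/argocd/example/policygentemplates/kustomization.yaml extracted from the container:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- common-ranGen.yaml
- group-du-sno-ranGen.yaml
- example-sno-site.yaml
resources:
- ns.yaml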
Creating ZTP custom resources for multiple managed clusters
If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and SiteConfig files to manage the processes that create the CRs and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach.
Installing and deploying the clusters is a two stage process, as shown here:
Using PolicyGenTemplate CRs to override source CRs content
PolicyGenTemplate CRs allow you to overlay additional configuration details on top of the base source CRs provided in the ztp-site-generate container. You can think of PolicyGenTemplate CRs as a logical merge or patch to the base CR. Use PolicyGenTemplate CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.

The following example procedure describes how to update fields in the generated PerformanceProfile CR for the reference configuration based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file. Use the procedure as a basis for modifying other parts of the PolicyGenTemplate based on your requirements.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.
Procedure
Review the baseline source CR for existing content. You can review the source CRs listed in the reference PolicyGenTemplate CRs by extracting them from the zero touch provisioning (ZTP) container.
Create an /out folder:
$ mkdir -p ./out
Extract the source CRs:
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
Review the baseline PerformanceProfile CR in ./out/source-crs/PerformanceProfile.yaml:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: $name
annotations:
ran.openshift.io/ztp-deploy-wave: "10"
spec:
additionalKernelArgs:
- "idle=poll"
- "rcupdate.rcu_normal_after_boot=0"
cpu:
isolated: $isolated
reserved: $reserved
hugepages:
defaultHugepagesSize: $defaultHugepagesSize
pages:
- size: $size
count: $count
node: $node
machineConfigPoolSelector:
pools.operator.machineconfiguration.openshift.io/$mcp: ""
net:
userLevelNetworking: true
nodeSelector:
node-role.kubernetes.io/$mcp: ''
numa:
topologyPolicy: "restricted"
realTimeKernel:
enabled: true
Any fields in the source CR which contain $… are removed from the generated CR if they are not provided in the PolicyGenTemplate CR.
Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file. The following example PolicyGenTemplate CR stanza supplies appropriate CPU specifications, sets the hugepages configuration, and adds a new field that sets globallyDisableIrqLoadBalancing to false.
- fileName: PerformanceProfile.yaml
policyName: "config-policy"
metadata:
name: openshift-node-performance-profile
spec:
cpu:
# These must be tailored for the specific hardware platform
isolated: "2-19,22-39"
reserved: "0-1,20-21"
hugepages:
defaultHugepagesSize: 1G
pages:
- size: 1G
count: 10
globallyDisableIrqLoadBalancing: false
Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.
Example output
The ZTP application generates an ACM policy that contains the generated PerformanceProfile
CR. The contents of that CR are derived by merging the metadata
and spec
contents from the PerformanceProfile
entry in the PolicyGenTemplate
onto the source CR. The resulting CR has the following content:
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
name: openshift-node-performance-profile
spec:
additionalKernelArgs:
- idle=poll
- rcupdate.rcu_normal_after_boot=0
cpu:
isolated: 2-19,22-39
reserved: 0-1,20-21
globallyDisableIrqLoadBalancing: false
hugepages:
defaultHugepagesSize: 1G
pages:
- count: 10
size: 1G
machineConfigPoolSelector:
pools.operator.machineconfiguration.openshift.io/master: ""
net:
userLevelNetworking: true
nodeSelector:
node-role.kubernetes.io/master: ""
numa:
topologyPolicy: restricted
realTimeKernel:
enabled: true
Filtering custom resources using SiteConfig filters
By using filters, you can easily customize SiteConfig custom resources (CRs) to include or exclude other CRs for use in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline.

You can specify an inclusionDefault value of include or exclude for the SiteConfig CR, along with a list of the specific extraManifest RAN CRs that you want to include or exclude. Setting inclusionDefault to include makes the ZTP pipeline apply all the files in /source-crs/extra-manifest during installation. Setting inclusionDefault to exclude does the opposite.

You can exclude individual CRs from the /source-crs/extra-manifest folder that are otherwise included by default. The following example configures a custom single-node OpenShift SiteConfig CR to exclude the /source-crs/extra-manifest/03-sctp-machine-config-worker.yaml CR at installation time.
Some additional optional filtering scenarios are also described.
Prerequisites
You configured the hub cluster for generating the required installation and policy CRs.
You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
To prevent the ZTP pipeline from applying the 03-sctp-machine-config-worker.yaml CR file, apply the following YAML in the SiteConfig CR:
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
name: "site1-sno-du"
namespace: "site1-sno-du"
spec:
baseDomain: "example.com"
pullSecretRef:
name: "assisted-deployment-pull-secret"
clusterImageSetNameRef: "openshift-4.11"
sshPublicKey: "<ssh_public_key>"
clusters:
- clusterName: "site1-sno-du"
extraManifests:
filter:
exclude:
- 03-sctp-machine-config-worker.yaml
The ZTP pipeline skips the 03-sctp-machine-config-worker.yaml CR during installation. All other CRs in /source-crs/extra-manifest are applied.
Save the SiteConfig CR and push the changes to the site configuration repository.
The ZTP pipeline monitors and adjusts what CRs it applies based on the SiteConfig filter instructions.
Optional: To prevent the ZTP pipeline from applying all the /source-crs/extra-manifest CRs during cluster installation, apply the following YAML in the SiteConfig CR:
- clusterName: "site1-sno-du"
extraManifests:
filter:
inclusionDefault: exclude
Optional: To exclude all the /source-crs/extra-manifest RAN CRs and instead include a custom CR file during installation, edit the custom SiteConfig CR to set the custom manifests folder and the include file, for example:
clusters:
- clusterName: "site1-sno-du"
extraManifestPath: "<custom_manifest_folder>" (1)
extraManifests:
filter:
inclusionDefault: exclude (2)
include:
- custom-sctp-machine-config-worker.yaml
1 Replace <custom_manifest_folder> with the name of the folder that contains the custom installation CRs, for example, user-custom-manifest/.
2 Set inclusionDefault to exclude to prevent the ZTP pipeline from applying the files in /source-crs/extra-manifest during installation.
The following example illustrates the custom folder structure:
siteconfig
├── site1-sno-du.yaml
└── user-custom-manifest
└── custom-sctp-machine-config-worker.yaml
Configuring PTP fast events using PolicyGenTemplate CRs
You can configure PTP fast events for vRAN clusters that are deployed using the GitOps Zero Touch Provisioning (ZTP) pipeline. Use PolicyGenTemplate custom resources (CRs) as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
Prerequisites
- Create a Git repository where you manage your custom site configuration data.
Procedure
Add the following YAML into .spec.sourceFiles in the common-ranGen.yaml file to configure the AMQP Operator:
#AMQ interconnect operator for fast events
- fileName: AmqSubscriptionNS.yaml
policyName: "subscriptions-policy"
- fileName: AmqSubscriptionOperGroup.yaml
policyName: "subscriptions-policy"
- fileName: AmqSubscription.yaml
policyName: "subscriptions-policy"
Apply the following PolicyGenTemplate changes to the group-du-3node-ranGen.yaml, group-du-sno-ranGen.yaml, or group-du-standard-ranGen.yaml files according to your requirements:
In .sourceFiles, add the PtpOperatorConfig CR file that configures the AMQ transport host to the config-policy:
- fileName: PtpOperatorConfigForEvent.yaml
policyName: "config-policy"
Configure the linuxptp and phc2sys for the PTP clock type and interface. For example, add the following stanza into .sourceFiles:
- fileName: PtpConfigSlave.yaml (1)
policyName: "config-policy"
metadata:
name: "du-ptp-slave"
spec:
profile:
- name: "slave"
interface: "ens5f1" (2)
ptp4lOpts: "-2 -s --summary_interval -4" (3)
phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16"
ptpClockThreshold: (4)
holdOverTimeout: 30 #secs
maxOffsetThreshold: 100 #nano secs
minOffsetThreshold: -100 #nano secs
1 Can be one of PtpConfigMaster.yaml, PtpConfigSlave.yaml, or PtpConfigSlaveCvl.yaml depending on your requirements. PtpConfigSlaveCvl.yaml configures linuxptp services for an Intel E810 Columbiaville NIC. For configurations based on group-du-sno-ranGen.yaml or group-du-3node-ranGen.yaml, use PtpConfigSlave.yaml.
2 Device specific interface name.
3 You must append the --summary_interval -4 value to ptp4lOpts in .spec.sourceFiles.spec.profile to enable PTP fast events.
4 ptpClockThreshold configures how long the clock stays in clock holdover state. Holdover state is the period between local and master clock synchronizations. Offset is the time difference between the local and master clock.
Apply the following PolicyGenTemplate changes to your specific site YAML files, for example, example-sno-site.yaml:
In .sourceFiles, add the Interconnect CR file that configures the AMQ router to the config-policy:
- fileName: AmqInstance.yaml
policyName: "config-policy"
Merge any other required changes and files with your custom site repository.
Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.
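Committing and pushing the change is an ordinary Git workflow. The following is a short sketch; the policygentemplates/ directory name and the branch tracked by your remote are assumptions that must match the repository and branch monitored by the Argo CD application:
$ git add policygentemplates/
$ git commit -m "Enable PTP fast events for group and site policies"
$ git push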
Configuring UEFI secure boot for clusters using PolicyGenTemplate CRs
You can configure UEFI secure boot for vRAN clusters that are deployed using the GitOps zero touch provisioning (ZTP) pipeline.
Prerequisites
- Create a Git repository where you manage your custom site configuration data.
Procedure
Create the following MachineConfig resource and save it in the uefi-secure-boot.yaml file:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: master
name: uefi-secure-boot
spec:
config:
ignition:
version: 3.1.0
kernelArguments:
- efi=runtime
In your Git repository custom /siteconfig directory, create a /sno-extra-manifest folder and add the uefi-secure-boot.yaml file, for example:
siteconfig
├── site1-sno-du.yaml
├── site2-standard-du.yaml
└── sno-extra-manifest
└── uefi-secure-boot.yaml
In your cluster SiteConfig CR, specify the required values for extraManifestPath and bootMode:
Enter the directory name in the .spec.clusters.extraManifestPath field, for example:
clusters:
- clusterName: "example-cluster"
extraManifestPath: sno-extra-manifest/
Set the value for .spec.clusters.nodes.bootMode to UEFISecureBoot, for example:
nodes:
- hostName: "ran.example.lab"
bootMode: "UEFISecureBoot"
Deploy the cluster using the GitOps ZTP pipeline.
Verification
Open a remote shell to the deployed cluster, for example:
$ oc debug node/node-1.example.com
Verify that the SecureBoot feature is enabled:
sh-4.4# mokutil --sb-state
Example output
SecureBoot enabled
Configuring bare-metal event monitoring using PolicyGenTemplate CRs
You can configure bare-metal hardware events for vRAN clusters that are deployed using the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- Create a Git repository where you manage your custom site configuration data.
Procedure
To configure the AMQ Interconnect Operator and the Bare Metal Event Relay Operator, add the following YAML to spec.sourceFiles in the common-ranGen.yaml file:
# AMQ interconnect operator for fast events
- fileName: AmqSubscriptionNS.yaml
policyName: "subscriptions-policy"
- fileName: AmqSubscriptionOperGroup.yaml
policyName: "subscriptions-policy"
- fileName: AmqSubscription.yaml
policyName: "subscriptions-policy"
# Bare Metal Event Relay operator
- fileName: BareMetalEventRelaySubscriptionNS.yaml
policyName: "subscriptions-policy"
- fileName: BareMetalEventRelaySubscriptionOperGroup.yaml
policyName: "subscriptions-policy"
- fileName: BareMetalEventRelaySubscription.yaml
policyName: "subscriptions-policy"
Add the Interconnect CR to .spec.sourceFiles in the site configuration file, for example, the example-sno-site.yaml file:
- fileName: AmqInstance.yaml
policyName: "config-policy"
Add the HardwareEvent CR to spec.sourceFiles in your specific group configuration file, for example, in the group-du-sno-ranGen.yaml file:
- fileName: HardwareEvent.yaml
policyName: "config-policy"
spec:
nodeSelector: {}
transportHost: "amqp://<amq_interconnect_name>.<amq_interconnect_namespace>.svc.cluster.local" (1)
logLevel: "info"
1 The transportHost URL is composed of the existing AMQ Interconnect CR name and namespace. For example, in transportHost: "amqp://amq-router.amq-router.svc.cluster.local", the AMQ Interconnect name and namespace are both set to amq-router.
Commit the PolicyGenTemplate change in Git, and then push the changes to your site configuration repository to deploy bare-metal events monitoring to new sites using GitOps ZTP.
Create the Redfish Secret by running the following command:
$ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \
--from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \
--from-literal=hostaddr="<bmc_host_ip_addr>"
Additional resources
For more information about how to install the Bare Metal Event Relay, see Installing the Bare Metal Event Relay using the CLI.
For more information about how to install the AMQ Interconnect Operator, see Installing the AMQ messaging bus.
For more information about how to create the username, password, and the host IP address for the secret, see Creating the bare-metal event and Secret CRs.
Installing the GitOps ZTP pipeline
The procedures in this section tell you how to complete the following tasks:
Prepare the Git repository you need to host site configuration data.
Configure the hub cluster for generating the required installation and policy custom resources (CR).
Deploy the managed clusters using zero touch provisioning (ZTP).
Preparing the ZTP Git repository
Create a Git repository for hosting site configuration data. The zero touch provisioning (ZTP) pipeline requires read access to this repository.
Procedure
Create a directory structure with separate paths for the SiteConfig and PolicyGenTemplate custom resources (CRs).
Export the argocd directory from the ztp-site-generate container image using the following commands:
$ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
$ mkdir -p ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
Check that the out directory contains the following subdirectories:
- out/extra-manifest contains the source CR files that SiteConfig uses to generate the extra manifest configMap.
- out/source-crs contains the source CR files that PolicyGenTemplate uses to generate the Red Hat Advanced Cluster Management (RHACM) policies.
- out/argocd/deployment contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure.
- out/argocd/example contains the examples for SiteConfig and PolicyGenTemplate files that represent the recommended configuration.

The directory structure under out/argocd/example serves as a reference for the structure and content of your Git repository. The example includes SiteConfig and PolicyGenTemplate reference CRs for single-node, three-node, and standard clusters. Remove references to cluster types that you are not using. The following example describes a set of CRs for a network of single-node clusters:
example/
├── policygentemplates
│ ├── common-ranGen.yaml
│ ├── example-sno-site.yaml
│ ├── group-du-sno-ranGen.yaml
│ ├── group-du-sno-validator-ranGen.yaml
│ ├── kustomization.yaml
│ └── ns.yaml
└── siteconfig
├── example-sno.yaml
├── KlusterletAddonConfigOverride.yaml
└── kustomization.yaml
Keep SiteConfig and PolicyGenTemplate CRs in separate directories. Both the SiteConfig and PolicyGenTemplate directories must contain a kustomization.yaml file that explicitly includes the files in that directory.

This directory structure and the kustomization.yaml files must be committed and pushed to your Git repository. The initial push to Git should include the kustomization.yaml files. The SiteConfig (example-sno.yaml) and PolicyGenTemplate (common-ranGen.yaml, group-du-sno*.yaml, and example-sno-site.yaml) files can be omitted and pushed at a later time as required when deploying a site.

The KlusterletAddonConfigOverride.yaml file is only required if one or more SiteConfig CRs which make reference to it are committed and pushed to Git. See example-sno.yaml for an example of how this is used.
Preparing the hub cluster for ZTP
You can configure your hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CR) for each site based on a zero touch provisioning (ZTP) GitOps flow.
Prerequisites
- OpenShift cluster 4.8 or 4.9 as the hub cluster
- Red Hat Advanced Cluster Management (RHACM) Operator 2.3 or 2.4 installed on the hub cluster
- Red Hat OpenShift GitOps Operator 1.3 on the hub cluster
Procedure
Install the Topology Aware Lifecycle Manager (TALM), which coordinates with any new sites added by ZTP and manages application of the PolicyGenTemplate-generated policies.
Prepare the ArgoCD pipeline configuration:
Create a Git repository with the directory structure similar to the example directory. For more information, see "Preparing the ZTP Git repository".
Configure access to the repository using the ArgoCD UI. Under Settings, configure the following:
- Repositories - Add the connection information. The URL must end in .git, for example, https://repo.example.com/repo.git, and the credentials.
- Certificates - Add the public certificate for the repository, if needed.
Modify the two ArgoCD Applications, out/argocd/deployment/clusters-app.yaml and out/argocd/deployment/policies-app.yaml, based on your Git repository:
- Update the URL to point to the Git repository. The URL must end with .git, for example, https://repo.example.com/repo.git.
- The targetRevision must indicate which Git repository branch to monitor.
- The path should specify the path to the SiteConfig or PolicyGenTemplate CRs, respectively.
To patch the ArgoCD instance in the hub cluster by using the patch file previously extracted into the out/argocd/deployment/ directory, enter the following command:
$ oc patch argocd openshift-gitops \
-n openshift-gitops --type=merge \
--patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
Apply the pipeline configuration to your hub cluster by using the following command:
$ oc apply -k out/argocd/deployment
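You can optionally confirm that the two Argo CD applications defined in out/argocd/deployment were created. The resource type and namespace below assume the default Red Hat OpenShift GitOps installation:
$ oc get applications.argoproj.io -n openshift-gitops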
Deploying additional changes to clusters
Custom resources (CRs) that are deployed through the GitOps zero touch provisioning (ZTP) pipeline support two goals:
Deploying additional Operators to spoke clusters that are required by typical RAN DU applications running at the network far-edge.
Customizing the OKD installation to provide a high performance platform capable of meeting the strict timing requirements in a minimal CPU budget.
If you require cluster configuration changes outside of the base GitOps ZTP pipeline configuration, there are three options:
Apply the additional configuration after the ZTP pipeline is complete
When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.
Add content to the ZTP library
The base source CRs that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.
Create extra manifests for the cluster installation
Extra manifests are applied during installation and make the installation process more efficient.
Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of OKD.
Additional resources
See Adding new content to the GitOps ZTP pipeline for more information about adding or modifying existing source CRs in the ztp-site-generate container.
See Customizing the ZTP GitOps pipeline with extra manifests for more information on adding extra manifests.
Adding new content to the GitOps ZTP pipeline
The source CRs in the GitOps ZTP site generator container provide a set of critical features and node tuning settings for RAN Distributed Unit (DU) applications. These are applied to the clusters that you deploy with ZTP. To add or modify existing source CRs in the ztp-site-generate container, rebuild the ztp-site-generate container and make it available to the hub cluster, typically from the disconnected registry associated with the hub cluster. Any valid OKD CR can be added.
Perform the following procedure to add new content to the ZTP pipeline.
Procedure
Create a directory containing a Containerfile and the source CR YAML files that you want to include in the updated ztp-site-generate container, for example:
ztp-update/
├── example-cr1.yaml
├── example-cr2.yaml
└── ztp-update.in
Add the following content to the ztp-update.in Containerfile:
FROM registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10
ADD example-cr2.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
ADD example-cr1.yaml /kustomize/plugin/ran.openshift.io/v1/policygentemplate/source-crs/
Open a terminal at the ztp-update/ folder and rebuild the container:
$ podman build -t ztp-site-generate-rhel8-custom:v4.10-custom-1 -f ztp-update.in .
Push the built container image to your disconnected registry, for example:
$ podman push localhost/ztp-site-generate-rhel8-custom:v4.10-custom-1 registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1
Patch the Argo CD instance on the hub cluster to point to the newly built container image:
$ oc patch -n openshift-gitops argocd openshift-gitops --type=json -p '[{"op": "replace", "path":"/spec/repo/initContainers/0/image", "value": "registry.example.com:5000/ztp-site-generate-rhel8-custom:v4.10-custom-1"} ]'
When the Argo CD instance is patched, the openshift-gitops-repo-server pod automatically restarts.
Verification
Verify that the new openshift-gitops-repo-server pod has completed initialization and that the previous repo pod is terminated:
$ oc get pods -n openshift-gitops | grep openshift-gitops-repo-server
Example output
openshift-gitops-repo-server-7df86f9774-db682 1/1 Running 1 28s
You must wait until the new openshift-gitops-repo-server pod has completed initialization and the previous pod is terminated before the newly added container image content is available.
Additional resources
- Alternatively, you can patch the Argo CD instance as described in Preparing the hub cluster for ZTP by modifying argocd-openshift-gitops-patch.json with an updated initContainer image before applying the patch file.
Customizing extra installation manifests in the ZTP GitOps pipeline
You can define a set of extra manifests for inclusion in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline. These manifests are linked to the SiteConfig
custom resources (CRs) and are applied to the cluster during installation. Including MachineConfig
CRs at install time makes the installation process more efficient.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
Create a set of extra manifest CRs that the ZTP pipeline uses to customize the cluster installs.
In your custom /siteconfig directory, create an /extra-manifest folder for your extra manifests. The following example illustrates a sample /siteconfig with an /extra-manifest folder:
siteconfig
├── site1-sno-du.yaml
├── site2-standard-du.yaml
└── extra-manifest
└── 01-example-machine-config.yaml
Add your custom extra manifest CRs to the
siteconfig/extra-manifest
directory.In your
SiteConfig
CR, enter the directory name in theextraManifestPath
field, for example:clusters:
- clusterName: "example-sno"
networkType: "OVNKubernetes"
extraManifestPath: extra-manifest
Save the
SiteConfig
CRs and/extra-manifest
CRs and push them to the site configuration repo.
The ZTP pipeline appends the CRs in the /extra-manifest
directory to the default set of extra manifests during cluster provisioning.
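For illustration, a custom extra-manifest CR such as 01-example-machine-config.yaml could look like the following minimal MachineConfig sketch; the target role, file path, and file contents are placeholders rather than a recommended configuration:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 01-example-machine-config
  labels:
    machineconfiguration.openshift.io/role: master  # placeholder: the MachineConfigPool this applies to
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/example/example.conf           # placeholder file written to each matching node
          mode: 0644
          overwrite: true
          contents:
            source: data:text/plain;charset=utf-8;base64,ZXhhbXBsZQo=  # base64 for "example"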
Deploying a site
Use the following procedure to prepare the hub cluster for site deployment and initiate zero touch provisioning (ZTP) by pushing custom resources (CRs) to your Git repository.
Procedure
Create the required secrets for the site. These resources must be in a namespace with a name matching the cluster name. In
out/argocd/example/siteconfig/example-sno.yaml
, the cluster name and namespace isexample-sno
.Create the namespace for the cluster using the following commands:
$ export CLUSTERNS=example-sno
$ oc create namespace $CLUSTERNS
Create a pull secret for the cluster. The pull secret must contain all the credentials necessary for installing OKD and all required Operators. In all of the example
SiteConfig
CRs, the pull secret is namedassisted-deployment-pull-secret
, as shown below:$ oc apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: assisted-deployment-pull-secret
namespace: $CLUSTERNS
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: $(base64 -w0 <pull-secret.json)
EOF
Create a BMC authentication secret for each host you are deploying:
$ oc apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: $(read -p 'Hostname: ' tmp; printf $tmp)-bmc-secret
namespace: $CLUSTERNS
type: Opaque
data:
username: $(read -p 'Username: ' tmp; printf $tmp | base64)
password: $(read -s -p 'Password: ' tmp; printf $tmp | base64)
EOF
The secrets are referenced from the
SiteConfig
custom resource (CR) by name. The namespace must match theSiteConfig
namespace.Create a
SiteConfig
CR for your cluster in your local clone of the Git repository:Choose the appropriate example for your CR from the
out/argocd/example/siteconfig/
folder. The folder includes example files for single node, three-node, and standard clusters:example-sno.yaml
example-3node.yaml
example-standard.yaml
Change the cluster and host details in the example file to match the type of cluster you want. The following file is a composite of the three files that explains the configuration of each cluster type:
# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno
---
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
name: "example-sno"
namespace: "example-sno"
spec:
baseDomain: "example.com"
pullSecretRef:
name: "assisted-deployment-pull-secret"
clusterImageSetNameRef: "openshift-4.10" (1)
sshPublicKey: "ssh-rsa AAAA..."
clusters:
- clusterName: "example-sno"
networkType: "OVNKubernetes"
clusterLabels: (2)
# These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates:
# ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true'
common: true
# ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""'
group-du-sno: ""
# ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"'
# Normally this should match or contain the cluster name so it only applies to a single cluster
sites : "example-sno"
clusterNetwork:
- cidr: 1001:1::/48
hostPrefix: 64
machineNetwork: (3)
- cidr: 1111:2222:3333:4444::/64
# For 3-node and standard clusters with static IPs, the API and Ingress IPs must be configured here
apiVIP: 1111:2222:3333:4444::1:1 (4)
ingressVIP: 1111:2222:3333:4444::1:2 (5)
serviceNetwork:
- 1001:2::/112
additionalNTPSources:
- 1111:2222:3333:4444::2
nodes:
- hostName: "example-node1.example.com" (6)
role: "master"
bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" (7)
bmcCredentialsName:
name: "example-node1-bmh-secret" (8)
bootMACAddress: "AA:BB:CC:DD:EE:11"
bootMode: "UEFI"
rootDeviceHints:
hctl: '0:1:0'
cpuset: "0-1,52-53"
nodeNetwork: (9)
interfaces:
- name: eno1
macAddress: "AA:BB:CC:DD:EE:11"
config:
interfaces:
- name: eno1
type: ethernet
state: up
macAddress: "AA:BB:CC:DD:EE:11"
ipv4:
enabled: false
ipv6:
enabled: true
address:
- ip: 1111:2222:3333:4444::1:1
prefix-length: 64
dns-resolver:
config:
search:
- example.com
server:
- 1111:2222:3333:4444::2
routes:
config:
- destination: ::/0
next-hop-interface: eno1
next-hop-address: 1111:2222:3333:4444::1
table-id: 254
1 Applies to all cluster types. The value must match an image set available on the hub cluster. To see the list of supported versions on your hub, run oc get clusterimagesets
.2 Applies to all cluster types. These values must correspond to the PolicyGenTemplate
labels that you define in a later step.3 Applies to single node clusters. The value defines the cluster network sections for a single node deployment. 4 Applies to three-node and standard clusters. The value defines the cluster network sections. 5 Applies to three-node and standard clusters. The value defines the cluster network sections. 6 Applies to all cluster types. For single node deployments, define one host. For three-node deployments, define three hosts. For standard deployments, define three hosts with role: master
and two or more hosts defined withrole: worker
.7 Applies to all cluster types. Specifies the BMC address. 8 Applies to all cluster types. Specifies the BMC credentials. 9 Applies to all cluster types. Specifies the network settings for the node. You can inspect the default set of extra-manifest
MachineConfig
CRs inout/argocd/extra-manifest
. These CRs are automatically applied to the cluster when it is installed.Optional: To provision additional install-time manifests on the provisioned cluster, create a directory in your Git repository, for example,
sno-extra-manifest/
, and add your custom manifest CRs to this directory. If yourSiteConfig.yaml
refers to this directory in theextraManifestPath
field, any CRs in this referenced directory are appended to the default set of extra manifests.
Add the
SiteConfig
CR to thekustomization.yaml
file in thegenerators
section, similar to the example shown inout/argocd/example/siteconfig/kustomization.yaml
.Commit your
SiteConfig
CR and associatedkustomization.yaml
in your Git repository.Push your changes to the Git repository. The ArgoCD pipeline detects the changes and begins the site deployment. You can push the changes to the
SiteConfig
CR and thePolicyGenTemplate
CR simultaneously.The
SiteConfig
CR creates the following CRs on the hub cluster:Namespace
- Unique per siteAgentClusterInstall
BareMetalHost
- One per nodeClusterDeployment
InfraEnv
NMStateConfig
- One per nodeExtraManifestsConfigMap
- Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more.ManagedCluster
KlusterletAddonConfig
GitOps ZTP and Topology Aware Lifecycle Manager
GitOps zero touch provisioning (ZTP) generates installation and configuration CRs from manifests stored in Git. These artifacts are applied to a centralized hub cluster where Red Hat Advanced Cluster Management (RHACM), assisted installer service, and the Topology Aware Lifecycle Manager (TALM) use the CRs to install and configure the spoke cluster. The configuration phase of the ZTP pipeline uses the TALM to orchestrate the application of the configuration CRs to the cluster. There are several key integration points between GitOps ZTP and the TALM.
Inform policies
By default, GitOps ZTP creates all policies with a remediation action of inform
. These policies cause RHACM to report on the compliance status of clusters relevant to the policies but do not apply the desired configuration. During the ZTP installation, the TALM steps through the created inform
policies, creates a copy for the target spoke cluster(s) and changes the remediation action of the copy to enforce
. This pushes the configuration to the spoke cluster. Outside of the ZTP phase of the cluster lifecycle, this setup allows changes to be made to policies without the risk of immediately rolling those changes out to all affected spoke clusters in the network. You can control the timing and the set of clusters that are remediated using TALM.
Automatic creation of ClusterGroupUpgrade CRs
The TALM monitors the state of all ManagedCluster
CRs on the hub cluster. Any ManagedCluster
CR which does not have a ztp-done
label applied, including newly created ManagedCluster
CRs, causes the TALM to automatically create a ClusterGroupUpgrade
CR with the following characteristics:
The
ClusterGroupUpgrade
CR is created and enabled in theztp-install
namespace.ClusterGroupUpgrade
CR has the same name as theManagedCluster
CR.The cluster selector includes only the cluster associated with that
ManagedCluster
CR.The set of managed policies includes all policies that RHACM has bound to the cluster at the time the
ClusterGroupUpgrade
is created.Pre-caching is disabled.
Timeout set to 4 hours (240 minutes).
The automatic creation of an enabled
ClusterGroupUpgrade
ensures that initial zero-touch deployment of clusters proceeds without the need for user intervention. Additionally, the automatic creation of aClusterGroupUpgrade
CR for anyManagedCluster
without theztp-done
label allows a failed ZTP installation to be restarted by simply deleting theClusterGroupUpgrade
CR for the cluster.
Waves
Each policy generated from a PolicyGenTemplate
CR includes a ztp-deploy-wave
annotation. This annotation is based on the same annotation from each CR which is included in that policy. The wave annotation is used to order the policies in the auto-generated ClusterGroupUpgrade
CR.
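For illustration, the annotation has the following shape on a source CR; the resource and the wave value shown here are examples only, and you can list the shipped defaults with the grep command later in this section:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-example                      # illustrative resource, not one of the shipped source CRs
  annotations:
    ran.openshift.io/ztp-deploy-wave: "2"      # example wave value; lower waves are applied first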
All CRs in the same policy must have the same setting for the ztp-deploy-wave annotation.
The TALM applies the configuration policies in the order specified by the wave annotations. The TALM waits for each policy to be compliant before moving to the next policy. It is important to ensure that the wave annotation for each CR takes into account any prerequisites for those CRs to be applied to the cluster. For example, an Operator must be installed before or concurrently with the configuration for the Operator. Similarly, the CatalogSource
for an Operator must be installed in a wave before or concurrently with the Operator Subscription. The default wave value for each CR takes these prerequisites into account.
Multiple CRs and policies can share the same wave number. Having fewer policies can result in faster deployments and lower CPU usage. It is a best practice to group many CRs into relatively few waves.
To check the default wave value in each source CR, run the following command against the out/source-crs
directory that is extracted from the ztp-site-generate
container image:
$ grep -r "ztp-deploy-wave" out/source-crs
Phase labels
The ClusterGroupUpgrade
CR is automatically created and includes directives to annotate the ManagedCluster
CR with labels at the start and end of the ZTP process.
When ZTP configuration post-installation commences, the ManagedCluster
has the ztp-running
label applied. When all policies are remediated to the cluster and are fully compliant, these directives cause the TALM to remove the ztp-running
label and apply the ztp-done
label.
For deployments which make use of the informDuValidator
policy, the ztp-done
label is applied when the cluster is fully ready for deployment of applications. This includes all reconciliation and resulting effects of the ZTP applied configuration CRs.
Linked CRs
The automatically created ClusterGroupUpgrade
CR has the owner reference set as the ManagedCluster
from which it was derived. This reference ensures that deleting the ManagedCluster
CR causes the instance of the ClusterGroupUpgrade
to be deleted along with any supporting resources.
Monitoring deployment progress
The ArgoCD pipeline uses the SiteConfig
and PolicyGenTemplate
CRs in Git to generate the cluster configuration CRs and RHACM policies and then sync them to the hub. You can monitor the progress of this synchronization in the ArgoCD dashboard.
Procedure
When the synchronization is complete, the installation generally proceeds as follows:
The Assisted Service Operator installs OKD on the cluster. You can monitor the progress of cluster installation from the RHACM dashboard or from the command line:
$ export CLUSTER=<clusterName>
$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.
After the cluster installation is complete and the cluster becomes
Ready
, aClusterGroupUpgrade
CR corresponding to this cluster, with a list of ordered policies defined by theran.openshift.io/ztp-deploy-wave annotations
, is automatically created by the TALM. The cluster’s policies are applied in the order listed inClusterGroupUpgrade
CR. You can monitor the high-level progress of configuration policy reconciliation using the following commands:$ export CLUSTER=<clusterName>
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
You can monitor the detailed policy compliant status using the RHACM dashboard or the command line:
$ oc get policies -n $CLUSTER
The final policy that becomes compliant is the one defined in the *-du-validator-policy
policies. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration is complete.
After all policies become compliant, the ztp-done
label is added to the cluster, indicating the entire ZTP pipeline is complete for the cluster.
Indication of done for ZTP installations
Zero touch provisioning (ZTP) simplifies the process of checking the ZTP installation status for a cluster. The ZTP status moves through three phases: cluster installation, cluster configuration, and ZTP done.
Cluster installation phase
The cluster installation phase is shown by the ManagedCluster
CR ManagedClusterJoined
condition. If the ManagedCluster
CR does not have this condition, or the condition is set to False
, the cluster is still in the installation phase. Additional details about installation are available from the AgentClusterInstall
and ClusterDeployment
CRs. For more information, see “Troubleshooting GitOps ZTP”.
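To check this condition from the command line, a command similar to the following can be used, assuming the CLUSTER environment variable is set to the cluster name as in the monitoring examples:
$ oc get managedcluster $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="ManagedClusterJoined")]}' | jq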
Cluster configuration phase
The cluster configuration phase is shown by a ztp-running
label applied to the ManagedCluster
CR for the cluster.
ZTP done
Cluster installation and configuration is complete in the ZTP done phase. This is shown by the removal of the ztp-running
label and addition of the ztp-done
label to the ManagedCluster
CR. The ztp-done
label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.
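To see which phase a cluster is currently in, you can inspect its labels, assuming the CLUSTER environment variable is set to the cluster name, for example:
$ oc get managedcluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq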
The transition to the ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) static validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when ZTP provisioning of the spoke cluster is complete.
The validator inform policy ensures the configuration of the distributed unit (DU) cluster is fully applied and Operators have completed their initialization. The policy validates the following:
The target
MachineConfigPool
contains the expected entries and has finished updating. All nodes are available and not degraded.The SR-IOV Operator has completed initialization as indicated by at least one
SriovNetworkNodeState
withsyncStatus: Succeeded
.The PTP Operator daemon set exists.
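If you want to spot-check these criteria manually on the spoke cluster, commands along the following lines can be used; the namespaces shown are the typical defaults for the SR-IOV and PTP Operators, and the validator policy remains the authoritative check:
$ oc get mcp                                   # confirm the target MachineConfigPool is Updated and not Degraded
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator -o jsonpath='{.items[*].status.syncStatus}'
$ oc get daemonset -n openshift-ptp            # confirm the PTP Operator daemon set exists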
The validator inform policy is included in the reference group
PolicyGenTemplate
CRs. For reliable indication of the ZTP done state, this validator inform policy must be included in the ZTP pipeline.
Creating a validator inform policy
Use the following procedure to create a validator inform policy that provides an indication of when the zero touch provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy can be used for deployments of single node clusters, three-node clusters, and standard clusters.
Procedure
Create a stand-alone
PolicyGenTemplate
custom resource (CR) that contains the source filevalidatorCRs/informDuValidator.yaml
. You only need one stand-alonePolicyGenTemplate
CR for each cluster type.Single node clusters
group-du-sno-validator-ranGen.yaml
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-sno-validator" (1)
namespace: "ztp-group" (2)
spec:
bindingRules:
group-du-sno: "" (3)
bindingExcludedRules:
ztp-done: "" (4)
mcp: "master" (5)
sourceFiles:
- fileName: validatorCRs/informDuValidator.yaml
remediationAction: inform (6)
policyName: "du-policy" (7)
Three-node clusters
group-du-3node-validator-ranGen.yaml
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-3node-validator" (1)
namespace: "ztp-group" (2)
spec:
bindingRules:
group-du-3node: "" (3)
bindingExcludedRules:
ztp-done: "" (4)
mcp: "master" (5)
sourceFiles:
- fileName: validatorCRs/informDuValidator.yaml
remediationAction: inform (6)
policyName: "du-policy" (7)
Standard clusters
group-du-standard-validator-ranGen.yaml
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "group-du-standard-validator" (1)
namespace: "ztp-group" (2)
spec:
bindingRules:
group-du-standard: "" (3)
bindingExcludedRules:
ztp-done: "" (4)
mcp: "worker" (5)
sourceFiles:
- fileName: validatorCRs/informDuValidator.yaml
remediationAction: inform (6)
policyName: "du-policy" (7)
1 The name of PolicyGenTemplates
object. This name is also used as part of the names for theplacementBinding
,placementRule
, andpolicy
that are created in the requestednamespace
.2 This value should match the namespace
used in the groupPolicyGenTemplates
.3 The group-du-*
label defined inbindingRules
must exist in theSiteConfig
files.4 The label defined in bindingExcludedRules
must beztp-done:
. Theztp-done
label is used in coordination with the Topology Aware Lifecycle Manager.5 mcp
defines theMachineConfigPool
object that is used in the source filevalidatorCRs/informDuValidator.yaml
. It should bemaster
for single node and three-node cluster deployments andworker
for standard cluster deployments.6 Optional. The default value is inform
.7 This value is used as part of the name for the generated RHACM policy. The generated validator policy for the single node example is named group-du-sno-validator-du-policy
.Push the files to the ZTP Git repository.
Querying the policy compliance status for each cluster
After you have created the validator inform policies for your clusters and pushed them to the zero touch provisioning (ZTP) Git repository, you can check the status of each cluster for policy compliance.
Procedure
To query the status of the spoke clusters, use either the Red Hat Advanced Cluster Management (RHACM) web console or the CLI:
To query status from the RHACM web console, perform the following actions:
Click Governance → Find policies.
Search for du-validator-policy.
Click into the policy.
To query status using the CLI, run the following command:
$ oc get policies du-validator-policy -n <namespace_for_common> -o jsonpath={'.status.status'} | jq
When all of the policies including the validator inform policy applied to the cluster become compliant, ZTP installation and configuration for this cluster is complete.
To query the cluster violation/compliant status from the ACM web console, click Governance → Cluster violations.
Check the validator policy compliant status for a cluster using the following commands:
Export the cluster name:
$ export CLUSTER=<cluster_name>
Get the policy:
$ oc get policies -n $CLUSTER | grep <validator_policy_name>
Alternatively, you can use the following command:
$ oc get policies -n <namespace-for-group> <validatorPolicyName> -o jsonpath="{.status.status[?(@.clustername=='$CLUSTER')]}" | jq
After the
*-validator-du-policy
RHACM policy becomes compliant for the cluster, the validator policy is unbound for this cluster and theztp-done
label is added to the cluster. This acts as a persistent indicator that the whole ZTP pipeline has completed for the cluster.
Node Tuning Operator
The Node Tuning Operator provides the ability to enable advanced node performance tunings on a set of nodes.
OKD provides the Node Tuning Operator to implement automatic tuning to achieve low latency performance for OKD applications. The cluster administrator uses a performance profile configuration, which makes it easier to apply these changes in a more reliable way.
The administrator can specify updating the kernel to rt-kernel
, reserving CPUs for management workloads, and isolating CPUs for running the workloads.
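As an illustration of the kind of configuration the Node Tuning Operator consumes, the following is a minimal PerformanceProfile sketch; the name, CPU ranges, and node selector are placeholders rather than recommended values:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-performanceprofile             # placeholder name
spec:
  cpu:
    reserved: "0-1"                            # placeholder: CPUs reserved for management workloads
    isolated: "2-31"                           # placeholder: CPUs dedicated to running the workloads
  realTimeKernel:
    enabled: true                              # switch the nodes to the realtime kernel
  nodeSelector:
    node-role.kubernetes.io/master: ""         # placeholder: target nodes for the profile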
In earlier versions of OKD, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OKD 4.11, these functions are part of the Node Tuning Operator.
Troubleshooting GitOps ZTP
The ArgoCD pipeline uses the SiteConfig
and PolicyGenTemplate
custom resources (CRs) from Git to generate the cluster configuration CRs and Red Hat Advanced Cluster Management (RHACM) policies. Use the following steps to troubleshoot issues that might occur during this process.
Validating the generation of installation CRs
The GitOps zero touch provisioning (ZTP) infrastructure generates a set of installation CRs on the hub cluster in response to a SiteConfig
CR pushed to your Git repository. You can check that the installation CRs were created by using the following command:
$ oc get AgentClusterInstall -n <cluster_name>
If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from SiteConfig
files to the installation CRs.
Procedure
Verify that the
SiteConfig→ManagedCluster
was generated to the hub cluster:$ oc get managedcluster
If the
SiteConfig
ManagedCluster
is missing, see if theclusters
application failed to synchronize the files from the Git repository to the hub:$ oc describe -n openshift-gitops application clusters
Check for
Status: Conditions:
to view the error logs. For example, setting an invalid value forextraManifestPath:
in thesiteConfig
file raises an error as shown below:Status:
Conditions:
Last Transition Time: 2021-11-26T17:21:39Z
Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory
2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory
Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1
Type: ComparisonError
Check for
Status: Sync:
. If there are log errors,Status: Sync:
could indicate anUnknown
error:Status:
Sync:
Compared To:
Destination:
Namespace: clusters-sub
Server: https://kubernetes.default.svc
Source:
Path: sites-config
Repo URL: https://git.com/ran-sites/siteconfigs/.git
Target Revision: master
Status: Unknown
Validating the generation of configuration policy CRs
Policy custom resources (CRs) are generated in the same namespace as the PolicyGenTemplate
from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenTemplate
regardless of whether they are ztp-common
, ztp-group
, or ztp-site
based, as shown using the following commands:
$ export NS=<namespace>
$ oc get policy -n $NS
The expected set of policy-wrapped CRs should be displayed.
If the policies failed synchronization, use the following troubleshooting steps.
Procedure
To display detailed information about the policies, run the following command:
$ oc describe -n openshift-gitops application policies
Check for
Status: Conditions:
to show the error logs. For example, setting an invalidsourceFile→fileName:
generates the error shown below:Status:
Conditions:
Last Transition Time: 2021-11-26T17:21:39Z
Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory
Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1
Type: ComparisonError
Check for
Status: Sync:
. If there are log errors atStatus: Conditions:
, theStatus: Sync:
showsUnknown
orError
:Status:
Sync:
Compared To:
Destination:
Namespace: policies-sub
Server: https://kubernetes.default.svc
Source:
Path: policies
Repo URL: https://git.com/ran-sites/policies/.git
Target Revision: master
Status: Error
When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a
ManagedCluster
object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:$ oc get policy -n $CLUSTER
Example output:
NAME REMEDIATION ACTION COMPLIANCE STATE AGE
ztp-common.common-config-policy inform Compliant 13d
ztp-common.common-subscriptions-policy inform Compliant 13d
ztp-group.group-du-sno-config-policy inform Compliant 13d
ztp-group.group-du-sno-validator-du-policy inform Compliant 13d
ztp-site.example-sno-config-policy inform Compliant 13d
RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format:
<policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName>
.Check the placement rule for any policies not copied to the cluster namespace. The
matchSelector
in thePlacementRule
for those policies should match labels on theManagedCluster
object:$ oc get placementrule -n $NS
Note the
PlacementRule
name appropriate for the missing policy, common, group, or site, using the following command:$ oc get placementrule -n $NS <placementRuleName> -o yaml
The status.decisions field should include your cluster name.
The key-value pair of the
matchSelector
in the spec must match the labels on your managed cluster.
Check the labels on the
ManagedCluster
object using the following command:$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
Check to see which policies are compliant using the following command:
$ oc get policy -n $CLUSTER
If the
Namespace
,OperatorGroup
, andSubscription
policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the spoke cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.
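One way to confirm whether the Operators installed on the spoke cluster is to check the Subscription and ClusterServiceVersion resources directly on the spoke, assuming you are logged in to the spoke cluster:
$ oc get subscriptions.operators.coreos.com -A
$ oc get csv -A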
Restarting policies reconciliation
Use the following procedure to restart policies reconciliation in the event of unexpected compliance issues. This procedure is required when the ClusterGroupUpgrade
CR has timed out.
Procedure
A
ClusterGroupUpgrade
CR is generated in the namespaceztp-install
by the Topology Aware Lifecycle Manager after the managed spoke cluster becomesReady
:$ export CLUSTER=<clusterName>
$ oc get clustergroupupgrades -n ztp-install $CLUSTER
If there are unexpected issues and the policies fail to become compliant within the configured timeout (the default is 4 hours), the status of the
ClusterGroupUpgrade
CR showsUpgradeTimedOut
:$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
A
ClusterGroupUpgrade
CR in theUpgradeTimedOut
state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existingClusterGroupUpgrade
CR. This triggers the automatic creation of a newClusterGroupUpgrade
CR that begins reconciling the policies immediately:$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
Note that when the ClusterGroupUpgrade
CR completes with status UpgradeCompleted
and the managed spoke cluster has the label ztp-done
applied, you can make additional configuration changes using PolicyGenTemplate
. Deleting the existing ClusterGroupUpgrade
CR will not make the TALM generate a new CR.
At this point, ZTP has completed its interaction with the cluster and any further interactions should be treated as an upgrade.
Additional resources
- For information about using TALM to construct your own
ClusterGroupUpgrade
CR, see About the ClusterGroupUpgrade CR.
Site cleanup
Remove a site and the associated installation and configuration policy CRs by removing the SiteConfig
and PolicyGenTemplate
file names from the kustomization.yaml
file. When you run the ZTP pipeline again, the generated CRs are removed. If you want to permanently remove a site, you should also remove the SiteConfig
and site-specific PolicyGenTemplate
files from the Git repository. If you want to remove a site temporarily, for example when redeploying a site, you can leave the SiteConfig
and site-specific PolicyGenTemplate
CRs in the Git repository.
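For example, after removing a site named example-sno, the siteconfig kustomization.yaml might look similar to the following sketch, where the remaining file names are placeholders:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  # example-sno.yaml removed from this list to clean up the site
  - site2.yaml
  - site3.yaml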
Additional resources
- For information about removing a cluster, see Removing a cluster from management.
Removing obsolete content
If a change to the PolicyGenTemplate
file configuration results in obsolete policies, for example, policies are renamed, use the following procedure to remove those policies in an automated way.
Procedure
Remove the affected
PolicyGenTemplate
files from the Git repository, commit and push to the remote repository.Wait for the changes to synchronize through the application and the affected policies to be removed from the hub cluster.
Add the updated
PolicyGenTemplate
files back to the Git repository, and then commit and push to the remote repository.
Note that removing the zero touch provisioning (ZTP) distributed unit (DU) profile policies from the Git repository, and as a result also removing them from the hub cluster, does not affect any configuration of the managed spoke clusters. Removing a policy from the hub cluster does not delete it from the spoke cluster and the CRs managed by that policy.
As an alternative, after making changes to PolicyGenTemplate
files that result in obsolete policies, you can remove these policies from the hub cluster manually. You can delete policies from the RHACM console using the Governance tab or by using the following command:
$ oc delete policy -n <namespace> <policyName>
Tearing down the pipeline
If you need to remove the ArgoCD pipeline and all generated artifacts, follow this procedure:
Procedure
Detach all clusters from RHACM.
Delete the
kustomization.yaml
file in thedeployment
directory using the following command:$ oc delete -k out/argocd/deployment
Upgrading GitOps ZTP
You can upgrade the GitOps zero touch provisioning (ZTP) infrastructure independently from the underlying cluster, Red Hat Advanced Cluster Management (RHACM), and OKD version running on the spoke clusters. This procedure guides you through the upgrade process to avoid impact on the spoke clusters. However, any changes to the content or settings of policies, including adding recommended content, result in changes that must be rolled out and reconciled to the spoke clusters.
Prerequisites
- This procedure assumes that you have a fully operational hub cluster running the earlier version of the GitOps ZTP infrastructure.
Procedure
At a high level, the strategy for upgrading the GitOps ZTP infrastructure is:
Label all existing clusters with the
ztp-done
label.Stop the ArgoCD applications.
Install the new tooling.
Update required content and optional changes in the Git repository.
Update and restart the application configuration.
Preparing for the upgrade
Use the following procedure to prepare your site for the GitOps zero touch provisioning (ZTP) upgrade.
Procedure
Obtain the latest version of the GitOps ZTP container from which you can extract a set of custom resources (CRs) used to configure the GitOps operator on the hub cluster for use in the GitOps ZTP solution.
Extract the
argocd/deployment
directory using the following commands:$ mkdir -p ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.10 extract /home/ztp --tar | tar x -C ./out
The
/out
directory contains the following subdirectories:out/extra-manifest
: contains the source CR files that theSiteConfig
CR uses to generate the extra manifestconfigMap
.out/source-crs
: contains the source CR files that thePolicyGenTemplate
CR uses to generate the Red Hat Advanced Cluster Management (RHACM) policies.out/argocd/deployment
: contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure.out/argocd/example
: contains exampleSiteConfig
andPolicyGenTemplate
files that represent the recommended configuration.
Update the
clusters-app.yaml
andpolicies-app.yaml
files to reflect the name of your applications and the URL, branch, and path for your Git repository.
If the upgrade includes changes to policies that may result in obsolete policies, these policies should be removed prior to performing the upgrade.
Labeling the existing clusters
To ensure that existing clusters remain untouched by the tooling updates, all existing managed clusters must be labeled with the ztp-done
label.
Procedure
Find a label selector that lists the managed clusters that were deployed with zero touch provisioning (ZTP), such as
local-cluster!=true
:$ oc get managedcluster -l 'local-cluster!=true'
Ensure that the resulting list contains all the managed clusters that were deployed with ZTP, and then use that selector to add the
ztp-done
label:$ oc label managedcluster -l 'local-cluster!=true' ztp-done=
Stopping the existing GitOps ZTP applications
Removing the existing applications ensures that any changes to existing content in the Git repository are not rolled out until the new version of the tooling is available.
Use the application files from the deployment
directory. If you used custom names for the applications, update the names in these files first.
Procedure
Perform a non-cascaded delete on the
clusters
application to leave all generated resources in place:$ oc delete -f out/argocd/deployment/clusters-app.yaml
Perform a cascaded delete on the
policies
application to remove all previous policies:$ oc patch -f policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge
$ oc delete -f out/argocd/deployment/policies-app.yaml
Topology Aware Lifecycle Manager
Install the Topology Aware Lifecycle Manager (TALM) on the hub cluster.
Additional resources
- For information about the Topology Aware Lifecycle Manager (TALM), see About the Topology Aware Lifecycle Manager configuration.
Required changes to the Git repository
When upgrading from an earlier release to OKD 4.10, additional requirements are placed on the contents of the Git repository. Existing content in the repository must be updated to reflect these changes.
Changes to
PolicyGenTemplate
files:All
PolicyGenTemplate
files must be created in aNamespace
prefixed withztp
. This ensures that the GitOps zero touch provisioning (ZTP) application is able to manage the policy CRs generated by GitOps ZTP without conflicting with the way Red Hat Advanced Cluster Management (RHACM) manages the policies internally.Remove the
pre-sync.yaml
andpost-sync.yaml
files:This step is optional but recommended. When the
kustomization.yaml
files are added, thepre-sync.yaml
andpost-sync.yaml
files are no longer used. They must be removed to avoid confusion and because they can cause errors if the kustomization.yaml files are inadvertently removed. Note that there is a set ofpre-sync.yaml
andpost-sync.yaml
files under both theSiteConfig
andPolicyGenTemplate
trees.Add the
kustomization.yaml
file to the repository:All
SiteConfig
andPolicyGenTemplate
CRs must be included in akustomization.yaml
file under their respective directory trees. For example:├── policygentemplates
│ ├── site1-ns.yaml
│ ├── site1.yaml
│ ├── site2-ns.yaml
│ ├── site2.yaml
│ ├── common-ns.yaml
│ ├── common-ranGen.yaml
│ ├── group-du-sno-ranGen-ns.yaml
│ ├── group-du-sno-ranGen.yaml
│ └── kustomization.yaml
└── siteconfig
├── site1.yaml
├── site2.yaml
└── kustomization.yaml
The files listed in the
generator
sections must contain eitherSiteConfig
orPolicyGenTemplate
CRs only. If your existing YAML files contain other CRs, for example,Namespace
, these other CRs must be pulled out into separate files and listed in theresources
section.The
PolicyGenTemplate
kustomization file must contain allPolicyGenTemplate
YAML files in thegenerator
section andNamespace
CRs in theresources
section. For example:apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- common-ranGen.yaml
- group-du-sno-ranGen.yaml
- site1.yaml
- site2.yaml
resources:
- common-ns.yaml
- group-du-sno-ranGen-ns.yaml
- site1-ns.yaml
- site2-ns.yaml
The
SiteConfig
kustomization file must contain allSiteConfig
YAML files in thegenerator
section and any other CRs in the resources:apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
- site1.yaml
- site2.yaml
Review and incorporate recommended changes
Each release may include additional recommended changes to the configuration applied to deployed clusters. Typically these changes result in lower CPU use by the OpenShift platform, additional features, or improved tuning of the platform.
Review the reference
SiteConfig
andPolicyGenTemplate
CRs applicable to the types of cluster in your network. These examples can be found in theargocd/example
directory extracted from the GitOps ZTP container.
Installing the new GitOps ZTP applications
Using the extracted argocd/deployment
directory, and after ensuring that the applications point to your Git repository, apply the full contents of the deployment directory. Applying the full contents of the directory ensures that all necessary resources for the applications are correctly configured.
Procedure
To patch the ArgoCD instance in the hub cluster by using the patch file previously extracted into the
out/argocd/deployment/
directory, enter the following command:$ oc patch argocd openshift-gitops \
-n openshift-gitops --type=merge \
--patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
To apply the contents of the
argocd/deployment
directory, enter the following command:$ oc apply -k out/argocd/deployment
Roll out the configuration changes
If any configuration changes were included in the upgrade due to implementing recommended changes, the upgrade process results in a set of policy CRs on the hub cluster in the Non-Compliant
state. As of the OKD 4.10 release, these policies are set to inform
mode and are not pushed to the spoke clusters without an additional step by the user. This ensures that potentially disruptive changes to the clusters can be managed in terms of when the changes are made, for example, during a maintenance window, and how many clusters are updated concurrently.
To roll out the changes, create one or more ClusterGroupUpgrade
CRs as detailed in the TALM documentation. The CR must contain the list of Non-Compliant
policies that you want to push out to the spoke clusters as well as a list or selector of which clusters should be included in the update.
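The following is a minimal sketch of such a ClusterGroupUpgrade CR; the cluster names, policy names, and remediation settings are placeholders that you replace with the Non-Compliant policies and target clusters in your environment:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: example-config-rollout                 # placeholder name
  namespace: default                           # placeholder namespace
spec:
  clusters:                                    # clusters to include in the update
    - spoke-cluster-1
    - spoke-cluster-2
  managedPolicies:                             # Non-Compliant policies to push to the spoke clusters
    - example-common-config-policy
    - example-group-config-policy
  enable: true
  remediationStrategy:
    maxConcurrency: 2                          # number of clusters updated concurrently
    timeout: 240                               # minutes allowed for the update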
Additional resources
- For information about creating
ClusterGroupUpgrade
CRs, see About the auto-created ClusterGroupUpgrade CR for ZTP.
Manually install a single managed cluster
This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the SiteConfig
method described in “Creating ZTP custom resources for multiple managed clusters”.
Prerequisites
Enable the Assisted Installer service.
Ensure network connectivity:
The container within the hub must be able to reach the Baseboard Management Controller (BMC) address of the target bare-metal host.
The managed cluster must be able to resolve and reach the hub’s API
hostname
and*.app
hostname. Here is an example of the hub’s API and*.app
hostname:console-openshift-console.apps.hub-cluster.internal.domain.com
api.hub-cluster.internal.domain.com
The hub must be able to resolve and reach the API and
*.app
hostname of the managed cluster. Here is an example of the managed cluster’s API and*.app
hostname:console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
api.sno-managed-cluster-1.internal.domain.com
A DNS server that is IP reachable from the target bare-metal host.
A target bare-metal host for the managed cluster with the following hardware minimums:
4 CPU or 8 vCPU
32 GiB RAM
120 GiB disk for root file system
When working in a disconnected environment, the release image must be mirrored. Use this command to mirror the release image:
$ oc adm release mirror -a <pull_secret.json> \
--from=quay.io/openshift-release-dev/ocp-release:{{ mirror_version_spoke_release }} \
--to={{ provisioner_cluster_registry }}/ocp4 \
--to-release-image={{ provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }}
You mirrored the ISO and
rootfs
used to generate the spoke cluster ISO to an HTTP server and configured the settings to pull images from there.The images must match the version of the
ClusterImageSet
. To deploy a 4.9.0 version, therootfs
and ISO must be set at 4.9.0.
Procedure
Create a
ClusterImageSet
for each specific cluster version that needs to be deployed. AClusterImageSet
has the following format:apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
name: openshift-4.9.0-rc.0 (1)
spec:
releaseImage: quay.io/openshift-release-dev/ocp-release:4.9.0-x86_64 (2)
1 The descriptive version that you want to deploy. 2 Specifies the releaseImage
to deploy and determines the OS Image version. The discovery ISO is based on an OS image version as thereleaseImage
, or latest if the exact version is unavailable.Create the
Namespace
definition for the managed cluster:apiVersion: v1
kind: Namespace
metadata:
name: <cluster_name> (1)
labels:
name: <cluster_name> (1)
1 The name of the managed cluster to provision. Create the
BMC Secret
custom resource:apiVersion: v1
data:
password: <bmc_password> (1)
username: <bmc_username> (2)
kind: Secret
metadata:
name: <cluster_name>-bmc-secret
namespace: <cluster_name>
type: Opaque
1 The password to the target bare-metal host. Must be base-64 encoded. 2 The username to the target bare-metal host. Must be base-64 encoded. Create the
Image Pull Secret
custom resource:apiVersion: v1
data:
.dockerconfigjson: <pull_secret> (1)
kind: Secret
metadata:
name: assisted-deployment-pull-secret
namespace: <cluster_name>
type: kubernetes.io/dockerconfigjson
1 The OKD pull secret. Must be base-64 encoded. Create the
AgentClusterInstall
custom resource:apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
# Only include the annotation if using OVN, otherwise omit the annotation
annotations:
agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
name: <cluster_name>
namespace: <cluster_name>
spec:
clusterDeploymentRef:
name: <cluster_name>
imageSetRef:
name: <cluster_image_set> (1)
networking:
clusterNetwork:
- cidr: <cluster_network_cidr> (2)
hostPrefix: 23
machineNetwork:
- cidr: <machine_network_cidr> (3)
serviceNetwork:
- <service_network_cidr> (4)
provisionRequirements:
controlPlaneAgents: 1
workerAgents: 0
sshPublicKey: <public_key> (5)
1 The name of the ClusterImageSet
custom resource used to install OKD on the bare-metal host.2 A block of IPv4 or IPv6 addresses in CIDR notation used for communication among cluster nodes. 3 A block of IPv4 or IPv6 addresses in CIDR notation used for the target bare-metal host external communication. Also used to determine the API and Ingress VIP addresses when provisioning DU single-node clusters. 4 A block of IPv4 or IPv6 addresses in CIDR notation used for cluster services internal communication. 5 A plain text string. You can use the public key to SSH into the node after it has finished installing. If you want to configure a static IP address for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters.
Create the
ClusterDeployment
custom resource:apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
name: <cluster_name>
namespace: <cluster_name>
spec:
baseDomain: <base_domain> (1)
clusterInstallRef:
group: extensions.hive.openshift.io
kind: AgentClusterInstall
name: <cluster_name>
version: v1beta1
clusterName: <cluster_name>
platform:
agentBareMetal:
agentSelector:
matchLabels:
cluster-name: <cluster_name>
pullSecretRef:
name: assisted-deployment-pull-secret
1 The managed cluster’s base domain. Create the
KlusterletAddonConfig
custom resource:apiVersion: agent.open-cluster-management.io/v1
kind: KlusterletAddonConfig
metadata:
name: <cluster_name>
namespace: <cluster_name>
spec:
clusterName: <cluster_name>
clusterNamespace: <cluster_name>
clusterLabels:
cloud: auto-detect
vendor: auto-detect
applicationManager:
enabled: true
certPolicyController:
enabled: false
iamPolicyController:
enabled: false
policyController:
enabled: true
searchCollector:
enabled: false (1)
1 Keep searchCollector disabled. Set to true to enable the searchCollector add-on or false to disable it.Create the
ManagedCluster
custom resource:apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
name: <cluster_name>
spec:
hubAcceptsClient: true
Create the
InfraEnv
custom resource:apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
name: <cluster_name>
namespace: <cluster_name>
spec:
clusterRef:
name: <cluster_name>
namespace: <cluster_name>
sshAuthorizedKey: <public_key> (1)
agentLabelSelector:
matchLabels:
cluster-name: <cluster_name>
pullSecretRef:
name: assisted-deployment-pull-secret
1 Entered as plain text. You can use the public key to SSH into the target bare-metal host when it boots from the ISO. Create the
BareMetalHost
custom resource:apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
name: <cluster_name>
namespace: <cluster_name>
annotations:
inspect.metal3.io: disabled
labels:
infraenvs.agent-install.openshift.io: "<cluster_name>"
spec:
bootMode: "UEFI"
bmc:
address: <bmc_address> (1)
disableCertificateVerification: true
credentialsName: <cluster_name>-bmc-secret
bootMACAddress: <mac_address> (2)
automatedCleaningMode: disabled
online: true
1 The baseboard management console address of the installation ISO on the target bare-metal host. 2 The MAC address of the target bare-metal host. Optionally, you can add
bmac.agent-install.openshift.io/hostname: <host-name>
as an annotation to set the managed cluster’s hostname, as shown in the sketch after this procedure. If you do not add the annotation, the hostname defaults to either a hostname from the DHCP server or local host.After you have created the custom resources, push the entire directory of generated custom resources to the Git repository you created for storing the custom resources.
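For reference, the hostname annotation mentioned above is added to the BareMetalHost metadata alongside the existing annotation; the hostname value is a placeholder:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: <cluster_name>
  namespace: <cluster_name>
  annotations:
    inspect.metal3.io: disabled
    bmac.agent-install.openshift.io/hostname: "example-node1"   # placeholder hostname for the managed cluster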
Next steps
To provision additional clusters, repeat this procedure for each cluster.
Configuring BIOS for distributed unit bare-metal hosts
Distributed unit (DU) hosts require the BIOS to be configured before the host can be provisioned. The BIOS configuration is dependent on the specific hardware that runs your DUs and the particular requirements of your installation.
Procedure
Set the UEFI/BIOS Boot Mode to
UEFI
.In the host boot sequence order, set Hard drive first.
Apply the specific BIOS configuration for your hardware. The following table describes a representative BIOS configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.
The exact BIOS configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.
Table 2. Sample BIOS configuration for an Intel Xeon Skylake or Cascade Lake server BIOS Setting Configuration CPU Power and Performance Policy
Performance
Uncore Frequency Scaling
Disabled
Performance P-limit
Disabled
Enhanced Intel SpeedStep ® Tech
Enabled
Intel Configurable TDP
Enabled
Configurable TDP Level
Level 2
Intel® Turbo Boost Technology
Enabled
Energy Efficient Turbo
Disabled
Hardware P-States
Disabled
Package C-State
C0/C1 state
C1E
Disabled
Processor C6
Disabled
Enable global SR-IOV and VT-d settings in the BIOS for the host. These settings are relevant to bare-metal environments.
Configuring static IP addresses for managed clusters
Optionally, after creating the AgentClusterInstall
custom resource, you can configure static IP addresses for the managed clusters.
You must create this custom resource before creating the |
Prerequisites
- Deploy and configure the
AgentClusterInstall
custom resource.
Procedure
Create a
NMStateConfig
custom resource:apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
name: <cluster_name>
namespace: <cluster_name>
labels:
sno-cluster-<cluster-name>: <cluster_name>
spec:
config:
interfaces:
- name: eth0
type: ethernet
state: up
ipv4:
enabled: true
address:
- ip: <ip_address> (1)
prefix-length: <public_network_prefix> (2)
dhcp: false
dns-resolver:
config:
server:
- <dns_resolver> (3)
routes:
config:
- destination: 0.0.0.0/0
next-hop-address: <gateway> (4)
next-hop-interface: eth0
table-id: 254
interfaces:
- name: "eth0" (5)
macAddress: <mac_address> (6)
1 The static IP address of the target bare-metal host. 2 The static IP address’s subnet prefix for the target bare-metal host. 3 The DNS server for the target bare-metal host. 4 The gateway for the target bare-metal host. 5 Must match the name specified in the interfaces
section.6 The mac address of the interface. When creating the
BareMetalHost
custom resource, ensure that one of its mac addresses matches a mac address in theNMStateConfig
target bare-metal host.When creating the
InfraEnv
custom resource, reference the label from theNMStateConfig
custom resource in theInfraEnv
custom resource:apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
name: <cluster_name>
namespace: <cluster_name>
spec:
clusterRef:
name: <cluster_name>
namespace: <cluster_name>
sshAuthorizedKey: <public_key>
agentLabelSelector:
matchLabels:
cluster-name: <cluster_name>
pullSecretRef:
name: assisted-deployment-pull-secret
nmStateConfigLabelSelector:
matchLabels:
sno-cluster-<cluster-name>: <cluster_name> # Match this label
Automated Discovery image ISO process for provisioning clusters
After you create the custom resources, the following actions happen automatically:
A Discovery image ISO file is generated and booted on the target machine.
When the ISO file successfully boots on the target machine, it reports the hardware information of the target machine.
After all hosts are discovered, OKD is installed.
When OKD finishes installing, the hub installs the
klusterlet
service on the target cluster.The requested add-on services are installed on the target cluster.
The Discovery image ISO process finishes when the Agent
custom resource is created on the hub for the managed cluster.
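You can watch for the Agent custom resource to appear, which signals the end of this automated process, for example:
$ oc get agent -n <cluster_name> --watch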
Checking the managed cluster status
Ensure that cluster provisioning was successful by checking the cluster status.
Prerequisites
- All of the custom resources have been configured and provisioned, and the
Agent
custom resource is created on the hub for the managed cluster.
Procedure
Check the status of the managed cluster:
$ oc get managedcluster
True
indicates the managed cluster is ready.Check the agent status:
$ oc get agent -n <cluster_name>
Use the
describe
command to provide an in-depth description of the agent’s condition. Statuses to be aware of includeBackendError
,InputError
,ValidationsFailing
,InstallationFailed
, andAgentIsConnected
. These statuses are relevant to theAgent
andAgentClusterInstall
custom resources.$ oc describe agent -n <cluster_name>
Check the cluster provisioning status:
$ oc get agentclusterinstall -n <cluster_name>
Use the
describe
command to provide an in-depth description of the cluster provisioning status:$ oc describe agentclusterinstall -n <cluster_name>
Check the status of the managed cluster’s add-on services:
$ oc get managedclusteraddon -n <cluster_name>
Retrieve the authentication information of the
kubeconfig
file for the managed cluster:$ oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster_name>-kubeconfig
Configuring a managed cluster for a disconnected environment
After you have completed the preceding procedure, follow these steps to configure the managed cluster for a disconnected environment.
Prerequisites
A disconnected installation of Red Hat Advanced Cluster Management (RHACM) 2.3.
Host the
rootfs
andiso
images on an HTTPD server.
If you enable TLS for the HTTPD server, you must confirm the root certificate is signed by an authority trusted by the client and verify the trusted certificate chain between your OKD hub and spoke clusters and the HTTPD server. Using a server configured with an untrusted certificate prevents the images from being downloaded to the image creation service. Using untrusted HTTPS servers is not supported.
Procedure
Create a
ConfigMap
containing the mirror registry config:apiVersion: v1
kind: ConfigMap
metadata:
name: assisted-installer-mirror-config
namespace: assisted-installer
labels:
app: assisted-service
data:
ca-bundle.crt: <certificate> (1)
registries.conf: | (2)
unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
[[registry]]
location = <mirror_registry_url> (3)
insecure = false
mirror-by-digest-only = true
1 The mirror registry’s certificate used when creating the mirror registry. 2 The configuration for the mirror registry. 3 The URL of the mirror registry. This updates
mirrorRegistryRef
in theAgentServiceConfig
custom resource, as shown below:Example output
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
name: agent
namespace: assisted-installer
spec:
databaseStorage:
volumeName: <db_pv_name>
accessModes:
- ReadWriteOnce
resources:
requests:
storage: <db_storage_size>
filesystemStorage:
volumeName: <fs_pv_name>
accessModes:
- ReadWriteOnce
resources:
requests:
storage: <fs_storage_size>
mirrorRegistryRef:
name: 'assisted-installer-mirror-config'
osImages:
- openshiftVersion: <ocp_version>
rootfs: <rootfs_url> (1)
url: <iso_url> (1)
1 Must match the URLs of the HTTPD server. For disconnected installations, you must deploy an NTP clock that is reachable through the disconnected network. You can do this by configuring chrony to act as a server, editing the
/etc/chrony.conf
file, and adding the following allowed IPv6 range:# Allow NTP client access from local network.
#allow 192.168.0.0/16
local stratum 10
bindcmdaddress ::
allow 2620:52:0:1310::/64
Configuring IPv6 addresses for a disconnected environment
Optionally, when you are creating the AgentClusterInstall
custom resource, you can configure IPV6 addresses for the managed clusters.
Procedure
In the
AgentClusterInstall
custom resource, modify the IP addresses inclusterNetwork
andserviceNetwork
for IPv6 addresses:apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
# Only include the annotation if using OVN, otherwise omit the annotation
annotations:
agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
name: <cluster_name>
namespace: <cluster_name>
spec:
clusterDeploymentRef:
name: <cluster_name>
imageSetRef:
name: <cluster_image_set>
networking:
clusterNetwork:
- cidr: "fd01::/48"
hostPrefix: 64
machineNetwork:
- cidr: <machine_network_cidr>
serviceNetwork:
- "fd02::/112"
provisionRequirements:
controlPlaneAgents: 1
workerAgents: 0
sshPublicKey: <public_key>
Update the
NMStateConfig
custom resource with the IPv6 addresses you defined.
Generating RAN policies
Prerequisites
Install Kustomize
Install the Kustomize Policy Generator plug-in
Procedure
Configure the
kustomization.yaml
file to reference thepolicyGenerator.yaml
file; a sample kustomization.yaml sketch follows the argument descriptions below. The following example shows the PolicyGenerator definition:apiVersion: policyGenerator/v1
kind: PolicyGenerator
metadata:
name: acm-policy
namespace: acm-policy-generator
# The arguments should be given and defined as below with same order --policyGenTempPath= --sourcePath= --outPath= --stdout --customResources
argsOneLiner: ./ranPolicyGenTempExamples ./sourcePolicies ./out true false
Where:
policyGenTempPath: the path to the policyGenTemp files.
sourcePath: the path to the source policies.
outPath: the path to save the generated ACM policies.
stdout: if true, prints the generated policies to the console.
customResources: if true, generates the CRs from the sourcePolicies files without ACM policies.
Test PolicyGen by running the following commands:
$ cd cnf-features-deploy/ztp/ztp-policy-generator/
$ XDG_CONFIG_HOME=./ kustomize build --enable-alpha-plugins
An out directory is created with the expected policies, as shown in this example:
out
├── common
│ ├── common-log-sub-ns-policy.yaml
│ ├── common-log-sub-oper-policy.yaml
│ ├── common-log-sub-policy.yaml
│ ├── common-nto-sub-catalog-policy.yaml
│ ├── common-nto-sub-ns-policy.yaml
│ ├── common-nto-sub-oper-policy.yaml
│ ├── common-nto-sub-policy.yaml
│ ├── common-policies-placementbinding.yaml
│ ├── common-policies-placementrule.yaml
│ ├── common-ptp-sub-ns-policy.yaml
│ ├── common-ptp-sub-oper-policy.yaml
│ ├── common-ptp-sub-policy.yaml
│ ├── common-sriov-sub-ns-policy.yaml
│ ├── common-sriov-sub-oper-policy.yaml
│ └── common-sriov-sub-policy.yaml
├── groups
│ ├── group-du
│ │ ├── group-du-mc-chronyd-policy.yaml
│ │ ├── group-du-mc-mount-ns-policy.yaml
│ │ ├── group-du-mcp-du-policy.yaml
│ │ ├── group-du-mc-sctp-policy.yaml
│ │ ├── group-du-policies-placementbinding.yaml
│ │ ├── group-du-policies-placementrule.yaml
│ │ ├── group-du-ptp-config-policy.yaml
│ │ └── group-du-sriov-operconfig-policy.yaml
│ └── group-sno-du
│ ├── group-du-sno-policies-placementbinding.yaml
│ ├── group-du-sno-policies-placementrule.yaml
│ ├── group-sno-du-console-policy.yaml
│ ├── group-sno-du-log-forwarder-policy.yaml
│ └── group-sno-du-log-policy.yaml
└── sites
└── site-du-sno-1
├── site-du-sno-1-policies-placementbinding.yaml
├── site-du-sno-1-policies-placementrule.yaml
├── site-du-sno-1-sriov-nn-fh-policy.yaml
├── site-du-sno-1-sriov-nnp-mh-policy.yaml
├── site-du-sno-1-sriov-nw-fh-policy.yaml
├── site-du-sno-1-sriov-nw-mh-policy.yaml
└── site-du-sno-1-.yaml
The common policies are flat because they will be applied to all clusters. However, the groups and sites have subdirectories for each group and site as they will be applied to different clusters.
Troubleshooting the managed cluster
Use this procedure to diagnose any installation issues that might occur with the managed clusters.
Procedure
Check the status of the managed cluster:
$ oc get managedcluster
Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
SNO-cluster true True True 2d19h
If the status in the AVAILABLE column is True, the managed cluster is being managed by the hub.
If the status in the AVAILABLE column is Unknown, the managed cluster is not being managed by the hub. Use the following steps to get more information.
Check the AgentClusterInstall install status:
$ oc get clusterdeployment -n <cluster_name>
Example output
NAME      PLATFORM         REGION  CLUSTERTYPE  INSTALLED  INFRAID  VERSION  POWERSTATE   AGE
Sno0026   agent-baremetal                       false                        Initialized  2d14h
If the status in the INSTALLED column is false, the installation was unsuccessful.
If the installation failed, enter the following command to review the status of the AgentClusterInstall resource:
$ oc describe agentclusterinstall -n <cluster_name> <cluster_name>
Resolve the errors and reset the cluster:
Remove the cluster’s managed cluster resource:
$ oc delete managedcluster <cluster_name>
Remove the cluster’s namespace:
$ oc delete namespace <cluster_name>
This deletes all of the namespace-scoped custom resources created for this cluster. You must wait for the ManagedCluster CR deletion to complete before proceeding.
Recreate the custom resources for the managed cluster.
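One way to wait for the deletion to finish is with oc wait, assuming your oc client supports the --for=delete condition. This is a sketch, not part of the documented procedure:
# Block until the ManagedCluster CR and the cluster namespace are removed
$ oc wait --for=delete managedcluster/<cluster_name> --timeout=300s
$ oc wait --for=delete namespace/<cluster_name> --timeout=300s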
Updating managed policies with the Topology Aware Lifecycle Manager
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple OKD clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.
The Topology Aware Lifecycle Manager is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
Additional resources
- For more information about the Topology Aware Lifecycle Manager, see About the Topology Aware Lifecycle Manager.
About the auto-created ClusterGroupUpgrade CR for ZTP
TALM has a controller called ManagedClusterForCGU that monitors the Ready state of the ManagedCluster CRs on the hub cluster and creates the ClusterGroupUpgrade CRs for ZTP (zero touch provisioning).
For any managed cluster in the Ready state without a “ztp-done” label applied, the ManagedClusterForCGU controller automatically creates a ClusterGroupUpgrade CR in the ztp-install namespace with its associated RHACM policies that are created during the ZTP process. TALM then remediates the set of configuration policies that are listed in the auto-created ClusterGroupUpgrade CR to push the configuration CRs to the managed cluster.
If the managed cluster has no bound policies when the cluster becomes Ready, a ClusterGroupUpgrade CR with no policies is created. Upon completion of the ClusterGroupUpgrade, the managed cluster is labeled ztp-done.
Example of an auto-created ClusterGroupUpgrade CR for ZTP
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
generation: 1
name: spoke1
namespace: ztp-install
ownerReferences:
- apiVersion: cluster.open-cluster-management.io/v1
blockOwnerDeletion: true
controller: true
kind: ManagedCluster
name: spoke1
uid: 98fdb9b2-51ee-4ee7-8f57-a84f7f35b9d5
resourceVersion: "46666836"
uid: b8be9cd2-764f-4a62-87d6-6b767852c7da
spec:
actions:
afterCompletion:
addClusterLabels:
ztp-done: "" (1)
deleteClusterLabels:
ztp-running: ""
deleteObjects: true
beforeEnable:
addClusterLabels:
ztp-running: "" (2)
clusters:
- spoke1
enable: true
managedPolicies:
- common-spoke1-config-policy
- common-spoke1-subscriptions-policy
- group-spoke1-config-policy
- spoke1-config-policy
- group-spoke1-validator-du-policy
preCaching: false
remediationStrategy:
maxConcurrency: 1
timeout: 240
1 Applied to the managed cluster when TALM completes the cluster configuration.
2 Applied to the managed cluster when TALM starts deploying the configuration policies.
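To observe this mechanism on the hub cluster, you can list the auto-created CRs and the cluster labels. These commands are illustrative only:
# List the ClusterGroupUpgrade CRs created by the ManagedClusterForCGU controller
$ oc get cgu -n ztp-install
# Check whether the ztp-running or ztp-done label has been applied to the managed cluster
$ oc get managedcluster <cluster_name> --show-labels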
End-to-end procedures for updating clusters in a disconnected environment
If you have deployed spoke clusters with distributed unit (DU) profiles using the GitOps ZTP with the Topology Aware Lifecycle Manager (TALM) pipeline described in “Deploying distributed units at scale in a disconnected environment”, this procedure describes how to upgrade your spoke clusters and Operators.
Preparing for the updates
If both the hub and the spoke clusters are running OKD 4.9, you must first update ZTP from version 4.9 to 4.10. If you are already using OKD 4.10, you can proceed to setting up the environment.
Setting up the environment
TALM can perform both platform and Operator updates.
You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use TALM to update your disconnected clusters. Complete the following steps to mirror the images:
For platform updates, you must perform the following steps:
Mirror the desired OKD image repository. Ensure that the desired platform image is mirrored by following the “Mirroring the OKD image repository” procedure linked in the Additional resources. Save the contents of the imageContentSources section in the imageContentSources.yaml file:
Example output
imageContentSources:
- mirrors:
- mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
source: quay.io/openshift-release-dev/ocp-release
- mirrors:
- mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
Save the image signature of the desired platform image that was mirrored. You must add the image signature to the PolicyGenTemplate CR for platform updates. To get the image signature, perform the following steps:
Specify the desired OKD tag by running the following command:
$ OCP_RELEASE_NUMBER=<release_version>
Specify the architecture of the server by running the following command:
$ ARCHITECTURE=<server_architecture>
Get the release image digest from Quay by running the following command:
$ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
Set the digest algorithm by running the following command:
$ DIGEST_ALGO="${DIGEST%%:*}"
Set the encoded digest by running the following command:
$ DIGEST_ENCODED="${DIGEST#*:}"
Get the image signature from the mirror.openshift.com website by running the following command:
$ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
Save the image signature to the checksum-<OCP_RELEASE_NUMBER>.yaml file by running the following commands:
$ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF
${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64}
EOF
Prepare the update graph. You have two options to prepare the update graph:
Use the OpenShift Update Service.
For more information about how to set up the graph on the hub cluster, see Deploy the operator for OpenShift Update Service and Build the graph data init container.
Make a local copy of the upstream graph. Host the update graph on an http or https server in the disconnected environment that has access to the spoke cluster. To download the update graph, use the following command:
$ curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.10 -o ~/upgrade-graph_stable-4.10
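If you host a local copy, the downloaded graph file must then be served by the HTTP or HTTPS server referenced later in the ClusterVersion policy. The following is a minimal sketch that assumes a web server on upgrade.example.com serving /var/www/html/images at http://upgrade.example.com/images; the host, user, and path are placeholders and must match the upstream URL you configure:
# Copy the graph to the web server in the disconnected environment
$ scp ~/upgrade-graph_stable-4.10 <user>@upgrade.example.com:/var/www/html/images/
# Confirm the graph is reachable from the network that the spoke cluster uses
$ curl -I http://upgrade.example.com/images/upgrade-graph_stable-4.10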
For Operator updates, you must perform the following task:
- Mirror the Operator catalogs. Ensure that the desired Operator images are mirrored by following the procedure in the “Mirroring Operator catalogs for use with disconnected clusters” section.
Additional resources
For more information about how to update ZTP, see Upgrading GitOps ZTP.
For more information about how to mirror an OKD image repository, see Mirroring the OKD image repository.
For more information about how to mirror Operator catalogs for disconnected clusters, see Mirroring Operator catalogs for use with disconnected clusters.
For more information about how to prepare the disconnected environment and mirror the desired image repository, see Preparing the disconnected environment.
For more information about update channels and releases, see Understanding upgrade channels and releases.
Performing a platform update
You can perform a platform update with the TALM.
Prerequisites
Install the Topology Aware Lifecycle Manager (TALM).
Update ZTP to the latest version.
Provision one or more managed clusters with ZTP.
Mirror the desired image repository.
Log in as a user with cluster-admin privileges.
Create RHACM policies in the hub cluster.
Procedure
Create a PolicyGenTemplate CR for the platform update:
Save the following contents of the PolicyGenTemplate CR in the du-upgrade.yaml file.
Example of PolicyGenTemplate for platform update
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "du-upgrade"
namespace: "ztp-group-du-sno"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
remediationAction: inform
sourceFiles:
- fileName: ImageSignature.yaml (1)
policyName: "platform-upgrade-prep"
binaryData:
${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} (2)
- fileName: DisconnectedICSP.yaml
policyName: "platform-upgrade-prep"
metadata:
name: disconnected-internal-icsp-for-ocp
spec:
repositoryDigestMirrors: (3)
- mirrors:
- quay-intern.example.com/ocp4/openshift-release-dev
source: quay.io/openshift-release-dev/ocp-release
- mirrors:
- quay-intern.example.com/ocp4/openshift-release-dev
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- fileName: ClusterVersion.yaml (4)
policyName: "platform-upgrade-prep"
metadata:
name: version
annotations:
ran.openshift.io/ztp-deploy-wave: "1"
spec:
channel: "stable-4.10"
upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10
- fileName: ClusterVersion.yaml (5)
policyName: "platform-upgrade"
metadata:
name: version
spec:
channel: "stable-4.10"
upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.10
desiredUpdate:
version: 4.10.4
status:
history:
- version: 4.10.4
state: "Completed"
1 The ConfigMap CR contains the signature of the desired release image to update to.
2 Shows the image signature of the desired OKD release. Get the signature from the checksum-${OCP_RELEASE_NUMBER}.yaml file you saved when following the procedures in the “Setting up the environment” section.
3 Shows the mirror repository that contains the desired OKD image. Get the mirrors from the imageContentSources.yaml file that you saved when following the procedures in the “Setting up the environment” section.
4 Shows the ClusterVersion CR to update the upstream.
5 Shows the ClusterVersion CR to trigger the update. The channel, upstream, and desiredVersion fields are all required for image pre-caching.
The PolicyGenTemplate CR generates two policies:
The du-upgrade-platform-upgrade-prep policy does the preparation work for the platform update. It creates the ConfigMap CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the spoke cluster in the disconnected environment.
The du-upgrade-platform-upgrade policy is used to perform the platform upgrade.
Add the du-upgrade.yaml file contents to the kustomization.yaml file located in the ZTP Git repository for the PolicyGenTemplate CRs and push the changes to the Git repository.
ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
Check the created policies by running the following command:
$ oc get policies -A | grep platform-upgrade
Apply the required update resources before starting the platform update with the TALM.
Save the content of the ClusterGroupUpgrade CR named cgu-platform-upgrade-prep with the du-upgrade-platform-upgrade-prep policy and the target spoke clusters to the cgu-platform-upgrade-prep.yml file, as shown in the following example:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-platform-upgrade-prep
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade-prep
clusters:
- spoke1
remediationStrategy:
maxConcurrency: 1
enable: true
Apply the policy to the hub cluster by running the following command:
$ oc apply -f cgu-platform-upgrade-prep.yml
Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
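You can also watch the ClusterGroupUpgrade status directly while the preparation policy is being remediated, assuming jq is installed on the hub as in the pre-caching checks later in this document:
$ oc get cgu cgu-platform-upgrade-prep -n default -o jsonpath='{.status.conditions}' | jq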
Create the ClusterGroupUpgrade CR for the platform update with the spec.enable field set to false.
Save the content of the platform update ClusterGroupUpgrade CR with the du-upgrade-platform-upgrade policy and the target clusters to the cgu-platform-upgrade.yml file, as shown in the following example:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-platform-upgrade
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade
preCaching: false
clusters:
- spoke1
remediationStrategy:
maxConcurrency: 1
enable: false
Apply the ClusterGroupUpgrade CR to the hub cluster by running the following command:
$ oc apply -f cgu-platform-upgrade.yml
Optional: Pre-cache the images for the platform update.
Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge
Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
$ oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
Start the platform update:
Enable the cgu-platform-upgrade policy and disable pre-caching by running the following command:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
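As an additional check after the policy becomes compliant, you can confirm the platform version directly on the spoke cluster. This is an illustrative step, assuming you are logged in to the spoke cluster:
$ oc get clusterversion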
Additional resources
- For more information about mirroring the images in a disconnected environment, see Preparing the disconnected environment.
Performing an Operator update
You can perform an Operator update with the TALM.
Prerequisites
Install the Topology Aware Lifecycle Manager (TALM).
Update ZTP to the latest version.
Provision one or more managed clusters with ZTP.
Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images.
Log in as a user with cluster-admin privileges.
Create RHACM policies in the hub cluster.
Procedure
Update the PolicyGenTemplate CR for the Operator update.
Update the du-upgrade PolicyGenTemplate CR with the following additional contents in the du-upgrade.yaml file:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "du-upgrade"
namespace: "ztp-group-du-sno"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
remediationAction: inform
sourceFiles:
- fileName: DefaultCatsrc.yaml
remediationAction: inform
policyName: "operator-catsrc-policy"
metadata:
name: redhat-operators
spec:
displayName: Red Hat Operators Catalog
image: registry.example.com:5000/olm/redhat-operators:v4.10 (1)
updateStrategy: (2)
registryPoll:
interval: 1h
1 The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed.
2 Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the registryPoll.interval field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. The registryPoll.interval field can be set to a shorter interval to expedite the update, however shorter intervals increase computational load. To counteract this, you can restore registryPoll.interval to the default value once the update is complete.
This update generates one policy, du-upgrade-operator-catsrc-policy, to update the redhat-operators catalog source with the new index images that contain the desired Operator images.
If you want to use image pre-caching for Operators and there are Operators from a catalog source other than redhat-operators, you must perform the following tasks:
Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source.
Prepare a separate subscription policy for the desired Operators that are from the different catalog source.
For example, the desired SRIOV-FEC Operator is available in the certified-operators catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies, du-upgrade-fec-catsrc-policy and du-upgrade-subscriptions-fec-policy:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: "du-upgrade"
namespace: "ztp-group-du-sno"
spec:
bindingRules:
group-du-sno: ""
mcp: "master"
remediationAction: inform
sourceFiles:
…
- fileName: DefaultCatsrc.yaml
remediationAction: inform
policyName: "fec-catsrc-policy"
metadata:
name: certified-operators
spec:
displayName: Intel SRIOV-FEC Operator
image: registry.example.com:5000/olm/far-edge-sriov-fec:v4.10
updateStrategy:
registryPoll:
interval: 10m
- fileName: AcceleratorsSubscription.yaml
policyName: "subscriptions-fec-policy"
spec:
channel: "stable"
source: certified-operators
Remove the specified subscription channels in the common PolicyGenTemplate CR, if they exist. The default subscription channels from the ZTP image are used for the update.
Push the PolicyGenTemplate CR updates to the ZTP Git repository.
ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
Check the created policies by running the following command:
$ oc get policies -A | grep -E "catsrc-policy|subscription"
Apply the required catalog source updates before starting the Operator update.
Save the content of the ClusterGroupUpgrade CR named cgu-operator-upgrade-prep with the catalog source policies and the target spoke clusters to the cgu-operator-upgrade-prep.yml file:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-operator-upgrade-prep
namespace: default
spec:
clusters:
- spoke1
enable: true
managedPolicies:
- du-upgrade-operator-catsrc-policy
remediationStrategy:
maxConcurrency: 1
Apply the policy to the hub cluster by running the following command:
$ oc apply -f cgu-operator-upgrade-prep.yml
Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies -A | grep -E "catsrc-policy"
Create the ClusterGroupUpgrade CR for the Operator update with the spec.enable field set to false.
Save the content of the Operator update ClusterGroupUpgrade CR with the du-upgrade-operator-catsrc-policy policy and the subscription policies created from the common PolicyGenTemplate and the target clusters to the cgu-operator-upgrade.yml file, as shown in the following example:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-operator-upgrade
namespace: default
spec:
managedPolicies:
- du-upgrade-operator-catsrc-policy (1)
- common-subscriptions-policy (2)
preCaching: false
clusters:
- spoke1
remediationStrategy:
maxConcurrency: 1
enable: false
1 The policy is needed by the image pre-caching feature to retrieve the Operator images from the catalog source.
2 The policy contains Operator subscriptions. If you have upgraded ZTP from 4.9 to 4.10 by following “Upgrade ZTP from 4.9 to 4.10”, all Operator subscriptions are grouped into the common-subscriptions-policy policy.
One ClusterGroupUpgrade CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in the ClusterGroupUpgrade CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, another ClusterGroupUpgrade CR must be created with the du-upgrade-fec-catsrc-policy and du-upgrade-subscriptions-fec-policy policies to pre-cache and update the SRIOV-FEC Operator images.
Apply the ClusterGroupUpgrade CR to the hub cluster by running the following command:
$ oc apply -f cgu-operator-upgrade.yml
Optional: Pre-cache the images for the Operator update.
Before starting image pre-caching, verify the subscription policy is NonCompliant at this point by running the following command:
$ oc get policy common-subscriptions-policy -n <policy_namespace>
Example output
NAME REMEDIATION ACTION COMPLIANCE STATE AGE
common-subscriptions-policy inform NonCompliant 27d
Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge
Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
$ oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
Check if the pre-caching is completed before starting the update by running the following command:
$ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq
Example output
[
{
"lastTransitionTime": "2022-03-08T20:49:08.000Z",
"message": "The ClusterGroupUpgrade CR is not enabled",
"reason": "UpgradeNotStarted",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2022-03-08T20:55:30.000Z",
"message": "Precaching is completed",
"reason": "PrecachingCompleted",
"status": "True",
"type": "PrecachingDone"
}
]
Start the Operator update.
Enable the cgu-operator-upgrade ClusterGroupUpgrade CR and disable pre-caching to start the Operator update by running the following command:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
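As an additional check, you can confirm the installed Operator versions on the spoke cluster after the policy becomes compliant. This is an illustrative step, assuming you are logged in to the spoke cluster:
# List the cluster service versions for the updated Operators
$ oc get csv -A
# Review the Operator subscriptions to confirm the expected channels and catalog sources
$ oc get subscriptions.operators.coreos.com -A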
Additional resources
- For more information about updating GitOps ZTP, see Upgrading GitOps ZTP.
Performing a platform and an Operator update together
You can perform a platform and an Operator update at the same time.
Prerequisites
Install the Topology Aware Lifecycle Manager (TALM).
Update ZTP to the latest version.
Provision one or more managed clusters with ZTP.
Log in as a user with cluster-admin privileges.
Create RHACM policies in the hub cluster.
Procedure
Create the PolicyGenTemplate CR for the updates by following the steps described in the “Performing a platform update” and “Performing an Operator update” sections.
Apply the prep work for the platform and the Operator update.
Save the content of the ClusterGroupUpgrade CR with the policies for platform update preparation work, catalog source updates, and target clusters to the cgu-platform-operator-upgrade-prep.yml file, for example:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-platform-operator-upgrade-prep
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade-prep
- du-upgrade-operator-catsrc-policy
clusterSelector:
- group-du-sno
remediationStrategy:
maxConcurrency: 10
enable: true
Apply the cgu-platform-operator-upgrade-prep.yml file to the hub cluster by running the following command:
$ oc apply -f cgu-platform-operator-upgrade-prep.yml
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
Create the ClusterGroupUpgrade CR for the platform and the Operator update with the spec.enable field set to false.
Save the contents of the platform and Operator update ClusterGroupUpgrade CR with the policies and the target clusters to the cgu-platform-operator-upgrade.yml file, as shown in the following example:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
name: cgu-du-upgrade
namespace: default
spec:
managedPolicies:
- du-upgrade-platform-upgrade (1)
- du-upgrade-operator-catsrc-policy (2)
- common-subscriptions-policy (3)
preCaching: true
clusterSelector:
- group-du-sno
remediationStrategy:
maxConcurrency: 1
enable: false
1 This is the platform update policy.
2 This is the policy containing the catalog source information for the Operators to be updated. It is needed for the pre-caching feature to determine which Operator images to download to the spoke cluster.
3 This is the policy to update the Operators.
Apply the cgu-platform-operator-upgrade.yml file to the hub cluster by running the following command:
$ oc apply -f cgu-platform-operator-upgrade.yml
Optional: Pre-cache the images for the platform and the Operator update.
Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge
Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the spoke cluster:
$ oc get jobs,pods -n openshift-talm-pre-cache
Check if the pre-caching is completed before starting the update by running the following command:
$ oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}'
Start the platform and Operator update.
Enable the cgu-du-upgrade ClusterGroupUpgrade CR to start the platform and the Operator update by running the following command:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
--patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
The CRs for the platform and Operator updates can be created from the beginning with spec.enable set to true. In this case, the update starts immediately after pre-caching completes and there is no need to manually enable the CR.
Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster views, to help complete the procedures. Setting the afterCompletion.deleteObjects field to true deletes all these resources after the updates complete.
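The field path follows the auto-created ClusterGroupUpgrade example shown earlier in this document. As a hedged sketch, you can set it in the CR before applying it, or patch an existing CR such as cgu-du-upgrade:
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
    --patch '{"spec":{"actions":{"afterCompletion":{"deleteObjects":true}}}}' --type=merge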