Preparing to update to OKD 4.15
Learn more about administrative tasks that cluster admins must perform to successfully initialize an update, as well as optional guidelines for ensuring a successful update.
Kubernetes API removals
There are no Kubernetes API removals in OKD 4.15.
Assessing the risk of conditional updates
A conditional update is an update target that is available but not recommended due to a known risk that applies to your cluster. The Cluster Version Operator (CVO) periodically queries the OpenShift Update Service (OSUS) for the most recent data about update recommendations, and some potential update targets might have risks associated with them.
The CVO evaluates the conditional risks, and if the risks are not applicable to the cluster, then the target version is available as a recommended update path for the cluster. If the risk is determined to be applicable, or if for some reason CVO cannot evaluate the risk, then the update target is available to the cluster as a conditional update.
When you encounter a conditional update while you are trying to update to a target version, you must assess the risk of updating your cluster to that version. Generally, if you do not have a specific need to update to that target version, it is best to wait for a recommended update path from Red Hat.
However, if you have a strong reason to update to that version, for example, if you need to fix an important CVE, then the benefit of fixing the CVE might outweigh the risk of the update being problematic for your cluster. You can complete the following tasks to determine whether you agree with the Red Hat assessment of the update risk:
Complete extensive testing in a non-production environment to the extent that you are comfortable completing the update in your production environment.
Follow the links provided in the conditional update description, investigate the bug, and determine if it is likely to cause issues for your cluster. If you need help understanding the risk, contact Red Hat Support.
Additional resources
Best practices for cluster updates
OKD provides a robust update experience that minimizes workload disruptions during an update. Updates will not begin unless the cluster is in an upgradeable state at the time of the update request.
This design enforces some key conditions before initiating an update, but there are a number of actions you can take to increase your chances of a successful cluster update.
Choose versions recommended by the OpenShift Update Service
The OpenShift Update Service (OSUS) provides update recommendations based on cluster characteristics such as the cluster’s subscribed channel. The Cluster Version Operator saves these recommendations as either recommended or conditional updates. While it is possible to attempt an update to a version that is not recommended by OSUS, following a recommended update path protects users from encountering known issues or unintended consequences on the cluster.
Choose only update targets that are recommended by OSUS to ensure a successful update.
Address all critical alerts on the cluster
Critical alerts must always be addressed as soon as possible, but it is especially important to address these alerts and resolve any problems before initiating a cluster update. Failing to address critical alerts before beginning an update can cause problematic conditions for the cluster.
In the Administrator perspective of the web console, navigate to Observe → Alerting to find critical alerts.
Ensure that the cluster is in an Upgradable state
When one or more Operators have not reported their Upgradeable
condition as True
for more than an hour, the ClusterNotUpgradeable
warning alert is triggered in the cluster. In most cases this alert does not block patch updates, but you cannot perform a minor version update until you resolve this alert and all Operators report Upgradeable
as True
.
For more information about the Upgradeable
condition, see “Understanding cluster Operator condition types” in the additional resources section.
Ensure that enough spare nodes are available
A cluster should not be running with little to no spare node capacity, especially when initiating a cluster update. Nodes that are not running and available may limit a cluster’s ability to perform an update with minimal disruption to cluster workloads.
Depending on the configured value of the cluster’s maxUnavailable
spec, the cluster might not be able to apply machine configuration changes to nodes if there is an unavailable node. Additionally, if compute nodes do not have enough spare capacity, workloads might not be able to temporarily shift to another node while the first node is taken offline for an update.
Make sure that you have enough available nodes in each worker pool, as well as enough spare capacity on your compute nodes, to increase the chance of successful node updates.
Ensure that the cluster’s PodDisruptionBudget is properly configured
You can use the PodDisruptionBudget
object to define the minimum number or percentage of pod replicas that must be available at any given time. This configuration protects workloads from disruptions during maintenance tasks such as cluster updates.
However, it is possible to configure the PodDisruptionBudget
for a given topology in a way that prevents nodes from being drained and updated during a cluster update.
When planning a cluster update, check the configuration of the PodDisruptionBudget
object for the following factors:
For highly available workloads, make sure there are replicas that can be temporarily taken offline without being prohibited by the
PodDisruptionBudget
.For workloads that aren’t highly available, make sure they are either not protected by a
PodDisruptionBudget
or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.
Additional resources