Updating OKD Virtualization

Learn how Operator Lifecycle Manager (OLM) delivers z-stream and minor version updates for OKD Virtualization.

The Node Maintenance Operator (NMO) is no longer shipped with OKD Virtualization. You can install the NMO from the OperatorHub in the OKD web console, or by using the OpenShift CLI (oc).
You must perform one of the following tasks before updating to OKD Virtualization 4.11 from OKD Virtualization 4.10.2 and later releases:
- Move all nodes out of maintenance mode.
- Install the standalone NMO and replace the nodemaintenances.nodemaintenance.kubevirt.io custom resource (CR) with a nodemaintenances.nodemaintenance.medik8s.io CR.

About updating OKD Virtualization

Operator Lifecycle Manager (OLM) manages the lifecycle of the OKD Virtualization Operator. The Marketplace Operator, which is deployed during OKD installation, makes external Operators available to your cluster.
OLM provides z-stream and minor version updates for OKD Virtualization. Minor version updates become available when you update OKD to the next minor version. You cannot update OKD Virtualization to the next minor version without first updating OKD.
OKD Virtualization subscriptions use a single update channel that is named stable. The stable channel ensures that your OKD Virtualization and OKD versions are compatible.
If your subscription’s approval strategy is set to Automatic, the update process starts as soon as a new version of the Operator is available in the stable channel. It is highly recommended to use the Automatic approval strategy to maintain a supportable environment. Each minor version of OKD Virtualization is only supported if you run the corresponding OKD version. For example, you must run OKD Virtualization 4.12 on OKD 4.12.
You can select the Manual approval strategy, but this is not recommended because it puts the supportability and functionality of your cluster at risk. With the Manual approval strategy, you must manually approve every pending update. If OKD and OKD Virtualization updates are out of sync, your cluster becomes unsupported.
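To see which approval strategy is in effect, you can read the `installPlanApproval` field of the Subscription objects. The following is a minimal sketch, assuming the `openshift-cnv` namespace used elsewhere in this document; the function name `list_manual_subscriptions` is hypothetical.

```shell
# Sketch: print the names of Subscriptions in a namespace that use the
# Manual approval strategy. The default namespace, openshift-cnv, is the
# one used by the procedures in this document.
list_manual_subscriptions() {
  ns="${1:-openshift-cnv}"
  oc get subscriptions.operators.coreos.com -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.channel}{"\t"}{.spec.installPlanApproval}{"\n"}{end}' |
    awk -F '\t' '$3 == "Manual" { print $1 }'
}
```

Running `list_manual_subscriptions` with no arguments prints nothing when every subscription in the namespace uses the Automatic strategy.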
The amount of time an update takes to complete depends on your network connection. Most automatic updates complete within fifteen minutes.
Updating OKD Virtualization does not interrupt network connections.
Data volumes and their associated persistent volume claims are preserved during update.

If you have virtual machines running that use hostpath provisioner storage, they cannot be live migrated and might block an OKD cluster update.

As a workaround, you can reconfigure the virtual machines so that they can be powered off automatically during a cluster update. Remove the evictionStrategy: LiveMigrate field and set the runStrategy field to Always.
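One possible way to apply this workaround from the CLI is a JSON merge patch, in which a `null` value removes a field. This is a sketch, not a verified procedure: the function name is hypothetical, and clearing `spec.running` is an assumption made because a VirtualMachine object is expected to set either `running` or `runStrategy`, not both.

```shell
# Sketch: remove evictionStrategy and set runStrategy to Always on one VM.
# $1 = VirtualMachine name, $2 = namespace.
# Assumption: spec.running is cleared so that it does not conflict with
# spec.runStrategy.
make_vm_stoppable() {
  vm="$1"
  ns="$2"
  oc patch vm "$vm" -n "$ns" --type merge \
    -p '{"spec":{"running":null,"runStrategy":"Always","template":{"spec":{"evictionStrategy":null}}}}'
}
```

For example, `make_vm_stoppable my-vm my-namespace` patches a single VM; both names here are placeholders.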

About workload updates

When you update OKD Virtualization, virtual machine workloads, including libvirt, virt-launcher, and qemu, update automatically if they support live migration.

Each virtual machine has a virt-launcher pod that runs the virtual machine instance (VMI). The virt-launcher pod runs an instance of libvirt, which is used to manage the virtual machine (VM) process.

You can configure how workloads are updated by editing the spec.workloadUpdateStrategy stanza of the HyperConverged custom resource (CR). There are two available workload update methods: LiveMigrate and Evict.

Because the Evict method shuts down VMI pods, only the LiveMigrate update strategy is enabled by default.

When LiveMigrate is the only update strategy enabled:

- VMIs that support live migration are migrated during the update process. The VM guest moves into a new pod with the updated components enabled.
- VMIs that do not support live migration are not disrupted or updated.
  If a VMI has the LiveMigrate eviction strategy but does not support live migration, it is not updated.

If you enable both LiveMigrate and Evict:

- VMIs that support live migration use the LiveMigrate update strategy.
- VMIs that do not support live migration use the Evict update strategy. If a VMI is controlled by a VirtualMachine object that has a runStrategy value of Always, a new VMI is created in a new pod with updated components.
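The selection rules above can be summarized as a small decision function. This is an illustrative sketch only; the function name and its arguments are hypothetical and are not part of any OKD Virtualization tooling. The case where only Evict is enabled is an assumption: the sketch treats every outdated VMI as evicted.

```shell
# Sketch of the workload update decision for a single VMI.
# $1 = "yes" if the VMI supports live migration, otherwise "no"
# $2 = enabled methods as a space-separated list, e.g. "LiveMigrate Evict"
# Prints one of: migrate, evict, skip
workload_update_action() {
  migratable="$1"
  methods=" $2 "
  case "$methods" in *" LiveMigrate "*) has_migrate=yes ;; *) has_migrate=no ;; esac
  case "$methods" in *" Evict "*) has_evict=yes ;; *) has_evict=no ;; esac

  if [ "$migratable" = yes ] && [ "$has_migrate" = yes ]; then
    echo migrate
  elif [ "$has_evict" = yes ]; then
    # Assumption: with Evict enabled, any VMI that is not migrated is evicted.
    echo evict
  else
    echo skip
  fi
}
```

With the default configuration, `workload_update_action no "LiveMigrate"` prints `skip`, which matches the statement that non-migratable VMIs are not disrupted or updated.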

Migration attempts and timeouts

When updating workloads, live migration fails if a pod is in the Pending state for the following periods:

- 5 minutes: if the pod is pending because it is Unschedulable.
- 15 minutes: if the pod is stuck in the Pending state for any reason.

When a VMI fails to migrate, the virt-controller tries to migrate it again. It repeats this process until all migratable VMIs are running on new virt-launcher pods. If a VMI is improperly configured, however, these attempts can repeat indefinitely.

Each attempt corresponds to a migration object. Only the five most recent attempts are held in a buffer. This prevents migration objects from accumulating on the system while retaining information for debugging.
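The two timeouts can be expressed as a simple check. The following sketch only restates the rule in code; the function name is hypothetical, and treating the limit as inclusive (failure at exactly 5 or 15 minutes) is an assumption.

```shell
# Sketch: succeed (exit 0) if a pending target pod has exceeded the
# migration timeout described above.
# $1 = seconds the pod has been in the Pending state
# $2 = "yes" if the pod is pending because it is Unschedulable
pending_migration_timed_out() {
  pending_seconds="$1"
  unschedulable="$2"
  if [ "$unschedulable" = yes ]; then
    limit=300   # 5 minutes
  else
    limit=900   # 15 minutes
  fi
  [ "$pending_seconds" -ge "$limit" ]
}
```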

About EUS-to-EUS updates

Every even-numbered minor version of OKD, including 4.10 and 4.12, is an Extended Update Support (EUS) version. However, because Kubernetes design mandates serial minor version updates, you cannot directly update from one EUS version to the next.

After you update from the source EUS version to the next odd-numbered minor version, you must sequentially update OKD Virtualization to all z-stream releases of that minor version that are on your update path. When you have upgraded to the latest applicable z-stream version, you can then update OKD to the target EUS minor version.

When the OKD update succeeds, the corresponding update for OKD Virtualization becomes available. You can now update OKD Virtualization to the target EUS version.
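As an illustration of the serial update path, the following sketch prints the minor versions that an EUS-to-EUS update passes through. The function name is hypothetical, and the input is assumed to be in `major.minor` form, such as 4.10.

```shell
# Sketch: print the source EUS version, the intermediate odd-numbered
# version, and the target EUS version.
# $1 = source EUS version in major.minor form, for example 4.10
eus_update_path() {
  major="${1%%.*}"
  minor="${1#*.}"
  if [ $((minor % 2)) -ne 0 ]; then
    echo "error: $1 is not an even-numbered (EUS) minor version" >&2
    return 1
  fi
  echo "$major.$minor $major.$((minor + 1)) $major.$((minor + 2))"
}
```

For example, `eus_update_path 4.10` prints `4.10 4.11 4.12`. Within the intermediate version, you still update OKD Virtualization through each z-stream release on your update path.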

Preparing to update

Before beginning an EUS-to-EUS update, you must:

- Pause the worker nodes’ machine config pools so that the workers are not rebooted twice.
- Disable automatic workload updates. This prevents OKD Virtualization from migrating or evicting your virtual machines (VMs) until you update to your target EUS version.

By default, OKD Virtualization automatically updates workloads, such as the virt-launcher pod, when you update the OKD Virtualization Operator. You can configure this behavior in the spec.workloadUpdateStrategy stanza of the HyperConverged custom resource.
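Pausing a machine config pool is typically done by setting its `spec.paused` field. The following sketch assumes the default pool name `worker`; the function name is hypothetical, and clusters with custom pools need the same change applied to each of them. Follow the OKD documentation for the authoritative procedure.

```shell
# Sketch: pause or unpause the worker machine config pool.
# $1 = "true" to pause, "false" to unpause
# Assumption: the pool is named "worker", which is the default.
set_worker_pool_paused() {
  case "$1" in
    true|false) ;;
    *) echo "usage: set_worker_pool_paused true|false" >&2; return 1 ;;
  esac
  oc patch mcp/worker --type merge --patch "{\"spec\":{\"paused\":$1}}"
}
```

Call `set_worker_pool_paused true` before the update and `set_worker_pool_paused false` after the whole EUS-to-EUS update is complete.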

Learn more about preparing to perform an EUS-to-EUS update.

Preventing workload updates during an EUS-to-EUS update

When you update from one Extended Update Support (EUS) version to the next, you must manually disable automatic workload updates to prevent OKD Virtualization from migrating or evicting workloads during the update process.

Prerequisites

You are running an EUS version of OKD and want to update to the next EUS version. You have not yet updated to the odd-numbered version in between.
You read “Preparing to perform an EUS-to-EUS update” and learned the caveats and requirements that pertain to your OKD cluster.
You paused the worker nodes’ machine config pools as directed by the OKD documentation.
It is recommended that you use the default Automatic approval strategy. If you use the Manual approval strategy, you must approve all pending updates in the web console. For more details, refer to the “Manually approving a pending Operator update” section.

Procedure

Back up the current workloadUpdateMethods configuration by running the following command:

$ WORKLOAD_UPDATE_METHODS=$(oc get kv kubevirt-kubevirt-hyperconverged -n openshift-cnv -o jsonpath='{.spec.workloadUpdateStrategy.workloadUpdateMethods}')

Turn off all workload update methods by running the following command:

$ oc patch hco kubevirt-hyperconverged -n openshift-cnv --type json -p '[{"op":"replace","path":"/spec/workloadUpdateStrategy/workloadUpdateMethods", "value":[]}]'

Example output

hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

Ensure that the HyperConverged Operator is Upgradeable before you continue. Enter the following command and monitor the output:

$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o json | jq ".status.conditions"

Example output

[
  {
    "lastTransitionTime": "2022-12-09T16:29:11Z",
    "message": "Reconcile completed successfully",
    "observedGeneration": 3,
    "reason": "ReconcileCompleted",
    "status": "True",
    "type": "ReconcileComplete"
  },
  {
    "lastTransitionTime": "2022-12-09T20:30:10Z",
    "message": "Reconcile completed successfully",
    "observedGeneration": 3,
    "reason": "ReconcileCompleted",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-12-09T20:30:10Z",
    "message": "Reconcile completed successfully",
    "observedGeneration": 3,
    "reason": "ReconcileCompleted",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2022-12-09T16:39:11Z",
    "message": "Reconcile completed successfully",
    "observedGeneration": 3,
    "reason": "ReconcileCompleted",
    "status": "False",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2022-12-09T20:30:10Z",
    "message": "Reconcile completed successfully",
    "observedGeneration": 3,
    "reason": "ReconcileCompleted",
    "status": "True",
    "type": "Upgradeable" (1)
  }
]

1	The OKD Virtualization Operator has the `Upgradeable` status.
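If you prefer a scripted check over reading the JSON, the same condition can be tested with a JSONPath query. This is a minimal sketch; the function name `hco_is_upgradeable` is hypothetical, and the resource name and namespace are the ones used in the command above.

```shell
# Sketch: succeed (exit 0) only if the HyperConverged CR reports the
# Upgradeable condition with status True.
hco_is_upgradeable() {
  oc get hco kubevirt-hyperconverged -n openshift-cnv \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}' |
    awk -F '\t' '$1 == "Upgradeable" && $2 == "True" { found = 1 } END { exit !found }'
}
```

A polling loop such as `until hco_is_upgradeable; do sleep 30; done` waits for the condition; the 30-second interval is arbitrary.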

Manually update your cluster from the source EUS version to the next minor version of OKD:
```
$ oc adm upgrade
```
Verification
- Check the current version by running the following command:
```
$ oc get clusterversion
```
Updating OKD to the next version is a prerequisite for updating OKD Virtualization. For more details, refer to the “Updating clusters” section of the OKD documentation.
Update OKD Virtualization.
- With the default Automatic approval strategy, OKD Virtualization automatically updates to the corresponding version after you update OKD.
- If you use the Manual approval strategy, approve the pending updates by using the web console.
Monitor the OKD Virtualization update by running the following command:
```
$ oc get csv -n openshift-cnv
```
Update OKD Virtualization to every z-stream version that is available for the non-EUS minor version, monitoring each update by running the command shown in the previous step.

Confirm that OKD Virtualization successfully updated to the latest z-stream release of the non-EUS version by running the following command:

$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o json | jq ".status.versions"

Example output

[
  {
    "name": "operator",
    "version": "4.12.0"
  }
]

Wait until the HyperConverged Operator has the Upgradeable status before you perform the next update. Enter the following command and monitor the output:
```
$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o json | jq ".status.conditions"
```
Update OKD to the target EUS version.
Confirm that the update succeeded by checking the cluster version:
```
$ oc get clusterversion
```
Update OKD Virtualization to the target EUS version.
- With the default Automatic approval strategy, OKD Virtualization automatically updates to the corresponding version after you update OKD.
- If you use the Manual approval strategy, approve the pending updates by using the web console.
Monitor the OKD Virtualization update by running the following command:
```
$ oc get csv -n openshift-cnv
```
The update completes when the VERSION field matches the target EUS version and the PHASE field reads Succeeded.

Restore the workload update methods configuration that you backed up:

$ oc patch hco kubevirt-hyperconverged -n openshift-cnv --type json -p "[{\"op\":\"add\",\"path\":\"/spec/workloadUpdateStrategy/workloadUpdateMethods\", \"value\":$WORKLOAD_UPDATE_METHODS}]"

Example output

hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

Verification

Check the status of VM migration by running the following command:
```
$ oc get vmim -A
```

Next steps

You can now unpause the worker nodes’ machine config pools.

Configuring workload update methods

You can configure workload update methods by editing the HyperConverged custom resource (CR).

Prerequisites

To use live migration as an update method, you must first enable live migration in the cluster.

If a VirtualMachineInstance CR contains evictionStrategy: LiveMigrate and the virtual machine instance (VMI) does not support live migration, the VMI will not update.

Procedure

To open the HyperConverged CR in your default editor, run the following command:
```
$ oc edit hco -n openshift-cnv kubevirt-hyperconverged
```

Edit the workloadUpdateStrategy stanza of the HyperConverged CR. For example:

apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  workloadUpdateStrategy:
    workloadUpdateMethods: (1)
    - LiveMigrate (2)
    - Evict (3)
    batchEvictionSize: 10 (4)
    batchEvictionInterval: "1m0s" (5)
...

1	The methods that can be used to perform automated workload updates. The available values are `LiveMigrate` and `Evict`. If you enable both options as shown in this example, updates use `LiveMigrate` for VMIs that support live migration and `Evict` for any VMIs that do not support live migration. To disable automatic workload updates, you can either remove the `workloadUpdateStrategy` stanza or set `workloadUpdateMethods: []` to leave the array empty.
2	The least disruptive update method. VMIs that support live migration are updated by migrating the virtual machine (VM) guest into a new pod with the updated components enabled. If `LiveMigrate` is the only workload update method listed, VMIs that do not support live migration are not disrupted or updated.
3	A disruptive method that shuts down VMI pods during the update. `Evict` is the only update method available if live migration is not enabled in the cluster. If a VMI is controlled by a `VirtualMachine` object that has `runStrategy: Always` configured, a new VMI is created in a new pod with updated components.
4	The number of VMIs that can be forced to be updated at a time by using the `Evict` method. This does not apply to the `LiveMigrate` method.
5	The interval to wait before evicting the next batch of workloads. This does not apply to the `LiveMigrate` method.

You can configure live migration limits and timeouts by editing the spec.liveMigrationConfig stanza of the HyperConverged CR.

To apply your changes, save and exit the editor.
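As a non-interactive alternative to the editor, the same field can be set with a merge patch. This is a sketch under the assumption that only `workloadUpdateMethods` needs to change; the function name is hypothetical, and a merge patch replaces the whole array with the value you pass.

```shell
# Sketch: set the workload update methods on the HyperConverged CR.
# $1 = JSON array of methods, for example '["LiveMigrate"]' or '[]'
set_workload_update_methods() {
  oc patch hco kubevirt-hyperconverged -n openshift-cnv --type merge \
    -p "{\"spec\":{\"workloadUpdateStrategy\":{\"workloadUpdateMethods\":$1}}}"
}
```

For example, `set_workload_update_methods '["LiveMigrate","Evict"]'` enables both methods, and `set_workload_update_methods '[]'` disables automatic workload updates.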

Approving pending Operator updates

Manually approving a pending Operator update

If an installed Operator has the approval strategy in its subscription set to Manual, when new updates are released in its current update channel, the update must be manually approved before installation can begin.

Prerequisites

An Operator previously installed using Operator Lifecycle Manager (OLM).

Procedure

In the Administrator perspective of the OKD web console, navigate to Operators → Installed Operators.
Operators that have a pending update display a status with Upgrade available. Click the name of the Operator you want to update.
Click the Subscription tab. Any updates requiring approval are displayed next to Upgrade Status. For example, it might display 1 requires approval.
Click 1 requires approval, then click Preview Install Plan.
Review the resources that are listed as available for update. When satisfied, click Approve.
Navigate back to the Operators → Installed Operators page to monitor the progress of the update. When complete, the status changes to Succeeded and Up to date.
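The approval can also be given from the CLI by setting `spec.approved` on the InstallPlan object. The following is a sketch, assuming the `openshift-cnv` namespace; both function names are hypothetical. Review the install plan before you approve it, as you would in the web console.

```shell
# Sketch: list install plans that are waiting for approval.
# $1 = namespace (defaults to openshift-cnv)
list_unapproved_install_plans() {
  ns="${1:-openshift-cnv}"
  oc get installplan -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.approved}{"\n"}{end}' |
    awk -F '\t' '$2 == "false" { print $1 }'
}

# Sketch: approve one install plan by name.
# $1 = install plan name, $2 = namespace (defaults to openshift-cnv)
approve_install_plan() {
  oc patch installplan "$1" -n "${2:-openshift-cnv}" \
    --type merge --patch '{"spec":{"approved":true}}'
}
```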

Monitoring update status

Monitoring OKD Virtualization upgrade status

To monitor the status of an OKD Virtualization Operator upgrade, watch the cluster service version (CSV) PHASE. You can also monitor the CSV conditions in the web console or by running the command provided here.

The PHASE and conditions values are approximations that are based on available information.

Prerequisites

Log in to the cluster as a user with the cluster-admin role.
Install the OpenShift CLI (oc).

Procedure

Run the following command:
```
$ oc get csv -n openshift-cnv
```

Review the output, checking the PHASE field. For example:

Example output

VERSION  REPLACES                                        PHASE
4.9.0    kubevirt-hyperconverged-operator.v4.8.2         Installing
4.9.0    kubevirt-hyperconverged-operator.v4.9.0         Replacing
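To check the phase from a script rather than by reading the table, you can query the CSV objects directly. This sketch assumes that the OKD Virtualization CSV names start with `kubevirt-hyperconverged-operator.`, as in the REPLACES column above; the function name is hypothetical.

```shell
# Sketch: succeed (exit 0) only if at least one OKD Virtualization CSV
# exists in openshift-cnv and every such CSV is in the Succeeded phase.
hco_csv_succeeded() {
  oc get csv -n openshift-cnv \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.version}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F '\t' '
      $1 ~ /^kubevirt-hyperconverged-operator\./ {
        seen = 1
        if ($3 != "Succeeded") pending = 1
      }
      END { exit !(seen && !pending) }'
}
```

While an update is in progress, the old CSV is in the Replacing phase and the new one is in the Installing phase, so the function returns a nonzero exit status.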

Optional: Monitor the aggregated status of all OKD Virtualization component conditions by running the following command:

$ oc get hco -n openshift-cnv kubevirt-hyperconverged \
-o=jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'

A successful upgrade results in the following output:

Example output

ReconcileComplete  True  Reconcile completed successfully
Available          True  Reconcile completed successfully
Progressing        False Reconcile completed successfully
Degraded           False Reconcile completed successfully
Upgradeable        True  Reconcile completed successfully

Viewing outdated OKD Virtualization workloads

You can view a list of outdated workloads by using the CLI.

If there are outdated virtualization pods in your cluster, the OutdatedVirtualMachineInstanceWorkloads alert fires.

Procedure

To view a list of outdated virtual machine instances (VMIs), run the following command:
```
$ oc get vmi -l kubevirt.io/outdatedLauncherImage --all-namespaces
```
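For a quick summary, you can count the outdated VMIs instead of listing them. This is a minimal sketch that wraps the command above; the function name is hypothetical.

```shell
# Sketch: print the number of VMIs that carry the outdated launcher
# image label, across all namespaces. Prints 0 when none are found.
count_outdated_vmis() {
  oc get vmi -l kubevirt.io/outdatedLauncherImage --all-namespaces --no-headers 2>/dev/null |
    awk 'NF { n++ } END { print n + 0 }'
}
```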