Upgrade Guide

This guide describes how to upgrade the Open Service Mesh (OSM) control plane.

How upgrades work

OSM’s control plane lifecycle is managed by Helm and can be upgraded with Helm’s upgrade functionality, which will patch or replace control plane components as needed based on changed values and resource templates.

Resource availability during upgrade

Since upgrades may include redeploying the osm-controller with the new version, there may be some downtime of the controller. While the osm-controller is unavailable, new SMI resources will be processed with a delay, creation of new pods that need a proxy sidecar container injected will fail, and mTLS certificates will not be rotated.

Existing SMI resources are unaffected, which means that the data plane (including the Envoy sidecar configs) is also unaffected by the upgrade.

Data plane interruptions are expected if the upgrade includes CRD changes. Streamlining data plane upgrades is being tracked in issue #512.

Policy

Only certain upgrade paths are tested and supported.

Note: These plans are tentative and subject to change.

Breaking changes in this section refer to incompatible changes to the following user-facing components:

  • osm CLI commands, flags, and behavior
  • SMI CRDs and controllers

This implies that the following are NOT user-facing, and incompatible changes to them are NOT considered “breaking” as long as the incompatibility is handled by user-facing components:

  • Chart values.yaml
  • osm-mesh-config MeshConfig
  • Internally-used labels and annotations (monitored-by, injection, metrics, etc.)

Upgrades are only supported between versions that do not include breaking changes, as described below.

For OSM versions 0.y.z:

  • Breaking changes will not be introduced between 0.y.z and 0.y.z+1
  • Breaking changes may be introduced between 0.y.z and 0.y+1.0

For OSM versions x.y.z where x >= 1:

  • Breaking changes will not be introduced between x.y.z and x.y+1.0 or between x.y.z and x.y.z+1
  • Breaking changes may be introduced between x.y.z and x+1.0.0
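The rules above can be sketched as a small shell check. This is a minimal illustration, not part of OSM: the may_break helper is hypothetical, it expects plain x.y.z version strings (no leading "v" or pre-release suffix), and it only classifies a version pair according to the policy as stated here.

```shell
# Hypothetical helper: prints "no" when the policy rules out breaking changes
# between CURRENT and TARGET, and "maybe" when the policy allows them.
# The body runs in a subshell so its variables do not leak into the caller.
may_break() (
  cur_major=${1%%.*}; cur_rest=${1#*.}; cur_minor=${cur_rest%%.*}
  new_major=${2%%.*}; new_rest=${2#*.}; new_minor=${new_rest%%.*}

  if [ "$cur_major" -ne "$new_major" ]; then
    echo maybe   # major version change
  elif [ "$cur_major" -eq 0 ] && [ "$cur_minor" -ne "$new_minor" ]; then
    echo maybe   # 0.y.z to a different 0.y
  else
    echo no      # patch change, or minor change when x >= 1
  fi
)

may_break 0.9.1 0.9.2    # prints "no"
may_break 0.9.2 0.10.0   # prints "maybe"
may_break 1.0.0 1.1.0    # prints "no"
may_break 1.2.0 2.0.0    # prints "maybe"
```

A "maybe" result does not mean the upgrade will fail; it only means the path falls outside the range in which breaking changes are ruled out.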

How to upgrade OSM

The recommended way to upgrade a mesh is with the osm CLI. For advanced use cases, helm may be used.

CRD Upgrades

Because Helm does not manage CRDs beyond the initial installation, OSM uses an init container on the osm-bootstrap pod to update existing CRDs and add new ones during an upgrade. If the new release contains updates to existing CRDs or adds new CRDs, the init-osm-bootstrap container on the osm-bootstrap pod will update the CRDs. The associated Custom Resources will remain as is, requiring no additional action prior to or immediately after the upgrade.

Please check the CRD Updates section of the release notes to see if any updates have been made to the CRDs used by OSM. If the versions of the Custom Resources are among the versions the updated CRD supports, no immediate action is required. OSM implements a conversion webhook for all of its CRDs, ensuring support for older versions and providing the flexibility to update Custom Resources at a later point in time.

Upgrading with the OSM CLI

Pre-requisites

  • Kubernetes cluster with the OSM control plane installed
    • Ensure that the Kubernetes cluster has the minimum Kubernetes version required by the new OSM chart. This can be found in the Installation Pre-requisites.
  • osm CLI installed
    • By default, the osm CLI will upgrade to the same chart version that it installs. For example, v1.0.0 of the osm CLI will upgrade to v1.0.0 of the OSM Helm chart. Upgrading to a chart version other than the one matching the CLI may work, but those scenarios are not tested, and issues that arise may not get fixed even if reported.

The osm mesh upgrade command performs a helm upgrade of the existing Helm release for a mesh.

Basic usage requires no additional arguments or flags:

  $ osm mesh upgrade
  OSM successfully upgraded mesh osm

This command will upgrade the mesh with the default mesh name in the default OSM namespace. Values from the previous release will NOT carry over to the new release by default, but may be passed individually with the --set flag on osm mesh upgrade.
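As a rough illustration of carrying settings over with --set, the sketch below turns a short list of values into one --set flag each and prints the resulting command for review instead of running it. The value names (osm.deployPrometheus, osm.deployGrafana) are assumptions for illustration; confirm them against the values.yaml of the chart version you are upgrading to.

```shell
# Values from the previous release that should survive the upgrade.
# These keys are illustrative assumptions; check the chart's values.yaml.
overrides="osm.deployPrometheus=true osm.deployGrafana=true"

# Turn each key=value pair into its own --set flag.
upgrade_cmd="osm mesh upgrade"
for kv in $overrides; do
  upgrade_cmd="$upgrade_cmd --set $kv"
done

# Print the command for review; run it by hand once it looks right.
echo "$upgrade_cmd"
```

Printing first makes it easy to compare the flag list against the settings of the existing release before anything changes in the cluster.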

See osm mesh upgrade --help for more details.

Upgrading with Helm

Pre-requisites

  • Kubernetes cluster with the OSM control plane installed
  • The helm 3 CLI

OSM Configuration

When upgrading, custom settings used to install or run OSM may be reverted to their defaults; this applies only to metrics deployments. Follow this guide carefully to prevent these values from being overwritten.

To preserve any changes you’ve made to the OSM configuration, use the helm --values flag. Create a copy of the values file (make sure to use the version for the upgraded chart) and change any values you wish to customize. You can omit all other values.

Note: Configuration changes that go into the MeshConfig are not applied during an upgrade; the values remain as they were prior to the upgrade. To update a value in the MeshConfig, patch the resource after the upgrade.

For example, if the logLevel field in the MeshConfig was set to info prior to the upgrade, changing it in override.yaml will have no effect during the upgrade.

Warning: Do NOT change osm.meshName or osm.osmNamespace.
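A minimal sketch of preparing such an override file follows. It writes only the values being customized and then checks that the file does not set the two values named in the warning. The keys under osm are assumed names chosen for illustration (they are not MeshConfig settings); confirm them in the values.yaml of the target chart version. The check is deliberately conservative and flags any mention of those two keys.

```shell
# Write a minimal override file; every value not listed keeps the chart default.
# The keys below are assumed names; confirm them in the target chart's values.yaml.
cat > override.yaml <<'EOF'
osm:
  deployPrometheus: true
  deployGrafana: true
EOF

# Warn if the file sets either of the two values that must not change.
if grep -Eq '^[[:space:]]*(meshName|osmNamespace):' override.yaml; then
  echo "override.yaml must not set osm.meshName or osm.osmNamespace" >&2
else
  echo "override.yaml looks safe to pass to --values"
fi
```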

Helm Upgrade

Then run the following helm upgrade command.

  $ helm upgrade <mesh name> osm --repo https://openservicemesh.github.io/osm --version <chart version> --namespace <osm namespace> --values override.yaml

Omit the --values flag if you prefer to use the default settings.

Run helm upgrade --help for more options.

Upgrading Third Party Dependencies

Envoy

The Envoy version can be updated by changing the value of the envoyImage variable in the osm-mesh-config. When doing so, it is recommended to specify the image digest associated with that Envoy version to avoid being vulnerable to supply chain attacks. For instance, to update the envoy-alpine image to v1.19.1, run the following command:

  export osm_namespace=osm-system # Replace osm-system with the namespace where OSM is installed
  kubectl patch meshconfig osm-mesh-config -n $osm_namespace -p '{"spec":{"sidecar":{"envoyImage":"envoyproxy/envoy-alpine@sha256:6502a637c6c5fba4d03d0672d878d12da4bcc7a0d0fb3f1d506982dde0039abd"}}}' --type=merge

After the MeshConfig resource has been updated, all the pods and deployments that are part of the mesh must be restarted so that the newer version of Envoy sidecar can be injected onto the pods as a part of the automatic sidecar injection that OSM performs. This can be done with the kubectl rollout restart deploy command.
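The restart step might look like the sketch below, which builds one kubectl rollout restart command per mesh namespace and prints the commands for review rather than running them. The namespace names are placeholders; substitute the namespaces that are actually part of your mesh.

```shell
# Namespaces that are part of the mesh; these names are placeholders.
mesh_namespaces="bookstore bookbuyer"

# Build one restart command per namespace. Without a resource name,
# "kubectl rollout restart deployment" targets every Deployment in the namespace.
restart_cmds=""
for ns in $mesh_namespaces; do
  restart_cmds="${restart_cmds}kubectl rollout restart deployment -n $ns
"
done

# Print the commands for review before running them against the cluster.
printf '%s' "$restart_cmds"
```

Pods that are not managed by a Deployment are not covered by this sketch and would need to be restarted separately.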

Prometheus, Grafana, and Jaeger

If enabled, OSM’s Prometheus, Grafana, and Jaeger services are deployed alongside other OSM control plane components. Though these third-party dependencies cannot be updated through the MeshConfig like Envoy, their versions can still be updated in the deployment directly. For instance, to update Prometheus to v2.19.1, run:

  export osm_namespace=osm-system # Replace osm-system with the namespace where OSM is installed
  kubectl set image deployment/osm-prometheus -n $osm_namespace prometheus="prom/prometheus:v2.19.1"

To update to Grafana 8.1.0, the command would look like:

  kubectl set image deployment/osm-grafana -n $osm_namespace grafana="grafana/grafana:8.1.0"

And for Jaeger, the user would run the following to update to 1.26.0:

  kubectl set image deployment/jaeger -n $osm_namespace jaeger="jaegertracing/all-in-one:1.26.0"

OSM Upgrade Troubleshooting Guide

OSM Mesh Upgrade Timing Out

Insufficient CPU

If the osm mesh upgrade command is timing out, it could be due to insufficient CPU.

  1. Check the pods to see if any of them aren’t fully up and running:
       # Replace osm-system with osm-controller's namespace if using a non-default namespace
       kubectl get pods -n osm-system
  2. If any pods are in the Pending state, use kubectl describe to check the Events section:
       # Replace osm-system with osm-controller's namespace if using a non-default namespace
       kubectl describe pod <pod-name> -n osm-system

If you see the following error, then please increase the number of CPUs Docker can use.

  `Warning FailedScheduling 4s (x15 over 19m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.`
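The check for this event can be sketched as a small filter over the kubectl describe output. The check_cpu_events helper is hypothetical, and the sample line is the event text shown above.

```shell
# Hypothetical helper: reads "kubectl describe pod" output on stdin and reports
# whether the scheduler rejected the pod for lack of CPU.
check_cpu_events() {
  if grep -q 'FailedScheduling.*Insufficient cpu'; then
    echo "insufficient CPU: increase the CPUs available to the cluster"
  else
    echo "no CPU scheduling failure found"
  fi
}

# Sample event line copied from the error shown above.
sample='Warning FailedScheduling 4s (x15 over 19m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.'
printf '%s\n' "$sample" | check_cpu_events
```

Against a live cluster, the same helper could be fed with kubectl describe pod <pod-name> -n osm-system | check_cpu_events.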

Error Validating CLI Parameters

If the osm mesh upgrade command is still timing out, it could be due to a CLI/image version mismatch.

  1. Check the pods to see if any of them aren’t fully up and running:
       # Replace osm-system with osm-controller's namespace if using a non-default namespace
       kubectl get pods -n osm-system
  2. If any pods are in the Pending state, use kubectl describe to check the Events section for Error Validating CLI parameters:
       # Replace osm-system with osm-controller's namespace if using a non-default namespace
       kubectl describe pod <pod-name> -n osm-system
  3. If you find the error, check the pod’s logs for any errors:
       kubectl logs -n osm-system <pod-name> | grep -i error

If you see the following error, then it’s due to a CLI/image version mismatch.

  `"error":"Please specify the init container image using --init-container-image","reason":"FatalInvalidCLIParameters"`

The workaround is to set the --container-registry and --osm-image-tag flags when running osm mesh upgrade.

  osm mesh upgrade --container-registry $CTR_REGISTRY --osm-image-tag $CTR_TAG --enable-egress=true

Other Issues

If you’re running into issues that are not resolved with the steps above, please open a GitHub issue.