Upgrade Kubeflow

Upgrading your Kubeflow installation on Google Cloud

Before you start

To better understand the upgrade process, read the following sections first:

This guide assumes the following settings:

General upgrade instructions

Both the management cluster and the Kubeflow cluster follow the same instance and upstream folder convention. To upgrade, you typically update the packages in upstream to the new version and rerun the make apply-<subcommand> commands from the corresponding deployment process.

However, specific upgrades might also need the manual actions described below.
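Before starting either upgrade, it can help to confirm that the deployment directory really has the layout this guide relies on. The following is a minimal sketch, not part of the official tooling: `check_layout` is a hypothetical helper, and the entry names `Makefile`, `instance`, and `upstream` are assumptions based on the folder convention mentioned above.

```shell
# Hypothetical pre-flight check (not part of the Kubeflow tooling).
# Assumes a deployment directory contains ./Makefile, ./instance and ./upstream.
check_layout() {
  local dir="$1" missing=0 entry
  for entry in Makefile instance upstream; do
    if [ ! -e "${dir}/${entry}" ]; then
      # Report every missing entry instead of stopping at the first one
      echo "missing: ${dir}/${entry}" >&2
      missing=1
    fi
  done
  # 0 when all entries exist, 1 otherwise
  return "$missing"
}

# Example usage (assumes MGMT_DIR and KF_DIR are set as in this guide):
#   check_layout "${MGMT_DIR}" && check_layout "${KF_DIR}"
```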

Upgrading management cluster

Upgrade management cluster from v1.1 to v1.2

Note: there is no change to the management cluster between v1.2.0 and v1.2.1, so you do not need to upgrade if you are already on v1.2.0.

  1. The instructions below assume that your current working directory is:

    cd "${MGMT_DIR}"
  2. Use your management cluster’s kubectl context:

    # Look at all your contexts
    kubectl config get-contexts
    # Select your management cluster's context
    kubectl config use-context "${MGMT_NAME}"
    # Verify the context connects to the cluster properly
    kubectl get namespace

    If you are using a different environment, you can always reconfigure the context by running:

    make create-context
  3. Check your existing config connector version:

    # For Kubeflow v1.1, it should be 1.15.1
    $ kubectl get namespace cnrm-system -ojsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com\/version}'
    1.15.1
  4. Uninstall the old config connector in the management cluster:

    kubectl delete sts,deploy,po,svc,roles,clusterroles,clusterrolebindings --all-namespaces -l cnrm.cloud.google.com/system=true --wait=true
    kubectl delete validatingwebhookconfiguration abandon-on-uninstall.cnrm.cloud.google.com --ignore-not-found --wait=true
    kubectl delete validatingwebhookconfiguration validating-webhook.cnrm.cloud.google.com --ignore-not-found --wait=true
    kubectl delete mutatingwebhookconfiguration mutating-webhook.cnrm.cloud.google.com --ignore-not-found --wait=true

    These commands uninstall the config connector without removing your resources.

  5. Replace your ./Makefile with the version in Kubeflow v1.2.0: https://github.com/kubeflow/gcp-blueprints/blob/v1.2.0/management/Makefile.

    If you made any customizations in ./Makefile, you should merge your changes with the upstream version. We’ve refactored the Makefile to move substantial commands into the upstream package, so hopefully future upgrades won’t require a manual merge of the Makefile.

  6. Update ./upstream/management package:

    make update
  7. Use kpt to set user values:

    kpt cfg set -R . name "${MGMT_NAME}"
    kpt cfg set -R . gcloud.core.project "${MGMT_PROJECT}"
    kpt cfg set -R . location "${LOCATION}"

    Note: to find out which setters exist in a package and what their current values are, run:

    kpt cfg list-setters .
  8. Apply upgraded config connector:

    make apply-kcc

    Note, you can optionally also run make apply-cluster, but it should be the same as your existing management cluster.

  9. Check that your config connector upgrade is successful:

    # For Kubeflow v1.2, it should be 1.29.0
    $ kubectl get namespace cnrm-system -ojsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com\/version}'
    1.29.0
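The version checks in steps 3 and 9 can be scripted so that a mismatch fails loudly instead of relying on a visual comparison. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the release-to-version mapping contains only the two pairs stated in this guide (1.15.1 for v1.1, 1.29.0 for v1.2).

```shell
# Hypothetical helpers (not part of the Kubeflow tooling).

# Map a Kubeflow release to the Config Connector version this guide expects.
expected_kcc_version() {
  case "$1" in
    v1.1) echo "1.15.1" ;;
    v1.2) echo "1.29.0" ;;
    *) return 1 ;;
  esac
}

# Read the installed version from the cnrm-system namespace annotation,
# using the same kubectl query as the steps above.
installed_kcc_version() {
  kubectl get namespace cnrm-system \
    -ojsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com\/version}'
}

# Returns 0 on a match, 1 on a mismatch, 2 for a release this sketch does not know.
check_kcc_version() {
  local release="$1" expected actual
  expected="$(expected_kcc_version "$release")" || {
    echo "unknown release: ${release}" >&2
    return 2
  }
  actual="$(installed_kcc_version)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: Config Connector ${actual}"
  else
    echo "MISMATCH: expected ${expected}, found ${actual:-none}" >&2
    return 1
  fi
}

# Example usage, after `make apply-kcc`:
#   check_kcc_version v1.2
```

Running `check_kcc_version v1.1` before step 4 and `check_kcc_version v1.2` after step 8 brackets the upgrade with the same query the guide uses.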

Upgrading Kubeflow cluster

DISCLAIMERS:

  • The upgrade process depends on each Kubeflow application to handle the upgrade properly. There’s no guarantee on data completeness unless the application provides such a guarantee.
  • We recommend backing up your data before an upgrade.
  • Upgrading a Kubeflow cluster can be disruptive. Schedule some downtime and communicate with your users beforehand.

To upgrade from specific versions of Kubeflow, you may need to take certain manual actions. Refer to the version-specific sections below.

General instructions for upgrading Kubeflow:

  1. The instructions below assume that:

    • Your current working directory is:

      cd "${KF_DIR}"
    • Your kubectl uses a context that connects to your Kubeflow cluster

      # List your existing contexts
      kubectl config get-contexts
      # Use the context that connects to your Kubeflow cluster
      kubectl config use-context "${KF_NAME}"
  2. Edit the Makefile at ./Makefile and change MANIFESTS_URL to point at the version of Kubeflow manifests you want to use

    • Refer to the kpt docs for more info about supported dependencies
  3. Update the local copies:

    make update
  4. Redeploy:

    make apply

    To evaluate the changes before deploying them you can run make hydrate and then compare the contents of .build to what is currently deployed.
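One way to make that comparison concrete is to snapshot the existing .build directory before updating, then diff the freshly hydrated output against the snapshot. The following is a minimal sketch: `snapshot_build` and `diff_build` are hypothetical helpers, and the sketch assumes that .build from your previous deployment is still present in the working directory and reflects what is deployed.

```shell
# Hypothetical helpers (not part of the Kubeflow tooling).

# Copy the current ./.build into a snapshot directory for later comparison.
snapshot_build() {
  local snapshot_dir="$1"
  # Refuse an empty argument so that rm -rf never runs on an empty path
  [ -n "$snapshot_dir" ] || return 2
  rm -rf "$snapshot_dir"
  if [ -d .build ]; then
    cp -R .build "$snapshot_dir"
  else
    # No previous build: compare against an empty directory
    mkdir -p "$snapshot_dir"
  fi
}

# Recursive unified diff of the snapshot against ./.build.
# Exit status follows diff: 0 when identical, 1 when they differ.
diff_build() {
  local snapshot_dir="$1"
  diff -ruN "$snapshot_dir" .build
}

# Example usage, run from ${KF_DIR}:
#   snapshot_build /tmp/kf-build-before
#   make update
#   make hydrate
#   diff_build /tmp/kf-build-before
```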

Upgrade Kubeflow cluster from v1.1 to v1.2

Note: v1.2.1 pins the Kubernetes cluster version to 1.16 to resolve #198. You do not need to upgrade from v1.2.0 to v1.2.1 if your Kubeflow cluster is working properly.

  1. The instructions below assume that:

    • Your current working directory is:

      cd "${KF_DIR}"
    • Your kubectl uses a context that connects to your Kubeflow cluster:

      # List your existing contexts
      kubectl config get-contexts
      # Use the context that connects to your Kubeflow cluster
      kubectl config use-context "${KF_NAME}"
  2. (Recommended) Replace your ./Makefile with the version in Kubeflow v1.2.0: https://github.com/kubeflow/gcp-blueprints/blob/v1.2.0/kubeflow/Makefile.

    If you made any customizations in ./Makefile, you should merge your changes with the upstream version.

    This step is recommended because the new Makefile includes usability improvements and compatibility fixes for newer Kustomize versions, while remaining compatible with Kustomize v3.2.1. The deployment process is backward-compatible, so replacing the Makefile is not required.

  3. Update ./upstream/manifests package:

    make update
  4. Before applying new resources, you need to delete some immutable resources that were updated in this release:

    kubectl delete statefulset kfserving-controller-manager -n kubeflow --wait
    kubectl delete crds experiments.kubeflow.org suggestions.kubeflow.org trials.kubeflow.org

    WARNING: This step deletes all running Katib resources.

    Refer to the GitHub comment in the v1.2 release issue for more details.

  5. Redeploy:

    make apply

    To evaluate the changes before deploying them you can:

    1. Run make hydrate.
    2. Compare the contents of .build with an earlier version, using a tool such as git diff.
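Because step 4 deletes the three Katib CRDs, and with them every Katib resource of those kinds, you may want to keep a record of those resources first. The following is a minimal sketch: `backup_katib` is a hypothetical helper that dumps each kind to a YAML file with `kubectl get`. Whether the saved YAML can be re-applied against the v1.2 CRDs is not something this guide confirms, so treat the files as a record rather than a guaranteed restore path.

```shell
# Hypothetical helper (not part of the Kubeflow tooling).
# Saves all Katib experiments, suggestions and trials, across all namespaces,
# to one YAML file per kind in the given output directory.
backup_katib() {
  local out_dir="$1" kind
  [ -n "$out_dir" ] || return 2
  mkdir -p "$out_dir" || return 1
  for kind in experiments.kubeflow.org suggestions.kubeflow.org trials.kubeflow.org; do
    # Stop at the first failure so that a partial backup is not mistaken for a full one
    kubectl get "$kind" --all-namespaces -o yaml > "${out_dir}/${kind}.yaml" || return 1
  done
}

# Example usage, before running the delete commands in step 4:
#   backup_katib "${HOME}/katib-backup-v1.1"
```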

Last modified 20.04.2021: Apply Docs Restructure to `v1.2-branch` = update `v1.2-branch` to current `master` v2 (#2612) (4e2602bd)