Upgrading and Reinstalling

How to upgrade or reinstall your Kubeflow Pipelines deployment

Starting from Kubeflow version 0.5, Kubeflow Pipelines persists the pipeline data in a permanent storage volume. Kubeflow Pipelines therefore supports the following capability:

  • Reinstall: You can delete a cluster and create a new cluster, specifying the storage to retrieve the original data in the new cluster.

Note that upgrade isn’t currently supported; check this issue for progress.

Context

Kubeflow Pipelines creates and manages the following data related to your machine learning pipeline:

  • Metadata: Experiments, jobs, runs, etc. Kubeflow Pipelines stores the pipeline metadata in a MySQL database.
  • Artifacts: Pipeline packages, metrics, views, etc. Kubeflow Pipelines stores the artifacts in a Minio server.

The MySQL database and the Minio server are both backed by the Kubernetes PersistentVolume (PV) subsystem.

  • If you are deploying to Google Cloud Platform (GCP), Kubeflow Pipelines creates a Compute Engine Persistent Disk (PD) and mounts it as a PV.
  • If you are not deploying to GCP, you can specify your own preferred PV.
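
To see which volumes back the two stores in a running cluster, you can list the PVs and the claims bound to them. The following is a minimal sketch that wraps two standard kubectl commands in a helper function. The function name is hypothetical, and the default namespace kubeflow is an assumption that this page does not state, so adjust it to match your deployment.

```shell
# List the PersistentVolumes in the cluster and the claims in one namespace.
# Assumption: Kubeflow Pipelines runs in the "kubeflow" namespace unless a
# different namespace is passed as the first argument.
show_pipeline_storage() {
  # PVs are cluster-scoped, so no namespace is needed here.
  kubectl get pv
  # PVCs are namespaced; the claims for MySQL and Minio should be listed here.
  kubectl get pvc --namespace "${1:-kubeflow}"
}

# Example:
#   show_pipeline_storage
#   show_pipeline_storage my-namespace
```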

Deploying Kubeflow

This section describes how to deploy Kubeflow in a way that ensures you can use the Kubeflow Pipelines reinstallation capability.

Deploying Kubeflow on GCP

Follow the guide to deploying Kubeflow on GCP. You don’t need to do anything extra.

When the deployment has finished, you can see two entries in the GCP Deployment Manager, one for deploying the cluster and one for deploying the storage:

Deployment Manager showing the storage deployment entry

The entry suffixed with -storage creates one PD for the metadata store and one for the artifact store:

Deployment Manager showing details of the storage deployment entry
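
If you prefer the command line to the console, the same information is available through gcloud. The sketch below uses two standard gcloud commands. The function name is hypothetical, and the filter assumes that the disk names start with the name of the storage deployment entry, which you should confirm in your own project.

```shell
# List the Deployment Manager entries, then the persistent disks whose names
# start with the given prefix.
# Assumption: the PDs are named after the "-storage" deployment entry, so
# passing that entry's name as the prefix selects them.
list_pipeline_disks() {
  gcloud deployment-manager deployments list
  # "name~^prefix" is a regular-expression match anchored at the start.
  gcloud compute disks list --filter="name~^$1"
}

# Example (the deployment name is a placeholder):
#   list_pipeline_disks my-kubeflow-storage
```

Note the disk names that this returns. You need them if you later reinstall Kubeflow Pipelines against the same storage.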

Deploying Kubeflow in other environments (non-GCP)

The steps below assume that you already have a Kubernetes cluster set up.

  • If you don’t need custom storage and are happy with the default PVs that Kubeflow provides, you can follow the Kubeflow quick start without doing anything extra. The deployment script uses the Kubernetes default StorageClass to provision the PVs for you.

  • If you want to specify a custom PV:

    • Create two PVs in your Kubernetes cluster with your preferred storage type. See the Kubernetes guide to PVs.

    • Follow the Kubeflow quick start, but note the following change to the standard procedure:

      Before running the apply command:

          kfctl apply all -V

      You should first edit the following files to specify your PVs:

      ${KFAPP}/kustomize/minio/overlays/minioPd/params.env

          ...
          minioPd=[YOUR-PRE-CREATED-MINIO-PV-NAME]
          ...

      ${KFAPP}/kustomize/mysql/overlays/mysqlPd/params.env

          ...
          mysqlPd=[YOUR-PRE-CREATED-MYSQL-PV-NAME]
          ...

    • Then run the apply command as usual:

          kfctl apply k8s
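
As an illustration of the "create two PVs" step, the sketch below prints a PersistentVolume manifest that you can pipe to kubectl. Everything specific in it is an assumption made for the example: the function name, the hostPath storage type (suitable only for single-node test clusters), the 20Gi default size, and the PV names and paths in the usage lines. Substitute your preferred storage type as described in the Kubernetes guide to PVs.

```shell
# Print a PersistentVolume manifest to stdout.
# Usage: pv_manifest <pv-name> <host-path> [size]
# Assumptions: hostPath storage and a default size of 20Gi.
pv_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $1
spec:
  capacity:
    storage: ${3:-20Gi}
  accessModes:
    - ReadWriteOnce
  # Retain keeps the underlying data when the claim is deleted.
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: $2
EOF
}

# Example (names and paths are hypothetical):
#   pv_manifest minio-pv /mnt/data/minio | kubectl apply -f -
#   pv_manifest mysql-pv /mnt/data/mysql | kubectl apply -f -
```

With these example names, the params.env entries would read minioPd=minio-pv and mysqlPd=mysql-pv.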

Reinstalling Kubeflow Pipelines

You can delete a Kubeflow cluster and create a new one, specifying your existing storage to retrieve the original data in the new cluster.

Note: You must use command line deployment. You cannot reinstall Kubeflow Pipelines using the web interface.

Reinstalling Kubeflow Pipelines on GCP

To reinstall Kubeflow Pipelines, follow the command line deployment instructions, but note the following change in the procedure:

  • Warning: When you run kfctl init ${KFAPP} --other-flags, use a different ${KFAPP} name from your existing ${KFAPP}. Otherwise, the data in your existing PDs will be deleted during kfctl apply all -V.

  • Before running the following apply command:

        kfctl apply all -V

    You should first:

    • Edit gcp_config/storage-kubeflow.yaml to skip creating new storage:

          ...
          createPipelinePersistentStorage: false
          ...

    • Edit the following files to specify the persistent disks created in a previous deployment:

      ${KFAPP}/kustomize/minio/overlays/minioPd/params.env

          ...
          minioPd=[NAME-OF-ARTIFACT-STORAGE-DISK]
          ...

      ${KFAPP}/kustomize/mysql/overlays/mysqlPd/params.env

          ...
          mysqlPd=[NAME-OF-METADATA-STORAGE-DISK]
          ...

  • Then run the apply command:

        kfctl apply all -V
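
The file edits above can also be scripted, which helps when you repeat a reinstall. The sketch below is one possible approach using grep and sed. Both helper functions are hypothetical and are not part of kfctl. They assume that the keys already exist in the files, one per line, and that the values contain no | or & characters.

```shell
# Rewrite an existing KEY=VALUE line in a params.env-style file.
# Usage: set_param <file> <key> <value>
# Fails without changing the file if the key is not present.
set_param() {
  grep -q "^$2=" "$1" || { echo "no $2 entry in $1" >&2; return 1; }
  # Write to a temporary file to avoid the non-portable `sed -i`.
  sed "s|^$2=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Set createPipelinePersistentStorage to false, keeping its indentation.
# Usage: skip_storage_creation <yaml-file>
skip_storage_creation() {
  sed 's|^\( *createPipelinePersistentStorage:\).*|\1 false|' "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Example (the disk names are placeholders):
#   skip_storage_creation gcp_config/storage-kubeflow.yaml
#   set_param "${KFAPP}/kustomize/minio/overlays/minioPd/params.env" minioPd my-artifact-disk
#   set_param "${KFAPP}/kustomize/mysql/overlays/mysqlPd/params.env" mysqlPd my-metadata-disk
```

Review the edited files before you run the apply command, because a mistyped disk name is not caught by these helpers.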

Reinstalling Kubeflow in other environments (non-GCP)

The steps are the same as for any non-GCP installation, except that you must use the same PV definitions as in your previous deployment to create the PVs in the new cluster.

  • Create two PVs in your Kubernetes cluster, using the same PV definitions as in your previous deployment. See the Kubernetes guide to PVs.

  • Follow the Kubeflow quick start, but note the following change to the standard procedure:

    Before running the apply command:

        kfctl apply k8s

    You should first edit the following files to specify your PVs:

    ${KFAPP}/kustomize/minio/overlays/minioPd/params.env

        ...
        minioPd=[YOUR-PRE-CREATED-MINIO-PV-NAME]
        ...

    ${KFAPP}/kustomize/mysql/overlays/mysqlPd/params.env

        ...
        mysqlPd=[YOUR-PRE-CREATED-MYSQL-PV-NAME]
        ...

  • Then run the apply command:

        kfctl apply k8s
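
Reusing the same PV definitions is easier if you save them before you delete the old cluster. The sketch below exports each PV with the standard kubectl get pv ... -o yaml command. The function name and the PV names in the usage line are hypothetical. The exported YAML usually contains cluster-generated fields, such as status, metadata.uid, metadata.resourceVersion, and spec.claimRef, which you typically need to remove before you apply the file to a new cluster.

```shell
# Save the definition of each named PV to <out-dir>/<pv-name>.yaml.
# Usage: save_pv_definitions <out-dir> <pv-name>...
# Run this against the old cluster, before deleting it.
save_pv_definitions() {
  out_dir="$1"
  shift
  mkdir -p "${out_dir}"
  for pv in "$@"; do
    kubectl get pv "${pv}" -o yaml > "${out_dir}/${pv}.yaml"
  done
}

# Example (PV names are hypothetical):
#   save_pv_definitions ./pv-backup minio-pv mysql-pv
```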