Upgrading and Reinstalling
How to upgrade or reinstall your Pipelines deployment on Google Cloud Platform (GCP)
Starting from Kubeflow v0.5, Kubeflow Pipelines persists the pipeline data in permanent storage volumes. Kubeflow Pipelines therefore supports the following capabilities:
- Reinstall: You can delete a cluster and create a new cluster, specifying the existing storage volumes to retrieve the original data in the new cluster. This guide tells you how to reinstall Kubeflow Pipelines as part of a full Kubeflow deployment.
- Upgrade (limited support): The full Kubeflow deployment currently supports upgrading in Alpha status with limited support. Check the following sources for progress updates:
Before you start
This guide tells you how to reinstall Kubeflow Pipelines as part of a full Kubeflow deployment on Google Kubernetes Engine (GKE). See the Kubeflow deployment guide.
Instead of the full Kubeflow deployment, you can use Kubeflow Pipelines Standalone or GCP Hosted ML Pipelines (Alpha), which support different options for upgrading and reinstalling. See the Kubeflow Pipelines installation options.
Kubeflow Pipelines data storage
Kubeflow Pipelines creates and manages the following data related to your machine learning pipeline:
- Metadata: Experiments, jobs, runs, etc. Kubeflow Pipelines stores the pipeline metadata in a MySQL database.
- Artifacts: Pipeline packages, metrics, views, etc. Kubeflow Pipelines stores the artifacts in a Minio server.
Kubeflow Pipelines uses the Kubernetes PersistentVolume (PV) subsystem to provision the MySQL database and the Minio server. On GCP, Kubeflow Pipelines creates a Google Compute Engine Persistent Disk (PD) and mounts it as a PV.
After deploying Kubeflow on GCP, you can see two entries in the GCP Deployment Manager, one for the cluster deployment and one for the storage deployment.
The entry with the suffix -storage creates one PD for the metadata store and one for the artifact store.
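To find the disks from the command line rather than the console, something like the following may help. It is a sketch under two assumptions that the text above does not confirm: that the storage entry is named exactly as the deployment name plus -storage, and that you have an authenticated gcloud CLI. The function names and the project argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and not part of kfctl.

# Derive the storage deployment name from the Kubeflow deployment name.
# Assumption: the storage entry is "<deployment name>-storage".
storage_deployment_name() {
  printf '%s-storage\n' "$1"
}

# List the resources (including the PDs) created by the storage deployment.
# Arguments: <deployment name> <GCP project ID>. Requires the gcloud CLI.
list_storage_resources() {
  gcloud deployment-manager resources list \
    --deployment "$(storage_deployment_name "$1")" \
    --project "$2"
}

# Example call (not run here):
#   list_storage_resources "${KF_NAME}" "${PROJECT}"
```

Note the disk names in the output; you need them when you point a new deployment at the existing storage.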
Reinstalling Kubeflow Pipelines
You can delete a Kubeflow cluster and create a new one, specifying your existing storage to retrieve the original data in the new cluster.
Notes:
- You must use command-line deployment. You cannot reinstall Kubeflow Pipelines using the web interface.
- When you do kfctl apply or kfctl build, you should use a different deployment name from your existing deployment name. Otherwise, kfctl will delete your data in the existing PDs. This guide defines the deployment name in the ${KF_NAME} environment variable.
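Because reusing the old deployment name can delete data, it may be worth guarding against it in any script that wraps kfctl. The sketch below is one way to do that. OLD_KF_NAME and check_deployment_name are hypothetical names introduced for illustration; kfctl itself does not provide them.

```shell
#!/usr/bin/env bash
# Sketch only: check_deployment_name is a hypothetical helper, not part of kfctl.

# check_deployment_name OLD_NAME NEW_NAME
# Returns 0 only when NEW_NAME is non-empty and differs from OLD_NAME.
check_deployment_name() {
  local old_name="$1"
  local new_name="$2"
  if [ -z "${new_name}" ]; then
    echo "error: new deployment name is empty" >&2
    return 1
  fi
  if [ "${old_name}" = "${new_name}" ]; then
    echo "error: new deployment name must differ from '${old_name}'" >&2
    return 1
  fi
  return 0
}

# Example call (not run here); OLD_KF_NAME is an assumed variable:
#   check_deployment_name "${OLD_KF_NAME}" "${KF_NAME}" \
#     && kfctl apply -V -f "${CONFIG_FILE}"
```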
To reinstall Kubeflow Pipelines:
- Follow the command-line deployment instructions, but note the following changes in the procedure.
- Set a different ${KF_NAME} name from your existing ${KF_NAME}.
- Before running the kfctl apply command:
  - Edit ${KF_DIR}/gcp_config/storage-kubeflow.yaml and set the following flag to skip creating new storage:
      ...
      createPipelinePersistentStorage: false
      ...
  - Edit ${KF_DIR}/kustomize/minio/overlays/minioPd/params.env and specify the PD that your existing deployment uses for the Minio server:
      ...
      minioPd=[NAME-OF-ARTIFACT-STORAGE-DISK]
      ...
  - Edit ${KF_DIR}/kustomize/mysql/overlays/mysqlPd/params.env and specify the PD that your existing deployment uses for the MySQL database:
      ...
      mysqlPd=[NAME-OF-METADATA-STORAGE-DISK]
      ...
- Run the kfctl apply command to deploy Kubeflow as usual:
    kfctl apply -V -f ${CONFIG_FILE}
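If you reinstall often, the three file edits described above could be scripted instead of done by hand. The following is a minimal sketch, not an official procedure. The helper names set_env_param and disable_storage_creation are hypothetical, and the sketch assumes that params.env holds plain KEY=VALUE lines and that the YAML flag sits on a line of its own.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of kfctl.

# set_env_param FILE KEY VALUE
# Removes any existing KEY=... line from FILE, then appends KEY=VALUE.
set_env_param() {
  local file="$1" key="$2" value="$3" tmp
  [ -f "${file}" ] || { echo "error: ${file} not found" >&2; return 1; }
  tmp="$(mktemp)"
  # grep exits non-zero when no lines remain, so tolerate that case.
  grep -v "^${key}=" "${file}" > "${tmp}" || true
  printf '%s=%s\n' "${key}" "${value}" >> "${tmp}"
  # Write back through the original file to keep its permissions.
  cat "${tmp}" > "${file}"
  rm -f "${tmp}"
}

# disable_storage_creation FILE
# Sets createPipelinePersistentStorage to false, keeping the line's indentation.
disable_storage_creation() {
  local file="$1" tmp
  [ -f "${file}" ] || { echo "error: ${file} not found" >&2; return 1; }
  tmp="$(mktemp)"
  sed 's/^\([[:space:]]*createPipelinePersistentStorage:\).*/\1 false/' \
    "${file}" > "${tmp}"
  cat "${tmp}" > "${file}"
  rm -f "${tmp}"
}

# Example calls (not run here); replace the bracketed placeholders first:
#   disable_storage_creation "${KF_DIR}/gcp_config/storage-kubeflow.yaml"
#   set_env_param "${KF_DIR}/kustomize/minio/overlays/minioPd/params.env" \
#     minioPd "[NAME-OF-ARTIFACT-STORAGE-DISK]"
#   set_env_param "${KF_DIR}/kustomize/mysql/overlays/mysqlPd/params.env" \
#     mysqlPd "[NAME-OF-METADATA-STORAGE-DISK]"
```

Review the edited files before running kfctl apply, since a wrong disk name would attach the new deployment to the wrong storage.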
You should now have a new Kubeflow deployment that uses the same pipelines data storage as your previous deployment. Follow the steps in the deployment guide to check your deployment.