Pipelines End-to-end on GCP

An end-to-end tutorial for Kubeflow Pipelines on GCP

This guide walks you through a Kubeflow Pipelines sample that runs an MNIST machine learning (ML) model on Google Cloud Platform (GCP).

Introduction

Kubeflow Pipelines is a platform for building and deploying portable, scalable ML workflows based on Docker containers. When you install Kubeflow, you get Kubeflow Pipelines too.

By working through this tutorial, you learn how to deploy Kubeflow on Kubernetes Engine (GKE) and run a pipeline supplied as a Python script. The pipeline trains an MNIST model for image classification and serves the model for online inference (also known as online prediction).

Overview of GCP and GKE

Google Cloud Platform (GCP) is a suite of cloud computing services running on Google infrastructure. The services include compute power, data storage, data analytics, and machine learning.

Cloud Shell is a browser-based interface that provides command-line access to your cloud resources. It includes the gcloud command and other tools that you can use to interact with GCP.

Kubernetes Engine (GKE) is a managed service on GCP where you can deploy containerized applications. You describe the resources that your application needs, and GKE provisions and manages the underlying cloud resources automatically.

Here’s a list of the primary GCP services that you use when following this guide:

The model and the data

This tutorial trains a TensorFlow model on the MNIST dataset, which is a “hello world” scenario for machine learning.

The MNIST dataset contains a large number of images of hand-written digits in the range 0 to 9, as well as the labels identifying the digit in each image.

After training, the model can classify incoming images into 10 categories (0 to 9) based on what it’s learned about handwritten images. In other words, you send an image to the model, and the model does its best to identify the digit shown in the image.

(Screenshot: Prediction UI)

In the above screenshot, the image shows a hand-written 7. This image was the input to the model. The table below the image shows a bar graph for each classification label from 0 to 9, as output by the model. Each bar represents the probability that the image matches the respective label. Judging by this screenshot, the model seems pretty confident that this image is a 7.
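To make the bar graph concrete: a classifier of this kind typically produces one score per label, converts the scores to probabilities (for example with a softmax), and reports the label with the highest probability. The following is a minimal sketch in plain Python. The scores are invented for illustration and do not come from the tutorial’s model, and the helper names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(scores):
    """Return (label, probability) for the most likely digit."""
    probabilities = softmax(scores)
    label = max(range(len(probabilities)), key=probabilities.__getitem__)
    return label, probabilities[label]


# Hypothetical scores for labels 0 to 9; label 7 has the largest score.
example_scores = [0.1, 0.3, 0.2, 0.5, 0.1, 0.4, 0.2, 6.0, 0.3, 0.9]
label, probability = predict_digit(example_scores)
print(label, round(probability, 3))
```

Each entry returned by softmax corresponds to one bar in the graph, and the tallest bar is the predicted digit.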

Set up your environment

Let’s get started!

Set up your GCP account and SDK

Follow these steps to set up your GCP environment:

  • Select or create a project on the GCP Console.
  • Make sure that billing is enabled for your project. See the guide to modifying a project’s billing settings.
  • Use the Cloud Console to grant your team access to Kubeflow by assigning them the following roles:

    • Project Owner: Ensures that your team can access all of the resources used in this guide.
    • IAP-secured Web App User: This guide uses Cloud Identity-Aware Proxy (IAP) to secure access to your Kubeflow cluster. Your team must have the IAP-secured Web App User role to authenticate with the Kubeflow web application.

Notes:

  • As you work through this tutorial, your project uses billable components of GCP. To minimise costs, follow the instructions to clean up your GCP resources when you’ve finished with them.
  • This guide uses Cloud Shell to manage your GCP environment, to save you the steps of installing Cloud SDK and kubectl.

Start your Cloud Shell

Follow the link to activate a Cloud Shell environment in your browser.

Set up some handy environment variables

Set up the following environment variables for use throughout the tutorial:

  • Set your GCP project ID. In the command below, replace <YOUR-PROJECT-ID> with your project ID:

        export PROJECT=<YOUR-PROJECT-ID>
        gcloud config set project ${PROJECT}

  • Set the zone for your GCP configuration. Choose a zone that offers the resources you need. See the guide to GCP regions and zones.

    • Ensure you have enough Compute Engine regional capacity. By default, the GKE cluster setup described in this guide requires 16 CPUs.
    • If you want a GPU, ensure your zone offers GPUs.

    For example, the following commands set the zone to us-central1-c:

        export ZONE=us-central1-c
        gcloud config set compute/zone ${ZONE}

  • If you want a custom name for your Kubeflow deployment, set the DEPLOYMENT_NAME environment variable. The deployment name must be 4-20 characters in length. If you don’t set this environment variable, your deployment gets the default name of kubeflow:

        export DEPLOYMENT_NAME=kubeflow
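
The length rule for the deployment name is easy to check before you start a deployment. The following is a minimal sketch, assuming only the 4-20 character rule and the kubeflow default stated above. The function name resolve_deployment_name is hypothetical, and the check does not cover any other naming rules that GCP may enforce.

```python
import os


def resolve_deployment_name(env=None):
    """Return the deployment name, defaulting to 'kubeflow' when unset."""
    env = os.environ if env is None else env
    name = env.get("DEPLOYMENT_NAME", "kubeflow")
    # The tutorial states that the name must be 4-20 characters long.
    if not 4 <= len(name) <= 20:
        raise ValueError(
            f"DEPLOYMENT_NAME must be 4-20 characters, got {len(name)}: {name!r}"
        )
    return name


# "mnist-demo" is an example value, not a name used by the tutorial.
print(resolve_deployment_name({"DEPLOYMENT_NAME": "mnist-demo"}))
```

Passing a dictionary instead of reading os.environ directly keeps the sketch easy to try without changing your shell environment.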

Deploy Kubeflow

Deploy Kubeflow on GCP:

  • Follow the instructions in the guide to deploying Kubeflow on GCP, taking note of the following:

    • If you want the simplest deployment experience, use the Kubeflow deployment web app as described in the guide to deployment using the UI. The deployment web app currently supports Kubeflow v0.5.0.
    • For more control over the deployment, use the guide to deployment using the CLI. The CLI supports Kubeflow v0.6.2 and later versions.
    • Make sure that you enable Cloud Identity-Aware Proxy (IAP) as prompted during the deployment process.
    • When setting up the authorized redirect URI for the OAuth client credentials, use the same value for the <deployment_name> as you used when setting up the DEPLOYMENT_NAME environment variable earlier in this tutorial.
    • The following screenshot shows the Kubeflow deployment UI with hints about the value for each input field:

      (Screenshot: Kubeflow deployment UI)
  • (Optional) If you want to examine your cluster while waiting for the Kubeflow dashboard to be available, you can use kubectl to connect to your cluster:

    • Connect your Cloud Shell session to the cluster:

          gcloud container clusters get-credentials \
              ${DEPLOYMENT_NAME} --zone ${ZONE} --project ${PROJECT}

    • Switch to the kubeflow namespace to see the resources on the Kubeflow cluster:

          kubectl config set-context $(kubectl config current-context) --namespace=kubeflow

    • Check the resources deployed in the kubeflow namespace:

          kubectl get all

  • Access the Kubeflow UI, which becomes available at the following URI after several minutes:

        https://<deployment-name>.endpoints.<project>.cloud.goog/

The following screenshot shows the Kubeflow UI:

(Screenshot: Kubeflow UI)

  • Click Pipelines to access the pipelines UI. The pipelines UI looks like this:

    (Screenshot: Pipelines UI)

Notes:

If you own or manage the domain or a subdomain with Cloud DNS, you can configure this process to be much faster. See kubeflow/kubeflow#731.
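
The address of the Kubeflow UI follows the fixed pattern shown in the steps above, so you can work it out ahead of time from your deployment name and project ID. A minimal sketch, in which the function name kubeflow_ui_url and the project ID my-project are placeholders chosen for illustration:

```python
def kubeflow_ui_url(deployment_name, project):
    """Build the Kubeflow UI address from the pattern given in this tutorial."""
    return f"https://{deployment_name}.endpoints.{project}.cloud.goog/"


# "my-project" is a placeholder project ID used only for illustration.
print(kubeflow_ui_url("kubeflow", "my-project"))
```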

Create a Cloud Storage bucket

The next step is to create a Cloud Storage bucket to hold your trained model.

Cloud Storage is a scalable, fully-managed object/blob store. You can use it for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download. This tutorial uses Cloud Storage to hold the trained machine learning model and associated data.

Use the gsutil mb command to create a storage bucket. Your bucket name must be unique across all of Cloud Storage. The following commands create a bucket in the region that corresponds to the zone which you specified earlier in the tutorial:

    export BUCKET_NAME=${PROJECT}-${DEPLOYMENT_NAME}-bucket
    export REGION=$(gcloud compute zones describe $ZONE --format="value(region.basename())")
    gsutil mb -c regional -l ${REGION} gs://${BUCKET_NAME}
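
The commands above ask gcloud for the region, which is the authoritative source. As a rough illustration of what they compute, the sketch below derives the same two values with string handling alone. It assumes the usual zone naming, where the zone is the region name plus a one-letter suffix (for example us-central1-c in us-central1). The function names are hypothetical.

```python
def region_from_zone(zone):
    """Drop the trailing zone suffix, e.g. 'us-central1-c' -> 'us-central1'."""
    region, _, suffix = zone.rpartition("-")
    if not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def bucket_name(project, deployment_name):
    """Mirror the ${PROJECT}-${DEPLOYMENT_NAME}-bucket naming used above."""
    return f"{project}-{deployment_name}-bucket"


# "my-project" is a placeholder project ID used only for illustration.
print(region_from_zone("us-central1-c"))
print(bucket_name("my-project", "kubeflow"))
```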

Prepare your pipeline

To simplify this tutorial, you can use a set of prepared files that include the pipeline definition and supporting files. The project files are in the Kubeflow examples repository on GitHub.

Download the project files

Clone the project files and go to the directory containing the MNIST pipeline example:

    cd ${HOME}
    git clone https://github.com/kubeflow/examples.git
    cd examples/pipelines/mnist-pipelines

As an alternative to cloning, you can download the Kubeflow examples repository zip file.

Set up Python

You need Python 3.5 or above. This tutorial uses Python 3.7. If you don’t have a Python 3 environment set up, install Miniconda as described below:

  • In a Debian/Ubuntu/Cloud Shell environment, run the following commands:

        apt-get update
        wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
        bash Miniconda3-latest-Linux-x86_64.sh

  • In a Windows environment, download the installer and make sure you select the “Add Miniconda to my PATH environment variable” option during the installation.

  • In a Mac environment, download the installer and run the following command:

        bash Miniconda3-latest-MacOSX-x86_64.sh

Create a clean Python 3 environment (this tutorial uses Python 3.7):

    conda create --name mlpipeline python=3.7
    conda activate mlpipeline

If the conda command is not found, be sure to add Miniconda to your path. In the command below, replace MINICONDA_PATH with the directory where you installed Miniconda:

    export PATH=MINICONDA_PATH/bin:$PATH

Install the Kubeflow Pipelines SDK

Install the Kubeflow Pipelines SDK, along with other Python dependencies defined in the requirements.txt file:

    pip install -r requirements.txt --upgrade

Compile the sample pipeline

The pipeline is defined in the Python file mnist_pipeline.py which you downloaded from GitHub. When you execute that Python file, it compiles the pipeline to an intermediate representation which you can then upload to the Kubeflow Pipelines service.

Run the following command to compile the pipeline:

    python3 mnist_pipeline.py

Alongside your mnist_pipeline.py file, you should now have a file called mnist_pipeline.py.tar.gz which contains the compiled pipeline.
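
If you want to confirm what a compiled package contains before uploading it, you can list the archive with Python’s standard tarfile module. The sketch below builds a small stand-in archive so that it runs without the real file. The helper list_archive_members is hypothetical, and the member name pipeline.yaml is an assumption made for illustration; check your own archive for the actual contents.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the names of the files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Build a small stand-in archive so the sketch runs without the real file.
# The member name "pipeline.yaml" is an assumption made for illustration.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = Path(workdir) / "demo_pipeline.py.tar.gz"
    payload = b"# placeholder content\n"
    with tarfile.open(demo_path, "w:gz") as archive:
        info = tarfile.TarInfo(name="pipeline.yaml")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    members = list_archive_members(demo_path)

print(members)
```

To inspect the real package, call list_archive_members("mnist_pipeline.py.tar.gz") from the directory where you compiled the pipeline.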

Run the pipeline

Go back to the Kubeflow Pipelines UI, which you accessed in an earlier step of this tutorial. Now you’re ready to upload and run your pipeline using that UI.

  • Click Upload pipeline on the Kubeflow Pipelines UI:

    (Screenshot: Upload a pipeline via the UI)

  • Upload your mnist_pipeline.py.tar.gz file and give the pipeline a name:

    (Screenshot: Enter the pipeline upload details)

  • Your pipeline now appears in the list of pipelines on the UI. Click your pipeline name:

    (Screenshot: Uploaded pipeline in list of pipelines)

  • The UI shows your pipeline’s graph and various options. Click Create run:

    (Screenshot: Pipeline graph and options)

  • Supply the following run parameters:

    • Run name: A descriptive name for this run of the pipeline. You can submit multiple runs of the same pipeline.
    • bucket-path: The Cloud Storage bucket that you created earlier to hold the results of the pipeline run.

    The sample supplies the values for the other parameters:

    • train-steps: The number of training steps to run.
    • learning-rate: The learning rate for model training.
    • batch-size: The batch size for model training.

    Then click Start:

    (Screenshot: Starting a pipeline run)

  • The pipeline run now appears in the list of runs:

    (Screenshot: List of pipeline runs)

  • Click the run to see its details. In the following screenshot, the first two components (train and serve) have finished successfully and the third component (web-ui) is still running:

    (Screenshot: A running pipeline)

  • Click on any component to see its logs.

    (Screenshot: Logs for a pipeline component)

  • When the pipeline run is complete, look at the logs for the web-ui component to find the IP address created for the MNIST web interface. Copy the IP address and paste it into your web browser’s address bar. The web UI should appear.

Below the connect screen, you should see a prediction UI for your MNIST model.

(Screenshot: Prediction UI)

Each time you refresh the page, it loads a random image from the MNIST test dataset and performs a prediction. In the above screenshot, the image shows a hand-written 7. The table below the image shows a bar graph for each classification label from 0 to 9. Each bar represents the probability that the image matches the respective label.

Notes:

  • You can find your trained model data in the bucket path that you entered as the bucket-path run parameter earlier in this procedure.

Understanding the pipeline definition code

The pipeline is defined in the Python file mnist_pipeline.py which you downloaded from GitHub. The following sections give an overview of the content of that file.

Decorator

The @dsl.pipeline decorator provides metadata about the pipeline:

    @dsl.pipeline(
        name='MNIST',
        description='A pipeline to train and serve the MNIST example.'
    )

Function header

The mnist_pipeline function defines the pipeline. The function takes a number of arguments which are exposed in the Kubeflow Pipelines UI when you create a new run of the pipeline. Although you pass these arguments as strings, the arguments are of type kfp.dsl.PipelineParam.

    def mnist_pipeline(model_export_dir='gs://your-bucket/export',
                       train_steps='200',
                       learning_rate='0.01',
                       batch_size='100'):
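
One way to picture why these arguments are parameter objects rather than plain strings: when the pipeline is compiled, each argument acts as a placeholder, and the real value is filled in when a run starts. The sketch below illustrates that idea with a hypothetical stand-in class. It is not the kfp.dsl.PipelineParam implementation, and the placeholder text it renders is invented for illustration.

```python
class ParamPlaceholder:
    """Hypothetical stand-in for a pipeline parameter (not the kfp class)."""

    def __init__(self, name, default):
        self.name = name
        self.default = default

    def __str__(self):
        # Illustrative placeholder text; the real SDK uses its own format.
        return "{{" + self.name + "}}"


def render_arguments(arguments, run_values):
    """Substitute run-time values (or defaults) for each placeholder."""
    rendered = []
    for argument in arguments:
        if isinstance(argument, ParamPlaceholder):
            rendered.append(run_values.get(argument.name, argument.default))
        else:
            rendered.append(argument)
    return rendered


train_steps = ParamPlaceholder("train_steps", "200")
arguments = ["--tf-train-steps", train_steps]
print(render_arguments(arguments, {}))
print(render_arguments(arguments, {"train_steps": "500"}))
```

The first call falls back to the default from the function header, and the second uses a value supplied for a particular run, which is the behaviour you see when you edit parameters in the Kubeflow Pipelines UI.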

The training component (train)

The following block defines the train component, which handles the training of the ML model:

    train = dsl.ContainerOp(
        name='train',
        image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
        arguments=[
            "/opt/model.py",
            "--tf-export-dir", model_export_dir,
            "--tf-train-steps", train_steps,
            "--tf-batch-size", batch_size,
            "--tf-learning-rate", learning_rate
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))

A component consists of a kfp.dsl.ContainerOp object with a name and a container image path. The container image for the MNIST training component is defined in the MNIST example’s Dockerfile.model.

The training component runs with access to your user-gcp-sa secret, which ensures the component has read/write access to your Cloud Storage bucket for storing the output from the model training.
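
The arguments list in the train component becomes the command line of the training script inside the container. The contents of /opt/model.py are not shown in this tutorial, so the following is only a sketch of how a script might parse those flags with argparse. The flag names come from the component definition above, while the function name, types, and defaults are assumptions.

```python
import argparse


def parse_training_args(argv):
    """Parse the flags that the train component passes to the script."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--tf-export-dir", required=True)
    # Types and defaults below are assumptions for illustration.
    parser.add_argument("--tf-train-steps", type=int, default=200)
    parser.add_argument("--tf-batch-size", type=int, default=100)
    parser.add_argument("--tf-learning-rate", type=float, default=0.01)
    return parser.parse_args(argv)


args = parse_training_args([
    "--tf-export-dir", "gs://your-bucket/export",
    "--tf-train-steps", "200",
    "--tf-batch-size", "100",
    "--tf-learning-rate", "0.01",
])
print(args.tf_export_dir, args.tf_train_steps, args.tf_batch_size, args.tf_learning_rate)
```

Because the pipeline passes every value as a string, the conversion to int and float happens in the script rather than in the pipeline definition.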

The model serving component (serve)

The following block defines the serve component, which serves the trained model for prediction:

    serve = dsl.ContainerOp(
        name='serve',
        # Adjacent string literals are joined, so the image reference stays intact.
        image=('gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:'
               '7775692adf28d6f79098e76e839986c9ee55dd61'),
        arguments=[
            '--model-export-path', model_export_dir,
            '--server-name', "mnist-service"
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
    serve.after(train)

The serve component differs from the train component with respect to how long the service lasts. While train runs a single container and then exits, serve runs a container that launches long-lived resources in the cluster.

The ContainerOp takes two arguments:

  • A path pointing to the location of your trained model.
  • A server name.

The component creates a Kubeflow tf-serving service within the cluster. This service lives on after the pipeline has finished running.

You can see the Dockerfile used to build this container in the Kubeflow Pipelines repository. Like the train component, serve requires access to the user-gcp-sa secret for access to the kubectl command within the container.

The serve.after(train) line specifies that this component must run sequentially after the train component is complete.

The web UI component (web-ui)

The following block defines the web-ui component, which displays a simple web page. The web application sends an image (picture) to the trained model and displays the prediction results:

    web_ui = dsl.ContainerOp(
        name='web-ui',
        image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
        arguments=[
            # Adjacent string literals are joined into one image reference.
            '--image', ('gcr.io/kubeflow-examples/mnist/web-ui:'
                        'v20190304-v0.2-176-g15d997b-pipelines'),
            '--name', 'web-ui',
            '--container-port', '5000',
            '--service-port', '80',
            '--service-type', "LoadBalancer"
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
    web_ui.after(serve)

Like serve, the web-ui component launches a service that continues to exist after the pipeline is complete. Instead of launching a Kubeflow resource, the web-ui component launches a standard Kubernetes deployment/service pair. You can see the Dockerfile that builds the deployment image in the ./deploy-service/Dockerfile that you downloaded with the sample files. This image runs the gcr.io/kubeflow-examples/mnist/web-ui:v20190304-v0.2-176-g15d997b-pipelines container, which was built from the MNIST example’s web-ui Dockerfile.

This component provisions a LoadBalancer service that gives external access to a web-ui deployment launched in the cluster.
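
Taken together, serve.after(train) and web_ui.after(serve) declare a chain of dependencies between the three components. The sketch below shows, in plain Python, how such declarations determine an execution order. It illustrates the idea only and is not how the Kubeflow Pipelines service schedules work; the function name execution_order is hypothetical.

```python
def execution_order(dependencies):
    """Return component names so that each runs after its prerequisites.

    `dependencies` maps a component to the components it must run after.
    """
    ordered = []
    remaining = dict(dependencies)
    while remaining:
        # A component is ready once all of its prerequisites have been ordered.
        ready = sorted(
            name for name, prerequisites in remaining.items()
            if all(p in ordered for p in prerequisites)
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered


# Mirrors serve.after(train) and web_ui.after(serve) from the sample.
dependencies = {"train": [], "serve": ["train"], "web-ui": ["serve"]}
print(execution_order(dependencies))
```

With these dependencies, only one order is possible: train, then serve, then web-ui.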

The main function

The main function compiles the pipeline, converting the Python program to the intermediate YAML representation required by the Kubeflow Pipelines service and compressing the result into a tar.gz file:

    if __name__ == '__main__':
        import kfp.compiler as compiler
        compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')

Clean up your GCP environment

Run the following command to delete your deployment and related resources:

    gcloud deployment-manager --project=${PROJECT} deployments delete ${DEPLOYMENT_NAME}

Delete your Cloud Storage bucket when you’ve finished with it:

    gsutil rm -r gs://${BUCKET_NAME}

As an alternative to the command line, you can delete the various resources using the GCP Console.

Next steps

Build your own machine-learning pipelines with the Kubeflow Pipelines SDK.