Deploy Kubeflow cluster

Instructions for using kubectl and kpt to deploy Kubeflow on Google Cloud

This guide describes how to use kubectl and kpt to deploy Kubeflow on Google Cloud.

Deployment steps

Prerequisites

Before installing Kubeflow on the command line:

  1. You must have created a management cluster and installed Config Connector.

    • If you don’t have a management cluster, follow the management cluster deployment instructions first.

    • Your management cluster needs a namespace set up to administer the Google Cloud project where Kubeflow will be deployed. A later step on this page covers creating it.

  2. You need to use Linux or Cloud Shell for the Anthos Service Mesh (ASM) installation. ASM installation currently doesn’t work on macOS because macOS ships with an old version of bash.

  3. Make sure that your Google Cloud project meets the minimum requirements described in the project setup guide.

  4. Follow the guide setting up OAuth credentials to create OAuth credentials for Cloud Identity-Aware Proxy (Cloud IAP).

Install the required tools

  1. Install gcloud.

  2. Install the required gcloud components:

    gcloud components install kubectl kpt anthoscli beta
    gcloud components update

    kubectl v1.18.19 works best with Kubeflow 1.3. To install a specific version, follow the kubectl installation instructions, for example Install kubectl on Linux. The latest patch release of any kubectl version from v1.17 to v1.19 also works well.

  3. Install Kustomize.

    Note: Prior to Kubeflow v1.2, Kubeflow was compatible only with Kustomize v3.2.1. Starting from Kubeflow v1.2, you can use any Kustomize v3 release to install Kubeflow. Kustomize v4 is not yet supported out of the box.

    To install the latest version of Kustomize on a Linux or Mac machine, run the following commands:

    # Detect your OS and download the corresponding latest Kustomize binary
    curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
    # Move the kustomize binary into a directory on your $PATH
    sudo mv ./kustomize /usr/local/bin/kustomize

    Then, to verify the installation, run kustomize version. You should see Version:kustomize/vX.Y.Z in the output if you’ve successfully installed Kustomize.

  4. Use one of the following options to install yq v3.

    • If you have Go installed, use the following command to install yq v3:

      GO111MODULE=on go get github.com/mikefarah/yq/v3

    • If you don’t have Go installed, follow the instructions in the yq repository to install yq v3. For example:

      sudo wget https://github.com/mikefarah/yq/releases/download/3.4.1/yq_linux_amd64 -O /usr/bin/yq && sudo chmod +x /usr/bin/yq
      yq --version
      # yq version 3.4.1

    Note: The Kubeflow deployment process is not compatible with yq v4 or later.

  5. Install jq (https://stedolan.github.io/jq/). For example, on Ubuntu and Debian you can run the following command:

    sudo apt install jq
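Before moving on, it can help to confirm that the tools listed above are on your PATH and that yq is a v3 release. The following is a minimal sketch, not part of the official tooling: the helper names missing_tools and major_version are hypothetical, and the version parsing assumes that yq --version prints the version number as the last word, as in the example output above.

```shell
# Print each command name from the arguments that is not found on PATH.
# Returns 0 when nothing is missing, 1 otherwise. (Hypothetical helper.)
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Print the major component of a version string such as 3.4.1 or v3.4.1.
# (Hypothetical helper.)
major_version() {
  v="${1#v}"
  echo "${v%%.*}"
}

# Tools this guide asks you to install.
if missing="$(missing_tools gcloud kubectl kpt kustomize yq jq)"; then
  echo "All required tools found."
else
  echo "Missing tools: $(echo "$missing" | tr '\n' ' ')"
fi

# Assumption: the version number is the last word of `yq --version` output.
if command -v yq >/dev/null 2>&1; then
  yq_version="$(yq --version 2>/dev/null | awk '{print $NF}')"
  if [ "$(major_version "$yq_version")" != "3" ]; then
    echo "Found yq $yq_version, but this guide expects yq v3."
  fi
fi
```

The check only reports problems; it does not install or change anything.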

Fetch kubeflow/gcp-blueprints and upstream packages

  1. If you have already deployed a management cluster, you already have kubeflow/gcp-blueprints locally, so you only need to run cd kubeflow to access the Kubeflow cluster manifests. Otherwise, run the following commands:

    # Check out Kubeflow v1.3.0 blueprints
    git clone https://github.com/kubeflow/gcp-blueprints.git
    cd gcp-blueprints
    git checkout tags/v1.3.0 -b v1.3.0

    Alternatively, you can get the package by using kpt:

    # Check out Kubeflow v1.3.0 blueprints
    kpt pkg get https://github.com/kubeflow/gcp-blueprints.git@v1.3.0 gcp-blueprints
    cd gcp-blueprints

  2. Run the following commands to pull upstream manifests from the kubeflow/manifests repository:

    # Change to the directory with the Kubeflow cluster manifests
    cd kubeflow
    bash ./pull-upstream.sh

Environment Variables

Log in to gcloud. You only need to run this command once:

    gcloud auth login

  1. Review and fill in all the environment variables in gcp-blueprints/kubeflow/env.sh. The kpt setters use them later, and some are used directly in this deployment guide. The comments in env.sh explain each environment variable. After defining these environment variables, run:

    source env.sh

  2. Set environment variables with the OAuth client ID and secret for IAP:

    export CLIENT_ID=<Your CLIENT_ID>
    export CLIENT_SECRET=<Your CLIENT_SECRET>

    Note

    Do not omit the export, because scripts triggered by make need these environment variables.

    Note

    Do not check the values of these two environment variables into source control; they are secrets.
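Because scripts triggered by make only see exported variables, a quick check before deploying can catch a variable that was set without export. This is a minimal sketch: unexported_vars is a hypothetical helper, and the variable list is taken from names used on this page, so treat env.sh as the authoritative list.

```shell
# Print the names of variables that are unset, empty, or not exported.
# Returns 0 when all are exported and non-empty, 1 otherwise.
# (Hypothetical helper; printenv only sees exported variables.)
unexported_vars() {
  rc=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Assumption: these names come from this page; env.sh may define more.
if problems="$(unexported_vars KF_NAME KF_PROJECT ZONE MGMTCTXT CLIENT_ID CLIENT_SECRET)"; then
  echo "All checked variables are exported."
else
  echo "Unset or unexported: $(echo "$problems" | tr '\n' ' ')"
fi
```

The helper prints variable names only, never their values, so it is safe to run with the OAuth secrets set.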

Configure Kubeflow

kpt setter config

Run the following commands to configure kpt setter for your Kubeflow cluster:

    bash ./kpt-set.sh

Every time you change environment variables, rerun the command above to apply the kpt setter changes to all packages. Otherwise, kustomize build will not pick up the new values.

Note: you can find out which setters exist in a package, and their current values, by running the following commands:

    kpt cfg list-setters .
    kpt cfg list-setters common/managed-storage
    kpt cfg list-setters apps/pipelines

You can learn more about kpt cfg set in the kpt documentation, or by running kpt cfg set --help.

Management cluster config

In the management cluster, which you reach through the kubectl context ${MGMTCTXT}, create a namespace with the same name as your Kubeflow project. You only need to do this once for each Kubeflow project.

  • Choose the management cluster context:

    kubectl config use-context "${MGMTCTXT}"

  • Create a namespace in your management cluster for the Kubeflow project if you haven’t done so:

    kubectl create namespace "${KF_PROJECT}"
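Running kubectl create namespace a second time fails because the namespace already exists. If you script this step, one option is to check first and create only when needed. The sketch below assumes a hypothetical helper named ensure_namespace; the KUBECTL override variable is also an assumption, added only so the function can be exercised without a cluster.

```shell
# Create the namespace only if it does not already exist.
# KUBECTL may be overridden (for example in tests); it defaults to kubectl.
# (Hypothetical helper, not part of gcp-blueprints.)
ensure_namespace() {
  ns="$1"
  if ${KUBECTL:-kubectl} get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    ${KUBECTL:-kubectl} create namespace "$ns"
  fi
}

# Usage, after selecting the management cluster context:
#   ensure_namespace "${KF_PROJECT}"
```

Note that the check treats any failure of kubectl get, including a connection error, as "namespace missing", so the subsequent create reports the real problem in that case.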

Authorize Cloud Config Connector for each Kubeflow project

In the management cluster deployment, we created the Google Cloud service account ${MGMT_NAME}-cnrm-system@${MGMT_PROJECT}.iam.gserviceaccount.com. Config Connector uses this service account to create Google Cloud resources. If your management cluster and Kubeflow cluster live in different projects, you need to grant this service account sufficient privileges to create the desired resources in the Kubeflow project.

The easiest way to do this is to grant the Google Cloud service account owner permissions on one or more projects.

  1. Set the management environment variable if you haven’t:

    MGMT_PROJECT=<the project where you deploy your management cluster>

  2. Change to the management directory and configure the kpt setters:

    pushd "../management"
    kpt cfg set -R . name "${MGMT_NAME}"
    kpt cfg set -R . gcloud.core.project "${MGMT_PROJECT}"
    kpt cfg set -R . managed-project "${KF_PROJECT}"

  3. Update the policy:

    gcloud beta anthos apply ./managed-project/iam.yaml

    Optionally, to restrict the permissions granted to this service account, edit ./managed-project/iam.yaml and specify more granular roles. Refer to the IAMPolicy Config Connector reference for the exact fields you can set.

  4. Return to the gcp-blueprints/kubeflow directory:

    popd

Deploy Kubeflow

To deploy Kubeflow, run the following command:

    make apply

  • If resources can’t be created because webhook.cert-manager.io is unavailable, wait and then rerun make apply.

  • If resources can’t be created, with an error message like:

    error: unable to recognize ".build/application/app.k8s.io_v1beta1_application_application-controller-kubeflow.yaml": no matches for kind "Application" in version "app.k8s.io/v1beta1"

    This issue occurs when the CRD endpoint isn’t yet established in the Kubernetes API server at the time the CRD’s custom object is applied. It is expected and can happen multiple times for different kinds of resources. To resolve it, run make apply again.
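Since both failure modes above are resolved by waiting and rerunning make apply, the rerun can be wrapped in a small retry loop. This is a sketch under stated assumptions: retry is a hypothetical helper, and the attempt count and delay in the usage comment are illustrative values, not recommendations from the Kubeflow project.

```shell
# Run a command up to max_attempts times, sleeping delay_seconds between
# failed attempts. Returns 0 on the first success, 1 if every attempt fails.
# (Hypothetical helper.)
retry() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Usage (illustrative values): up to 5 attempts, 60 seconds apart.
#   retry 5 60 make apply
```

A retry loop hides the cause of a failure, so if every attempt fails, read the output of the last make apply run rather than raising the attempt count.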

Check your deployment

Follow these steps to verify the deployment:

  1. When the deployment finishes, check the resources installed in the kubeflow namespace of your new cluster. To do this from the command line, first set your kubectl credentials to point to the new cluster:

    gcloud container clusters get-credentials "${KF_NAME}" --zone "${ZONE}" --project "${KF_PROJECT}"

    Then, check what’s installed in the kubeflow namespace of your GKE cluster:

    kubectl -n kubeflow get all
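The output of kubectl get all can be long. To spot problems faster, you can filter the pod list down to pods that are not in a healthy state. The sketch below assumes a hypothetical helper named pods_not_ready that reads kubectl get pods --no-headers output, where the third column is STATUS. A Running status does not guarantee that every container in the pod is ready, so treat this as a first-pass filter only.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column is neither Running nor Completed.
# (Hypothetical helper; columns are NAME READY STATUS RESTARTS AGE.)
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage:
#   kubectl -n kubeflow get pods --no-headers | pods_not_ready
```

An empty result means every pod reported Running or Completed at the time of the query.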

Access the Kubeflow user interface (UI)

To access the Kubeflow central dashboard, follow these steps:

  1. Use the following command to grant yourself the IAP-secured Web App User role:

    gcloud projects add-iam-policy-binding "${KF_PROJECT}" --member=user:<EMAIL> --role=roles/iap.httpsResourceAccessor

    Note: you need the IAP-secured Web App User role even if you are already an owner or editor of the project. The IAP-secured Web App User role is not implied by the Project Owner or Project Editor roles.

  2. Enter the following URI into your browser address bar. It can take 20 minutes for the URI to become available: https://${KF_NAME}.endpoints.${KF_PROJECT}.cloud.goog/

    You can run the following command to get the URI for your deployment:

    kubectl -n istio-system get ingress
    NAME            HOSTS                                                      ADDRESS         PORTS   AGE
    envoy-ingress   your-kubeflow-name.endpoints.your-gcp-project.cloud.goog   34.102.232.34   80      5d13h

    The following command sets an environment variable named HOST to the host name in that URI:

    export HOST=$(kubectl -n istio-system get ingress envoy-ingress -o=jsonpath='{.spec.rules[0].host}')

  3. Follow the instructions on the UI to create a namespace. Refer to this guide on creation of profiles.

Notes:

  • It can take 20 minutes for the URI to become available. Kubeflow needs to provision a signed SSL certificate and register a DNS name.
  • If you own or manage the domain or a subdomain with Cloud DNS then you can configure this process to be much faster. Check kubeflow/kubeflow#731.
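If you prefer to build the dashboard URI from your environment variables rather than query the ingress, the pattern given in step 2 can be expressed as a small function. This is a sketch only: kubeflow_url is a hypothetical helper, and it assumes your deployment uses the default cloud.goog endpoint shown on this page rather than a custom domain.

```shell
# Print the default Kubeflow dashboard URI for a deployment name and project.
# (Hypothetical helper; assumes the default endpoints.<project>.cloud.goog host.)
kubeflow_url() {
  printf 'https://%s.endpoints.%s.cloud.goog/\n' "$1" "$2"
}

# Usage:
#   kubeflow_url "${KF_NAME}" "${KF_PROJECT}"
```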

Understanding the deployment process

This section gives you more details about the kubectl, Kustomize, and Config Connector configuration and deployment process, so that you can customize your Kubeflow deployment if necessary.

Application layout

Your Kubeflow application directory gcp-blueprints/kubeflow contains the following files and directories:

  • Makefile defines rules that automate the deployment process. Refer to the GNU make documentation for an introduction. The provided Makefile is designed to be user maintainable: you are encouraged to read, edit, and maintain it to suit your own deployment customization needs.

  • apps, common, and contrib are directories of independent components, each containing kustomize packages for deploying Kubeflow components. The structure aligns with upstream kubeflow/manifests.

    • kubeflow/gcp-blueprints only stores kustomization.yaml and patches for Google Cloud specific resources.

    • ./pull-upstream.sh pulls kubeflow/manifests and stores the manifests in the upstream folder of each component. The kubeflow/gcp-blueprints repo doesn’t store a copy of the upstream manifests.

  • build is a directory that contains the hydrated manifests output by the make rules. Each component has its own build directory. You can customize the build path when calling the make command.

Source Control

We recommend checking your entire local repository into source control.

Checking in build is recommended so that you can use git diff to see changes in the manifests before applying them.

Google Cloud service accounts

The deployment process creates three service accounts in your Google Cloud project. These service accounts follow the principle of least privilege. The service accounts are:

  • ${KF_NAME}-admin is used for some admin tasks like configuring the load balancers. The principle is that this account is needed to deploy Kubeflow but not needed to actually run jobs.
  • ${KF_NAME}-user is intended to be used by training jobs and models to access Google Cloud resources (Cloud Storage, BigQuery, etc.). This account has a much smaller set of privileges compared to admin.
  • ${KF_NAME}-vm is used only for the virtual machine (VM) service account. This account has the minimal permissions needed to send metrics and logs to Stackdriver.

Upgrade Kubeflow

Refer to Upgrading Kubeflow cluster.

Next steps

Last modified 14.06.2021: update link (#2763) (86b7c850)