Overview
Full-fledged Kubeflow deployment on Google Cloud
This guide describes how to deploy Kubeflow and a series of Kubeflow components on Google Kubernetes Engine (GKE). If you want to use only Kubeflow Pipelines, refer to Installation Options for Kubeflow Pipelines to choose an installation option.
Deployment Structure
At a high level, you need to create one Management cluster, which lets you manage Google Cloud resources via Config Connector. The Management cluster can create, manage, and delete multiple Kubeflow clusters while remaining independent of the Kubeflow clusters' activities. Below is a simplified view of the deployment structure. Note that the Management cluster can live in a different Google Cloud project from the Kubeflow clusters. An admin should assign owner permission to the Management cluster's service account; this is explained in detail in the deployment steps.
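The relationship described above can be sketched as a small data model. This is only an illustration of the one-to-many structure, not part of the deployment tooling: the class names, cluster names, and project IDs are all hypothetical, and the sketch makes no calls to Google Cloud or Config Connector.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KubeflowCluster:
    name: str     # hypothetical cluster name
    project: str  # Google Cloud project that hosts this Kubeflow cluster


@dataclass
class ManagementCluster:
    name: str
    project: str  # Google Cloud project that hosts the Management cluster
    kubeflow_clusters: list = field(default_factory=list)

    def add(self, cluster):
        """Register a Kubeflow cluster managed by this Management cluster."""
        self.kubeflow_clusters.append(cluster)

    def managed_projects(self):
        """Distinct projects that host at least one managed Kubeflow cluster."""
        return sorted({c.project for c in self.kubeflow_clusters})

    def cross_project_clusters(self):
        """Kubeflow clusters living in a different project from the Management cluster."""
        return [c for c in self.kubeflow_clusters if c.project != self.project]


# One Management cluster, two Kubeflow clusters, two projects (all names made up).
mgmt = ManagementCluster(name="mgmt", project="admin-project")
mgmt.add(KubeflowCluster("kf-dev", "admin-project"))
mgmt.add(KubeflowCluster("kf-prod", "ml-project"))

print(mgmt.managed_projects())                          # ['admin-project', 'ml-project']
print([c.name for c in mgmt.cross_project_clusters()])  # ['kf-prod']
```

In this sketch, `managed_projects()` lists the projects in which the Management cluster's service account would need to act, which is where the owner permission mentioned above comes into play.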
Deployment steps
Follow the steps below to set up a Kubeflow environment on Google Cloud. Some of these steps are one-time only; for example, an OAuth client can be shared by multiple Kubeflow clusters in the same Google Cloud project.
If you encounter an issue during the deployment steps, refer to Troubleshooting deployments on GKE for common issues and debugging approaches. If the issue is new, file a bug against kubeflow/gcp-blueprints for GKE-related issues, or against the corresponding Kubeflow component on GitHub if the issue is component-specific.
Features
Once you finish deployment, you will be able to:
- manage a running Kubernetes cluster with multiple Kubeflow components installed.
- get a Cloud Endpoint that is accessible via IAP (Identity-Aware Proxy).
- enable the multi-user feature for resource and access isolation.
- take advantage of GKE’s Cluster Autoscaler to automatically resize the number of nodes in a node pool.
- choose GPUs and Cloud TPUs to accelerate your workloads.
- use Cloud Logging to help with debugging and troubleshooting.
- access many managed services offered by Google Cloud.
Next steps
- Repeat Deploy Kubeflow Cluster if you want to deploy multiple clusters.
- Run a full ML workflow on Kubeflow, using the end-to-end MNIST tutorial.
Last modified 17.05.2021: (gke) Full Kubeflow Deployment guide on GKE (revision for Kubeflow 1.3) (#2686) (f50693a3)