Kubeflow Overview
How Kubeflow helps you organize your ML workflow

This guide introduces Kubeflow as a platform for developing and deploying a machine learning (ML) system.
Kubeflow is a platform for data scientists who want to build and experiment with ML pipelines. Kubeflow is also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving.
Conceptual overview
Kubeflow is the ML toolkit for Kubernetes. The following diagram shows Kubeflow as a platform for arranging the components of your ML system on top of Kubernetes:
Kubeflow builds on Kubernetes as a system for deploying, scaling, and managing complex systems.
Using the Kubeflow configuration interfaces (see below) you can specify the ML tools required for your workflow. Then you can deploy the workflow to various cloud, local, and on-premises platforms for experimentation and for production use.
Introducing the ML workflow
When you develop and deploy an ML system, the ML workflow typically consists of several stages. Developing an ML system is an iterative process. You need to evaluate the output of various stages of the ML workflow, and apply changes to the model and parameters when necessary to ensure the model keeps producing the results you need.
For the sake of simplicity, the following diagram shows the workflow stages in sequence. The arrow at the end of the workflow points back into the flow to indicate the iterative nature of the process:
Looking at the stages in more detail:
In the experimental phase, you develop your model based on initial assumptions, and test and update the model iteratively to produce the results you’re looking for (a minimal sketch of this loop follows the list):
- Identify the problem you want the ML system to solve.
- Collect and analyze the data you need to train your ML model.
- Choose an ML framework and algorithm, and code the initial version of your model.
- Experiment with the data and with training your model.
- Tune the model hyperparameters to ensure the most efficient processing and the most accurate results possible.
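As a rough illustration only (not a Kubeflow API), the following sketch runs a small version of this loop with scikit-learn: it trains a model, tries a few hyperparameter combinations, and reports accuracy. The dataset, estimator, and parameter grid are placeholders you would replace with your own.

```python
# Illustrative only (not a Kubeflow API): a tiny experiment-and-tune loop.
# The dataset, model, and parameter grid are placeholders for your own choices.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Try a few hyperparameter combinations and keep the best cross-validated model.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [8, 16]},
    cv=3,
)
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
print("Held-out test accuracy:", search.score(X_test, y_test))
```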
In the production phase, you deploy a system that performs the following processes (a pipeline sketch of these stages follows the list):
- Transform the data into the format that your training system needs. To ensure that your model behaves consistently during training and prediction, the transformation process must be the same in the experimental and production phases.
- Train the ML model.
- Serve the model for online prediction or for running in batch mode.
- Monitor the model’s performance, and feed the results into your processes for tuning or retraining the model.
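To make these stages concrete, here is a hedged sketch of how they might be wired together with the Kubeflow Pipelines SDK (a v1-style DSL). The container images, commands, and file paths are hypothetical placeholders rather than real Kubeflow components, and the exact DSL surface depends on your SDK version.

```python
# A hedged sketch using the Kubeflow Pipelines SDK (v1-style DSL).
# The container images, commands, and paths are hypothetical placeholders.
import kfp
from kfp import dsl


@dsl.pipeline(
    name="train-and-serve",
    description="Transform data, train a model, then serve it.",
)
def train_and_serve(data_path: str = "gs://your-bucket/raw-data"):
    # 1. Transform the raw data into the format the training code expects.
    transform = dsl.ContainerOp(
        name="transform",
        image="example.com/transform:latest",  # placeholder image
        arguments=["--input", data_path, "--output", "/out/features"],
        file_outputs={"features": "/out/features"},
    )

    # 2. Train the model on the transformed data.
    train = dsl.ContainerOp(
        name="train",
        image="example.com/train:latest",  # placeholder image
        arguments=["--features", transform.outputs["features"]],
        file_outputs={"model": "/out/model"},
    )

    # 3. Deploy the trained model for serving (details depend on your serving setup).
    dsl.ContainerOp(
        name="serve",
        image="example.com/serve:latest",  # placeholder image
        arguments=["--model", train.outputs["model"]],
    )


if __name__ == "__main__":
    # Compile to a package you can upload through the Pipelines UI or API.
    kfp.compiler.Compiler().compile(train_and_serve, "train_and_serve.tar.gz")
```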
Kubeflow components in the ML workflow
The next diagram adds Kubeflow to the workflow, showing which Kubeflow components are useful at each stage:
To learn more, read the following guides to the Kubeflow components:
Kubeflow includes services for spawning and managing Jupyter notebooks. Use notebooks for interactive data science and experimenting with ML workflows.
Kubeflow Pipelines is a platform for building, deploying, and managing multi-step ML workflows based on Docker containers.
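For example, once a pipeline function is defined (as in the sketch above), a reasonably recent v1 SDK lets you submit a run programmatically. The host URL below is an assumption; it depends on how your deployment exposes the Pipelines endpoint (for example, via port-forwarding or the central dashboard).

```python
# A brief sketch of submitting a pipeline run with the Kubeflow Pipelines SDK.
# The host URL and the echo image are placeholders/assumptions.
import kfp
from kfp import dsl


@dsl.pipeline(name="hello-pipeline", description="A single-step example pipeline.")
def hello_pipeline(message: str = "hello"):
    dsl.ContainerOp(
        name="echo",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=['echo "%s"' % message],
    )


client = kfp.Client(host="http://localhost:8080/pipeline")  # placeholder endpoint
client.create_run_from_pipeline_func(
    hello_pipeline, arguments={"message": "hello Kubeflow"}
)
```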
Kubeflow offers several components that you can use to build your ML training, hyperparameter tuning, and serving workloads across multiple platforms.
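As one hedged illustration of how a training component is used, the sketch below submits a TFJob custom resource (the CRD handled by Kubeflow's TensorFlow training operator) through the official Kubernetes Python client. The job name, image, and namespace are placeholders, and the exact TFJob schema depends on the operator version installed in your cluster.

```python
# Hedged sketch: submit a TFJob custom resource with the Kubernetes Python client.
# The image, job name, and namespace are placeholders; the TFJob schema shown here
# follows the kubeflow.org/v1 API and may differ across operator versions.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "mnist-train", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                "restartPolicy": "Never",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",  # container name expected by the operator
                            "image": "example.com/mnist-train:latest",  # placeholder image
                        }]
                    }
                },
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org",
    version="v1",
    namespace="kubeflow",
    plural="tfjobs",
    body=tfjob,
)
```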
Example of a specific ML workflow
The following diagram shows a simple example of a specific ML workflow that you can use to train and serve a model trained on the MNIST dataset:
For details of the workflow and to run the system yourself, see the end-to-end tutorial for Kubeflow on GCP.
Kubeflow interfaces
This section introduces the interfaces that you can use to interact with Kubeflow and to build and run your ML workflows on Kubeflow.
Kubeflow user interface (UI)
The Kubeflow UI looks like this:
The UI offers a central dashboard that you can use to access the components of your Kubeflow deployment. Read how to access the UI.
Kubeflow command line interface (CLI)
Kfctl is the Kubeflow CLI that you can use to install and configure Kubeflow. Read about kfctl in the guide to configuring Kubeflow.
The Kubernetes CLI, kubectl, is useful for running commands against your Kubeflow cluster. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs. Read about kubectl in the Kubernetes documentation.
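If you prefer to script these checks rather than run kubectl by hand, the same information is available through the official Kubernetes Python client. The sketch below assumes Kubeflow is installed in the kubeflow namespace and that a local kubeconfig is available.

```python
# Scripted equivalents of common kubectl checks, using the official Kubernetes
# Python client. Assumes Kubeflow is installed in the "kubeflow" namespace.
from kubernetes import client, config

config.load_kube_config()  # reads the same kubeconfig that kubectl uses
core = client.CoreV1Api()

# Roughly equivalent to "kubectl -n kubeflow get pods".
pods = core.list_namespaced_pod(namespace="kubeflow").items
for pod in pods:
    print(pod.metadata.name, pod.status.phase)

# Roughly equivalent to "kubectl -n kubeflow logs <first-pod>".
if pods:
    print(core.read_namespaced_pod_log(name=pods[0].metadata.name, namespace="kubeflow"))
```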
Kubeflow APIs and SDKs
Various components of Kubeflow offer APIs and Python SDKs. See the following sets of reference documentation:
- Kubeflow reference docs for guides to the Kubeflow Metadata API and SDK, the PyTorchJob CRD, and the TFJob CRD.
- Pipelines reference docs for the Kubeflow Pipelines API and SDK, including the Kubeflow Pipelines domain-specific language (DSL).
- Fairing reference docs for the Kubeflow Fairing SDK.
Next steps
See how to install Kubeflow depending on your chosen environment (local, cloud, or on-premises).