Introduction to the Pipelines SDK

Overview of using the SDK to build components and pipelines

The Kubeflow Pipelines SDK provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.
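To make "steps and how they interact" concrete, the following toy model (plain Python, not part of the Kubeflow Pipelines API) represents a pipeline as a graph in which each step lists the steps whose outputs it consumes. The step names are hypothetical; the sketch only shows how such dependencies determine a valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Toy pipeline (hypothetical step names): each key is a step, and each value is
# the set of steps whose outputs that step consumes.
PIPELINE_STEPS = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(steps):
    """Return one order in which the steps can run without violating a dependency."""
    return list(TopologicalSorter(steps).static_order())


print(execution_order(PIPELINE_STEPS))
# ['preprocess', 'train', 'evaluate', 'deploy']
```

In Kubeflow Pipelines you do not compute this order yourself; dependencies typically follow from passing one step's outputs to another step's inputs inside the pipeline function.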

SDK packages

The Kubeflow Pipelines SDK includes the following packages:

kfp.compiler includes classes and methods for compiling your pipeline's Python DSL code into a static workflow definition and for building Docker container images for your pipeline components. Methods in this package include, but are not limited to, the following:
- kfp.compiler.Compiler.compile compiles your Python DSL code into a single static configuration (in YAML format) that the Kubeflow Pipelines service can process. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.
- kfp.compiler.build_docker_image builds a container image based on a Dockerfile and pushes the image to a URI. In the parameters, you provide the path to a Dockerfile containing the image specification, and the URI for the target image (for example, a container registry).
- kfp.compiler.build_python_component builds a container image for a pipeline component based on a Python function, and pushes the image to a URI. In the parameters, you provide the Python function that does the work of the pipeline component, a Docker image to use as a base image, and the URI for the target image (for example, a container registry).
kfp.components includes classes and methods for interacting with pipeline components. Methods in this package include, but are not limited to, the following:
- kfp.components.func_to_container_op converts a Python function to a pipeline component and returns a factory function. You can then call the factory function to construct an instance of a pipeline task (ContainerOp) that runs the original function in a container.
- kfp.components.load_component_from_file loads a pipeline component from a file and returns a factory function. You can then call the factory function to construct an instance of a pipeline task (ContainerOp) that runs the component container image.
- kfp.components.load_component_from_url loads a pipeline component from a URL and returns a factory function. You can then call the factory function to construct an instance of a pipeline task (ContainerOp) that runs the component container image.
kfp.dsl contains the domain-specific language (DSL) that you can use to define and interact with pipelines and components. Methods, classes, and modules in this package include, but are not limited to, the following:
- kfp.dsl.ContainerOp represents a pipeline task (op) implemented by a container image.
- kfp.dsl.PipelineParam represents a pipeline parameter that you can pass from one pipeline component to another. See the guide to pipeline parameters.
- kfp.dsl.component is a decorator for DSL functions that return a pipeline component (ContainerOp).
- kfp.dsl.pipeline is a decorator for Python functions that returns a pipeline.
- kfp.dsl.python_component is a decorator for Python functions that adds pipeline component metadata to the function object.
- kfp.dsl.types contains a list of types defined by the Kubeflow Pipelines SDK. Types include basic types like String, Integer, Float, and Bool, as well as domain-specific types like GCPProjectID and GCRPath. See the guide to DSL static type checking.
- kfp.dsl.ResourceOp represents a pipeline task (op) that lets you directly manipulate Kubernetes resources (create, get, apply, …).
- kfp.dsl.VolumeOp represents a pipeline task (op) that creates a new PersistentVolumeClaim (PVC). It aims to make the common case of creating a PersistentVolumeClaim fast.
- kfp.dsl.VolumeSnapshotOp represents a pipeline task (op) that creates a new VolumeSnapshot. It aims to make the common case of creating a VolumeSnapshot fast.
- kfp.dsl.PipelineVolume represents a volume used to pass data between pipeline steps. ContainerOps can mount a PipelineVolume either through the constructor's pvolumes argument or through the add_pvolumes() method.
kfp.Client contains the Python client libraries for the Kubeflow Pipelines API. Methods in this package include, but are not limited to, the following:
- kfp.Client.create_experiment creates a pipeline experiment and returns an experiment object.
- kfp.Client.run_pipeline runs a pipeline and returns a run object.
kfp.notebook
KFP extension modules include classes and functions for specific platforms on which you can use Kubeflow Pipelines. Examples include utility functions for on-premises deployments, Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.

Installing the SDK

Follow the guide to installing the Kubeflow Pipelines SDK.
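After installing the SDK (for example with `pip install kfp`), you can confirm from Python which version is present. The following sketch uses only the standard library's importlib.metadata and assumes the SDK's distribution name is `kfp`; it returns None instead of raising when the package is missing.

```python
from importlib import metadata


def installed_version(distribution="kfp"):
    """Return the installed version string of `distribution`, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"kfp version: {version}" if version else "kfp is not installed")
```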

Building pipelines and components

This section summarizes the ways you can use the SDK to build pipelines and components:

Creating components from existing application code
Creating components within your application code
Creating lightweight components
Using prebuilt, reusable components in your pipeline

The diagrams provide a conceptual guide to the relationships between the following concepts:

Your Python code
A pipeline component
A Docker container image
A pipeline

Creating components from existing application code

This section describes how to create a component and a pipeline outside your Python application, by creating components from existing containerized applications. This technique is useful when you have already created a TensorFlow program, for example, and you want to use it in a pipeline.

Below is a more detailed explanation of the above diagram:

Write your application code, my-app-code.py. For example, write code to transform data or train a model.
Create a Docker container image that packages your program (my-app-code.py) and upload the container image to a registry. To build a container image based on a given Dockerfile, you can use the Docker command-line interface or the kfp.compiler.build_docker_image method from the Kubeflow Pipelines SDK.
Write a component function using the Kubeflow Pipelines DSL to define your pipeline's interactions with the component's Docker container. Your component function must return a kfp.dsl.ContainerOp. Optionally, you can use the kfp.dsl.component decorator to enable static type checking in the DSL compiler. To use the decorator, you can add the @kfp.dsl.component annotation to your component function:

import kfp

@kfp.dsl.component
def my_component(my_param):
  ...
  return kfp.dsl.ContainerOp(
    name='My component name',
    image='gcr.io/path/to/container/image'
  )
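The first step above leaves my-app-code.py open-ended. As an illustration only, here is a hypothetical version that transforms data: it reads a CSV file, rescales one numeric column to the range [0, 1], and writes the result. The command-line flags (--input-path, --output-path, --column) are assumptions made for this sketch, not a Kubeflow Pipelines convention; a container running a script like this would receive such flags through the arguments parameter of its ContainerOp.

```python
"""Hypothetical my-app-code.py: rescale one numeric CSV column to [0, 1]."""
import argparse
import csv


def min_max_scale(rows, column):
    """Return new rows (dicts) with `column` rescaled to the range [0, 1]."""
    if not rows:
        return []
    values = [float(row[column]) for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    scaled = []
    for row, value in zip(rows, values):
        new_row = dict(row)
        new_row[column] = str((value - lo) / span)
        scaled.append(new_row)
    return scaled


def main(argv=None):
    """Parse the (hypothetical) flags, transform the input file, write the output."""
    parser = argparse.ArgumentParser(description="Scale a CSV column to [0, 1].")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    parser.add_argument("--column", required=True)
    args = parser.parse_args(argv)

    with open(args.input_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    with open(args.output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(min_max_scale(rows, args.column))
```

As a container entry point, the script would end with the usual `if __name__ == '__main__': main()` guard. Because main() also accepts an explicit argument list, you can exercise it locally, for example with `main(['--input-path', 'in.csv', '--output-path', 'out.csv', '--column', 'x'])`, before building the image.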

Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function. To use the decorator, you can add the @kfp.dsl.pipeline annotation to your pipeline function:

import kfp
from kfp.dsl import PipelineParam

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
  my_step = my_component(my_param='a')

Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.

To compile the pipeline, you can choose one of the following options:

Use the kfp.compiler.Compiler.compile method:

kfp.compiler.Compiler().compile(my_pipeline,  
  'my-pipeline.zip')

Alternatively, use the dsl-compile command on the command line.

dsl-compile --py [path/to/python/file] --output my-pipeline.zip
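Because the output file in these examples is a .zip archive, you can inspect the generated YAML with the standard library before uploading it. The following sketch assumes the archive contains one YAML file; it searches for the first .yaml or .yml member rather than hard-coding a file name, since that name is an implementation detail of the SDK version you use.

```python
import zipfile


def read_compiled_pipeline(package_path):
    """Return (member_name, yaml_text) for the first YAML file in a .zip package."""
    with zipfile.ZipFile(package_path) as archive:
        for name in archive.namelist():
            if name.endswith((".yaml", ".yml")):
                return name, archive.read(name).decode("utf-8")
    raise ValueError(f"no YAML file found in {package_path}")
```

For example, `name, text = read_compiled_pipeline('my-pipeline.zip')` followed by `print(text[:500])` shows the beginning of the workflow definition.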

Use the Kubeflow Pipelines SDK to run the pipeline:

client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 
  'my-pipeline.zip')
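run_pipeline also accepts a params dictionary that maps pipeline parameter names to values. To catch misspelled names before submitting a run, you could check the dictionary against the pipeline function's signature, as in this sketch. build_run_params is a hypothetical helper, not part of the SDK, and the sketch assumes the @kfp.dsl.pipeline decorator leaves the function's signature visible to inspect.

```python
import inspect


def build_run_params(pipeline_func, **values):
    """Hypothetical helper: return `values` as a dict after checking every key
    against the parameter names of `pipeline_func`."""
    accepted = set(inspect.signature(pipeline_func).parameters)
    unknown = sorted(set(values) - accepted)
    if unknown:
        raise ValueError(f"unknown pipeline parameters: {unknown}")
    return dict(values)


# Stand-in with the same parameter names as my_pipeline in the example above.
def my_pipeline(param_1, param_2):
    pass


print(build_run_params(my_pipeline, param_1="a", param_2="b"))
# {'param_1': 'a', 'param_2': 'b'}
```

With the real pipeline function, the result would be passed along as `client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip', params=build_run_params(my_pipeline, param_1='a', param_2='b'))`.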

You can also choose to share your pipeline as follows:

Upload the pipeline zip file to the Kubeflow Pipelines UI. For more information about the UI, see the Kubeflow Pipelines quickstart guide.
Upload the pipeline zip file to a shared repository. See the reusable components and other shared resources.

More about the above workflow

For more detailed instructions, see the guide to building components and pipelines.

For an example, see the xgboost-training-cm.py pipeline sample on GitHub. The pipeline creates an XGBoost model using structured data in CSV format.

Creating components within your application code

This section describes how to create a pipeline component inside your Python application, as part of the application. The DSL code for creating a component therefore runs inside your Docker container.

Below is a more detailed explanation of the above diagram:

Write your code in a Python function. For example, write code to transform data or train a model:

def my_python_func(a: str, b: str) -> str:
  ...

Use the kfp.dsl.python_component decorator to convert your Python function into a pipeline component. To use the decorator, you can add the @kfp.dsl.python_component annotation to your function:

@kfp.dsl.python_component(
  name='My awesome component',
  description='Come and play',
)
def my_python_func(a: str, b: str) -> str:
  ...
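The decorator attaches component metadata to the function object without changing what the function does, so you can still call and test the function as ordinary Python. The toy decorator below mimics that behavior to illustrate the idea; toy_python_component, the component_metadata attribute, and join_strings are hypothetical names, and the real decorator's internals may differ.

```python
def toy_python_component(name, description=None):
    """Toy stand-in for a metadata decorator: records the name and description
    on the function and returns the same function object unchanged."""
    def decorator(func):
        func.component_metadata = {"name": name, "description": description}
        return func
    return decorator


@toy_python_component(name="My awesome component", description="Come and play")
def join_strings(a: str, b: str) -> str:
    # Hypothetical component body: concatenate the two inputs.
    return a + b


print(join_strings("kube", "flow"))     # the function still runs normally
print(join_strings.component_metadata)  # the metadata travels with the function
```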

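Use kfp.compiler.build_python_component to create a container image for the component:

# OUTPUT_DIR and TARGET_IMAGE are placeholders: a Cloud Storage path used to
# stage the build, and the registry URI to push the built image to.
my_op = kfp.compiler.build_python_component(
  component_func=my_python_func,
  staging_gcs_path=OUTPUT_DIR,
  target_image=TARGET_IMAGE)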

Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function, by adding the @kfp.dsl.pipeline annotation to your pipeline function:

import kfp
from kfp.dsl import PipelineParam

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
  my_step = my_op(a='a', b='b')

Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.

To compile the pipeline, you can choose one of the following options:

Use the kfp.compiler.Compiler.compile method:

kfp.compiler.Compiler().compile(my_pipeline,  
  'my-pipeline.zip')

Alternatively, use the dsl-compile command on the command line.

dsl-compile --py [path/to/python/file] --output my-pipeline.zip

Use the Kubeflow Pipelines SDK to run the pipeline:

client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 
  'my-pipeline.zip')

You can also choose to share your pipeline as follows:

Upload the pipeline zip file to the Kubeflow Pipelines UI. For more information about the UI, see the Kubeflow Pipelines quickstart guide.
Upload the pipeline zip file to a shared repository. See the reusable components and other shared resources.

More about the above workflow

For an example of the above workflow, see the Jupyter notebook titled KubeFlow Pipeline Using TFX OSS Components on GitHub.

Creating lightweight components

This section describes how to create lightweight Python components that do not require you to build a container image. Lightweight components simplify prototyping and rapid development, especially in a Jupyter notebook environment.

Below is a more detailed explanation of the above diagram:

Write your code in a Python function. For example, write code to transform data or train a model:

def my_python_func(a: str, b: str) -> str:
  ...

Use kfp.components.func_to_container_op to convert your Python function into a pipeline component:

my_op = kfp.components.func_to_container_op(my_python_func)
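By default, func_to_container_op builds the component from the function's own source code, so the function should be self-contained: it cannot rely on imports or helpers defined elsewhere in your module, and its type annotations describe the component's inputs and outputs. The hypothetical count_words function below follows those rules, and because it is plain Python you can test it locally before converting it.

```python
def count_words(text: str, min_length: int = 1) -> int:
    """Hypothetical lightweight component function: count the words in `text`
    that have at least `min_length` characters."""
    import re  # imported inside the function, not at module level

    words = re.findall(r"[A-Za-z']+", text)
    return sum(1 for word in words if len(word) >= min_length)


# Local check before conversion; converting would then look like:
# count_words_op = kfp.components.func_to_container_op(count_words)
print(count_words("the quick brown fox", min_length=4))
```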

Optionally, you can write the component to a file that you can share or use in another pipeline:

my_op = kfp.components.func_to_container_op(my_python_func, 
  output_component_file='my-op.component')

If you stored your lightweight component in a file as described in the previous step, use kfp.components.load_component_from_file to load the component:

my_op = kfp.components.load_component_from_file('my-op.component')

Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function, by adding the @kfp.dsl.pipeline annotation to your pipeline function:

import kfp
from kfp.dsl import PipelineParam

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
  my_step = my_op(a='a', b='b')

Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.

To compile the pipeline, you can choose one of the following options:

Use the kfp.compiler.Compiler.compile method:

kfp.compiler.Compiler().compile(my_pipeline,  
  'my-pipeline.zip')

Alternatively, use the dsl-compile command on the command line.

dsl-compile --py [path/to/python/file] --output my-pipeline.zip

Use the Kubeflow Pipelines SDK to run the pipeline:

client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 
  'my-pipeline.zip')

More about the above workflow

For more detailed instructions, see the guide to building lightweight components.

For an example, see the Lightweight Python components - basics notebook on GitHub.

Using prebuilt, reusable components in your pipeline

A reusable component is one that someone has built and made available for others to use. To use the component in your pipeline, you need the YAML file that defines the component.

Below is a more detailed explanation of the above diagram:

Find the YAML file that defines the reusable component. For example, take a look at the reusable components and other shared resources.
Use kfp.components.load_component_from_url to load the component:

my_op = kfp.components.load_component_from_url('https://path/to/component.yaml')
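Before loading a reusable component, it helps to read its YAML file to see which inputs it declares, because the loaded factory function takes those inputs as arguments. The sketch below writes a hypothetical component definition as a Python dictionary that mirrors the usual YAML layout (name, inputs, outputs, and an implementation.container section whose args use placeholders such as inputValue, inputPath, and outputPath) and summarizes it. The component, image URI, and file paths are illustrative assumptions.

```python
# Hypothetical component definition, written as a Python dict that mirrors the
# structure of a component YAML file.
COMPONENT_SPEC = {
    "name": "Scale column",
    "description": "Rescales one CSV column to the range [0, 1].",
    "inputs": [
        {"name": "Input data"},
        {"name": "Column", "type": "String"},
    ],
    "outputs": [
        {"name": "Output data"},
    ],
    "implementation": {
        "container": {
            "image": "gcr.io/path/to/container/image",
            "command": ["python", "/app/my-app-code.py"],
            "args": [
                "--input-path", {"inputPath": "Input data"},
                "--column", {"inputValue": "Column"},
                "--output-path", {"outputPath": "Output data"},
            ],
        }
    },
}


def summarize_component(spec):
    """Return the container image and the declared input and output names."""
    return {
        "image": spec["implementation"]["container"]["image"],
        "inputs": [item["name"] for item in spec.get("inputs", [])],
        "outputs": [item["name"] for item in spec.get("outputs", [])],
    }


print(summarize_component(COMPONENT_SPEC))
```

In practice you would read these keys from the downloaded YAML file with a YAML parser (for example PyYAML) rather than typing them out.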

Write a pipeline function using the Kubeflow Pipelines DSL to define the pipeline and include all the pipeline components. Use the kfp.dsl.pipeline decorator to build a pipeline from your pipeline function, by adding the @kfp.dsl.pipeline annotation to your pipeline function:

import kfp
from kfp.dsl import PipelineParam

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
  my_step = my_op(a='a', b='b')

Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.

To compile the pipeline, you can choose one of the following options:

Use the kfp.compiler.Compiler.compile method:

kfp.compiler.Compiler().compile(my_pipeline,  
  'my-pipeline.zip')

Alternatively, use the dsl-compile command on the command line.

dsl-compile --py [path/to/python/file] --output my-pipeline.zip

Use the Kubeflow Pipelines SDK to run the pipeline:

client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 
  'my-pipeline.zip')

More about the above workflow

For an example, see the xgboost-training-cm.py pipeline sample on GitHub. The pipeline creates an XGBoost model using structured data in CSV format.

Next steps

Use pipeline parameters to pass data between components.
Learn how to write recursive functions in the DSL.
Build a reusable component for sharing in multiple pipelines.
Find out how to use the DSL to manipulate Kubernetes resources dynamically as steps of your pipeline.