Component
Conceptual overview of components in Kubeflow Pipelines
A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on. A component is analogous to a function, in that it has a name, parameters, return values, and a body.
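The function analogy can be made concrete with ordinary Python. In this hypothetical example, which does not use the Kubeflow Pipelines SDK, each part of the function is labeled with the component concept it roughly corresponds to.

```python
# Hypothetical example: a plain Python function used only to illustrate the
# analogy between a function and a pipeline component. No Kubeflow API is used.
from typing import List


def preprocess_text(raw_text: str, lowercase: bool = True) -> List[str]:
    """Name: preprocess_text (analogous to the component name).

    Parameters (analogous to component inputs): raw_text, lowercase.
    Return value (analogous to a component output): a list of tokens.
    """
    # Body (analogous to the component's implementation): the actual work.
    text = raw_text.lower() if lowercase else raw_text
    return text.split()


if __name__ == "__main__":
    print(preprocess_text("Clean THE raw Data"))  # ['clean', 'the', 'raw', 'data']
```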
Component code
The code for each component includes the following:
- Client code: The code that talks to endpoints to submit jobs. For example, code to talk to the Google Dataproc API to submit a Spark job.
- Runtime code: The code that does the actual job and usually runs in the cluster. For example, Spark code that transforms raw data into preprocessed data.
Note the naming convention for client code and runtime code. For a task named “mytask”:
- The mytask.py program contains the client code.
- The mytask directory contains all the runtime code.
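As a hedged illustration of the split, the sketch below shows what a minimal client program for the task might look like. Everything in it is hypothetical: the request shape, the mytask/main.py entry point inside the runtime directory, and the injected transport function, which stands in for a real service client (for example, one that calls the Dataproc API) so that no actual API is invoked here.

```python
# mytask.py: hypothetical client code for a task named "mytask".
# It does not do the work itself; it builds a job request that points at the
# runtime code in the mytask/ directory and hands it to a transport that would
# talk to the cluster's job-submission endpoint.
from typing import Callable, Dict, List

# Hypothetical location of the runtime entry point inside the mytask/ directory.
RUNTIME_ENTRY_POINT = "mytask/main.py"

JobRequest = Dict[str, object]


def build_job_request(input_uri: str, output_uri: str) -> JobRequest:
    """Describe the job the cluster should run (assumed request shape)."""
    args: List[str] = ["--input", input_uri, "--output", output_uri]
    return {"entry_point": RUNTIME_ENTRY_POINT, "args": args}


def submit(request: JobRequest, transport: Callable[[JobRequest], str]) -> str:
    """Send the request through an injected transport and return a job ID.

    In a real client, `transport` would wrap a service API call; it is left
    abstract here so the sketch does not depend on any particular library.
    """
    return transport(request)


if __name__ == "__main__":
    # Stand-in transport that only prints the request instead of calling an API.
    def fake_transport(req: JobRequest) -> str:
        print("submitting", req)
        return "job-0001"

    job_id = submit(build_job_request("gs://example-bucket/raw",
                                      "gs://example-bucket/clean"),
                    fake_transport)
    print("submitted", job_id)
```

Keeping the transport injectable separates "what job to run" (client logic) from "how it reaches the endpoint", which also makes the client easy to test without a cluster.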
Component definition
A component specification in YAML format describes the component for the Kubeflow Pipelines system. A component definition has the following parts:
- Metadata: name, description, etc.
- Interface: input/output specifications (name, type, description, default value, etc.).
- Implementation: A specification of how to run the component given a set of argument values for the component’s inputs. The implementation section also describes how to get the output values from the component once the component has finished running.
For the complete definition of a component, see the component specification.
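The three parts can be illustrated with a small Python sketch. The dictionary below mirrors the metadata, interface, and implementation sections, with field names loosely modeled on the Kubeflow Pipelines v1 component.yaml layout; the image name, script path, and output directory are hypothetical. resolve_command is a hypothetical helper, not part of the Kubeflow Pipelines SDK. It shows what "how to run the component given a set of argument values" means in practice: it fills inputValue placeholders from the arguments (falling back to declared defaults) and assigns a file path to each outputPath placeholder, which is where the output would be read from after the run.

```python
# A component definition written as a Python dict. Field names are loosely
# modeled on the Kubeflow Pipelines v1 component.yaml layout; the image name,
# script path, and output directory are hypothetical.
from typing import Any, Dict, List, Tuple

COMPONENT_SPEC: Dict[str, Any] = {
    # Metadata
    "name": "Preprocess data",
    "description": "Lowercases and tokenizes raw text.",
    # Interface
    "inputs": [
        {"name": "raw_data", "type": "String", "description": "Raw text"},
        {"name": "lowercase", "type": "Boolean", "default": "True"},
    ],
    "outputs": [
        {"name": "tokens", "type": "String", "description": "Tokenized text"},
    ],
    # Implementation: which container to run, and how inputs/outputs map to it
    "implementation": {
        "container": {
            "image": "example.registry/preprocess:latest",
            "command": [
                "python3", "/app/preprocess.py",
                "--raw-data", {"inputValue": "raw_data"},
                "--lowercase", {"inputValue": "lowercase"},
                "--tokens-path", {"outputPath": "tokens"},
            ],
        }
    },
}


def resolve_command(spec: Dict[str, Any],
                    arguments: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical resolver: turn placeholders into a concrete command line.

    Returns the command and a map from each output name to the file path the
    program is expected to write that output to.
    """
    defaults = {i["name"]: i["default"] for i in spec["inputs"] if "default" in i}
    values = {**defaults, **arguments}  # explicit arguments override defaults
    output_paths: Dict[str, str] = {}
    command: List[str] = []
    for part in spec["implementation"]["container"]["command"]:
        if isinstance(part, str):
            command.append(part)  # literal token, used as-is
        elif "inputValue" in part:
            command.append(values[part["inputValue"]])  # KeyError if missing
        elif "outputPath" in part:
            # Hypothetical convention for where outputs are written.
            path = f"/tmp/outputs/{part['outputPath']}/data"
            output_paths[part["outputPath"]] = path
            command.append(path)
        else:
            raise ValueError(f"unsupported placeholder: {part}")
    return command, output_paths


if __name__ == "__main__":
    cmd, outputs = resolve_command(COMPONENT_SPEC, {"raw_data": "Hello World"})
    print(cmd)
    print(outputs)
```

For example, resolve_command(COMPONENT_SPEC, {"raw_data": "Hello World"}) yields a command ending in --lowercase True --tokens-path /tmp/outputs/tokens/data, and reports that the tokens output is to be read from that path.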
Containerizing components
You must package your component as a Docker image. Components represent a specific program or entry point inside a container.
Each component in a pipeline executes independently. The components do not run in the same process and cannot directly share in-memory data. You must serialize (to strings or files) all the data pieces that you pass between the components so that the data can travel over the distributed network. You must then deserialize the data for use in the downstream component.
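The following Python sketch illustrates the serialize/deserialize pattern with JSON files. It assumes the pipeline system gives each component a file path for its output and passes that same path to the downstream component; a temporary directory stands in for the shared storage, and both functions run in one process only for demonstration. upstream_component and downstream_component are hypothetical names.

```python
# Sketch of passing data between two components by serializing it to a file.
# In a real pipeline the two functions would run in separate containers, and
# the file path would be supplied by the pipeline system; here a temporary
# directory stands in for that shared storage (assumption).
import json
import os
import tempfile


def upstream_component(output_path: str) -> None:
    """Produce a result and serialize it to a file instead of keeping it in memory."""
    stats = {"rows": 3, "columns": ["id", "text", "label"]}
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(stats, f)


def downstream_component(input_path: str) -> int:
    """Deserialize the upstream result and use it (here: count table cells)."""
    with open(input_path, encoding="utf-8") as f:
        stats = json.load(f)
    return int(stats["rows"]) * len(stats["columns"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        path = os.path.join(shared_dir, "stats", "data.json")
        upstream_component(path)
        print(downstream_component(path))  # 9
```

Only the JSON text crosses the boundary between the two steps, so the downstream step would work the same way whether it ran in the same process or in a different container on another node.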
Next steps
- Read an overview of Kubeflow Pipelines.
- Follow the pipelines quickstart guide to deploy Kubeflow and run a sample pipeline directly from the Kubeflow Pipelines UI.
- Build your own component and pipeline.
- Build a reusable component for sharing in multiple pipelines.