Manipulate Kubernetes Resources as Part of a Pipeline

Overview of using the SDK to manipulate Kubernetes resources dynamically as steps of the pipeline

Out of date

This guide contains outdated information pertaining to Kubeflow 1.0. This guide needs to be updated for Kubeflow 1.1.

This page describes how to manipulate Kubernetes resources through individual Kubeflow Pipelines components during a pipeline. Users may handle any Kubernetes resource, while creating Persistent Volume Claims and Volume Snapshots is rendered easy in the common case.

Kubernetes Resources

ResourceOp

This class represents a step of the pipeline which manipulates Kubernetes resources. It implements Argo’s resource template.

This feature allows users to perform some action (get, create, apply, delete, replace, patch) on Kubernetes resources. Users are able to set conditions that denote the success or failure of the step undertaking that action.

Link to the corresponding Python library.

Arguments

Only most significant arguments are presented in this section. For more information, please refer to the aforementioned link to the library.

  • k8s_resource: Definition of the Kubernetes resource. (required)
  • action: Action to be performed (defaults to create).
  • merge_strategy: Merge strategy when action is patch. (optional)
  • success_condition: Condition to denote success of the step once it is true. (optional)
  • failure_condition: Condition to denote failure of the step once it is true. (optional)
  • attribute_outputs: Similar to file_outputs of kfp.dsl.ContainerOp. Maps output parameter names to JSON paths in the Kubernetes object. More on that in the following section. (optional)

Outputs

ResourceOps can produce output parameters. They can output field values of the resource which is being manipulated. For example:

  1. job = kubernetes_client.V1Job(...)
  2. rop = kfp.dsl.ResourceOp(
  3. name="create-job",
  4. k8s_resource=job,
  5. action="create",
  6. attribute_outputs={"name": "{.metadata.name}"}
  7. )

By default, ResourceOps output the resource’s name as well as the whole resource specification.

Samples

For better understanding, please refer to the following samples: 1


Persistent Volume Claims (PVCs)

Request the creation of PVC instances simple and fast.

VolumeOp

A ResourceOp specialized in PVC creation.

Link to the corresponding Python library.

Arguments

The following arguments are an extension to ResourceOp arguments. If a k8s_resource is passed, then none of the following should be provided.

  • resource_name: The name of the resource which will be created. This string will be prepended with the workflow name. This may contain PipelineParams. (required)
  • size: The requested size for the PVC. This may contain PipelineParams. (required)
  • storage_class: The storage class to be used. This may contain PipelineParams. (optional)
  • modes: The accessModes of the PVC (defaults to RWM). Check this documentation for further information. The user may find the following modes built-in:
    • VOLUME_MODE_RWO: ["ReadWriteOnce"]
    • VOLUME_MODE_RWM: ["ReadWriteMany"]
    • VOLUME_MODE_ROM: ["ReadOnlyMany"]
  • annotations: Annotations to be patched in the PVC. These may contain PipelineParams. (optional)
  • data_source: It is used to create a PVC from a VolumeSnapshot. It can be either a string or a V1TypedLocalObjectReference, and may contain PipelineParams. (Alpha feature, optional)

Outputs

Additionally to the whole specification of the resource and its name (ResourceOp defaults), a VolumeOp also outputs the storage size of the bounded Persistent Volume (as step.outputs["size"]). However, this may be empty if the storage provisioner has a WaitForFirstConsumer binding mode. This value, if not empty, is always greater than or equal to the requested size.

Useful information

  1. VolumeOp steps have a .volume attribute which is a PipelineVolume referencing the created PVC. More information on Pipeline Volumes in the following section.
  2. A ContainerOp has a pvolumes argument in its constructor. This is a dictionary with mount paths as keys and volumes as values and functions similarly to file_outputs (which can then be used as op.outputs["key"] or op.output). For example:
  1. vop = dsl.VolumeOp(
  2. name="volume_creation",
  3. resource_name="mypvc",
  4. size="1Gi"
  5. )
  6. step1 = dsl.ContainerOp(
  7. name="step1",
  8. ...
  9. pvolumes={"/mnt": vop.volume} # Implies execution after vop
  10. )
  11. step2 = dsl.ContainerOp(
  12. name="step2",
  13. ...
  14. pvolumes={"/data": step1.pvolume, # Implies execution after step1
  15. "/mnt": dsl.PipelineVolume(pvc="existing-pvc")}
  16. )
  17. step3 = dsl.ContainerOp(
  18. name="step3",
  19. ...
  20. pvolumes={"/common": step2.pvolumes["/mnt"]} # Implies execution after step2
  21. )

PipelineVolume

Reference Kubernetes volumes easily, mount them and express dependencies through them.

A PipelineVolume is essentially a Kubernetes Volume(*) carrying dependencies, supplemented with an .after() method extending them. Those dependencies can then be parsed properly by a ContainerOp, when consumed in pvolumes argument or add_pvolumes() method, to extend the dependencies of that step.

Link to the corresponding Python library.

(\) Inherits from V1Volume class of Kubernetes Python client.*

Arguments

PipelineVolume constructor accepts all arguments V1Volume constructor does. However, name can be omitted and a pseudo-random name for that volume is generated instead.

Extra arguments:

  • pvc: Name of an existing PVC to be referenced by this PipelineVolume. This value can be a PipelineParam.
  • volume: Initialize a new PipelineVolume instance from an existing V1Volume, or its inherited types (e.g. PipelineVolume).

Samples

For better understanding, please refer to the following samples: 1, 2, 3, 4


Volume Snapshots

Request the creation of Volume Snapshot instances simple and fast.

VolumeSnapshotOp

A ResourceOp specialized in Volume Snapshot creation.

Link to the corresponding Python library.

NOTE: You should check if your Kubernetes cluster admin has Volume Snapshots enabled in your cluster.

Arguments

The following arguments are an extension to the ResourceOp arguments. If a k8s_resource is passed, then none of the following may be provided.

  • resource_name: The name of the resource which will be created. This string will be prepended with the workflow name. This may contain PipelineParams. (required)
  • pvc: The name of the PVC to be snapshotted. This may contain PipelineParams. (optional)
  • snapshot_class: The snapshot storage class to be used. This may contain PipelineParams. (optional)
  • volume: An instance of a V1Volume, or its inherited type (e.g. PipelineVolume). This may contain PipelineParams. (optional)
  • annotations: Annotations to be patched in the VolumeSnapshot. These may contain PipelineParams. (optional)

NOTE: One of the pvc or volume needs to be provided.

Outputs

Additionally to the whole specification of the resource and its name (ResourceOp defaults), a VolumeSnapshotOp also outputs the restoreSize of the bounded VolumeSnapshot (as step.outputs["size"]). This is the minimum size for a PVC clone of that snapshot.

Useful information

VolumeSnapshotOp steps have a .snapshot attribute which is a V1TypedLocalObjectReference. This can be passed as a data_source to create a PVC out of that VolumeSnapshot. The user may otherwise use the step.outputs["name"] as data_source.

Samples

For better understanding, please refer to the following samples: 1, 2

Next steps

Last modified 03.03.2021: Move Kubeflow Pipelines under /components (#2505) (c34470b8)