Introduction
Kubeflow Operator introduction
This guide describes the Kubeflow Operator and its currently supported releases.
Kubeflow Operator
Kubeflow Operator helps deploy, monitor, and manage the lifecycle of Kubeflow. It is built using the Operator Framework, an open source toolkit for building, testing, and packaging operators and managing their lifecycle.
The operator is currently in the incubation phase and is based on this design doc. It is built on top of the KfDef CR and uses kfctl as the nucleus of its controller. The current roadmap for this operator is listed here. The operator is also published on OperatorHub.
Applications and components to be deployed as part of the Kubeflow platform are defined in the KfDef configuration manifest. Each application has a kustomize configuration with all its resource manifests. The KfDef spec includes the applications field, in which each application is specified by a kustomizeConfig field. The parameters and overlays fields may be used to provide custom settings for the application. The repoRef field specifies the path from which to retrieve the application's kustomize configuration.
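As a rough illustration of how these fields fit together, a single entry under applications might look like the sketch below. The application name, path, overlay name, and parameter are placeholders chosen for illustration, not values from a tested configuration; check the kustomize package in your manifests repo for the overlays and parameters it actually supports.

```yaml
# One entry of spec.applications (illustrative values only)
- kustomizeConfig:
    # Overlays to apply on top of the application's base kustomization
    overlays:
    - application                 # assumed overlay name
    # Name/value pairs used to customize the application
    parameters:
    - name: namespace             # assumed parameter name
      value: istio-system         # assumed value
    # Where to find the kustomize package: a repo name plus a path inside it
    repoRef:
      name: manifests
      path: istio/istio-install   # assumed path
  name: istio-install             # assumed application name
```

The repoRef name is expected to match one of the entries listed under the repos field of the same KfDef.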
The KfDef spec may also include a plugins field for certain cloud platforms, including AWS and GCP. The platforms use it to run preprocessing tasks before Kubeflow is deployed.
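For orientation, a plugins entry for AWS has roughly the following shape. This is a sketch modeled on the AWS example configurations from around this release; the region and role name are placeholders, and the exact set of supported fields should be checked against the platform's own KfDef examples.

```yaml
# Fragment of a KfDef spec (placeholder values)
spec:
  plugins:
  - kind: KfAwsPlugin
    metadata:
      name: aws
    spec:
      region: us-west-2           # placeholder region
      roles:
      - my-node-instance-role     # placeholder IAM role name
```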
An example KfDef is as follows:
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  namespace: kubeflow
spec:
  applications:
  # Install Istio
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/ibm/application/istio-stack
    name: istio-stack
  # Install Kubeflow applications.
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/ibm
    name: kubeflow-apps
  # Other applications
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/ibm/application/spark-operator
    name: spark-operator
  # Model Serving applications
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: knative/installs/generic
    name: knative
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: kfserving/installs/generic
    name: kfserving
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/master.tar.gz
    version: master
More KfDef examples can be found in the Kubeflow manifests repo. Users can pick one and modify it to fit their requirements. The OpenDataHub project also maintains a KfDef manifest for Kubeflow deployment on OpenShift Container Platform.
The operator watches all KfDef configuration instances in the cluster as custom resources (CRs) and manages them. It handles reconcile requests for all KfDef instances. To understand more about the operator controller's behavior, refer to this controller-runtime link.
Kubeflow Operator shares the same packages and functions as the kfctl CLI, which is the command line approach to deploying Kubeflow. The deployment flow is therefore similar, except that ownerReferences metadata is added to each application's Kubernetes objects. The KfDef CR is the parent of all these objects. This makes the operator better than the CLI approach at tearing down a Kubeflow deployment: when the KfDef CR is deleted, the Kubernetes garbage collection mechanism takes over and removes all, and only, the resources deployed through this KfDef configuration.
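To make the parent/child relationship concrete, the sketch below shows roughly what the metadata of a deployed object could look like once the owner reference is set. The object kind, the names, and the uid are hypothetical placeholders; only the layout of the ownerReferences field (apiVersion, kind, name, uid) is standard Kubernetes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-component          # hypothetical child object
  namespace: kubeflow
  ownerReferences:
  # Points back to the KfDef CR that deployed this object
  - apiVersion: kfdef.apps.kubeflow.org/v1
    kind: KfDef
    name: example-kfdef            # hypothetical KfDef name
    uid: 00000000-0000-0000-0000-000000000000   # placeholder; taken from the live KfDef
# spec omitted for brevity
```

Because the garbage collector follows these references, deleting the KfDef removes objects like this one, while objects that do not carry the reference are left alone.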
One of the many good reasons to use an operator is to monitor resources. The Kubeflow Operator also watches all child resources of the KfDef CR. Should any of these resources be deleted, the operator tries to reapply the resource manifest and bring the object back up.
The operator responds to the following events:
When a KfDef instance is created or updated, the operator's reconciler is notified of the event and invokes the Apply functions provided by the kfctl package to deploy Kubeflow. The Kubeflow resources specified in the manifests are owned by the KfDef instance, with their ownerReferences set.
When a KfDef instance is deleted, all the secondary resources it owns are deleted through garbage collection, since their owner is gone. In the meantime, the reconciler is notified of the event and removes the finalizers.
When any resource deployed as part of a KfDef instance is deleted, the operator's reconciler is notified of the event and invokes the Apply functions provided by the kfctl package to redeploy Kubeflow. The deleted resource is recreated with the same manifest as specified when the KfDef instance was created.
Deploying Kubeflow with the Kubeflow Operator includes two steps: installing the Kubeflow Operator followed by deploying the KfDef custom resource.
Current Tested Operators and Pre-built Images
The Kubeflow Operator controller logic is based on the kfctl package, so for each major release of kfctl, an operator image is built and tested with that version of manifests to deploy a KfDef instance. The following table shows which releases have been tested.
Note: to build a customized operator for a specific version of Kubeflow, run git checkout to switch to that specific branch or tag. Remember to use the matching version of manifests.
Last modified 11.08.2020: Update some IBM Cloud related docs for Kubeflow 1.1 release (#2110) (ecaa3e9a)