Golang Based Operator Tutorial

This guide walks through an example of building a simple memcached-operator using the operator-sdk CLI tool and controller-runtime library API.

NOTE: For SDK versions prior to v0.19.0, please consult the legacy docs for the legacy CLI and project.

Prerequisites

Create a new project

Use the CLI to create a new memcached-operator project:

    $ mkdir -p $HOME/projects/memcached-operator
    $ cd $HOME/projects/memcached-operator
    # we'll use a domain of example.com
    # so all API groups will be <group>.example.com
    $ operator-sdk init --domain=example.com --repo=github.com/example-inc/memcached-operator

To learn about the project directory structure, see the Kubebuilder project layout doc.

A note on dependency management

operator-sdk init generates a go.mod file to be used with Go modules. The --repo=<path> flag is required when creating a project outside of $GOPATH/src, as scaffolded files require a valid module path. Ensure you activate module support by running export GO111MODULE=on before using the SDK.

Manager

The operator’s main program, main.go, initializes and runs the Manager.

See the Kubebuilder entrypoint doc for more details on how the manager registers the Scheme for the custom resource API definitions, and sets up and runs controllers and webhooks.

The Manager can restrict the namespace that all controllers will watch for resources:

    mgr, err := ctrl.NewManager(cfg, manager.Options{Namespace: namespace})

Here namespace would typically be the namespace that the operator is running in. To watch all namespaces, leave the Namespace option empty; this is also controller-runtime’s default when the option is not set:

    mgr, err := ctrl.NewManager(cfg, manager.Options{Namespace: ""})

It is also possible to use the MultiNamespacedCacheBuilder to watch a specific set of namespaces:

    var namespaces []string // List of Namespaces
    // Create a manager whose cache watches only the listed namespaces
    mgr, err := ctrl.NewManager(cfg, manager.Options{
        NewCache: cache.MultiNamespacedCacheBuilder(namespaces),
    })
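The three configurations above differ only in which namespaces the manager’s cache can see. The stdlib-only sketch below models that scoping rule with a hypothetical watchesNamespace helper; it is not a controller-runtime API, just an illustration of the semantics described here.

```go
package main

import "fmt"

// watchesNamespace is a hypothetical helper (not part of controller-runtime)
// that models whether the manager's cache sees an object in objNamespace:
//   - an empty list models Namespace: "" (all namespaces)
//   - a one-element list models Namespace: namespace
//   - a longer list models cache.MultiNamespacedCacheBuilder(namespaces)
func watchesNamespace(namespaces []string, objNamespace string) bool {
	if len(namespaces) == 0 {
		return true
	}
	for _, ns := range namespaces {
		if ns == objNamespace {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(watchesNamespace(nil, "kube-system"))                     // true
	fmt.Println(watchesNamespace([]string{"default"}, "kube-system"))     // false
	fmt.Println(watchesNamespace([]string{"team-a", "team-b"}, "team-b")) // true
}
```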

Operator scope

Read the operator scope documentation on how to run your operator as namespace-scoped vs cluster-scoped.

Multi-Group APIs

Before creating an API and controller, consider whether your operator’s API requires multiple groups. If so, add the line multigroup: true to the PROJECT file, which should then look like the following:

    domain: example.com
    layout: go.kubebuilder.io/v2
    multigroup: true
    ...

For multi-group projects, the API Go type files are created under apis/<group>/<version>/ and the controllers under controllers/<group>/.
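The layout rule can be written as two small functions. This is a hypothetical illustration (apiDir and controllerDir are invented names), assuming the single-group paths api/<version>/ and controllers/ that the create api scaffold uses later in this guide:

```go
package main

import (
	"fmt"
	"path"
)

// apiDir returns the directory that holds the Go API types for a
// group/version. Hypothetical helper: single-group projects use
// api/<version>/, multi-group projects use apis/<group>/<version>/.
func apiDir(group, version string, multigroup bool) string {
	if multigroup {
		return path.Join("apis", group, version) + "/"
	}
	return path.Join("api", version) + "/"
}

// controllerDir returns the directory that holds the controllers for a group.
func controllerDir(group string, multigroup bool) string {
	if multigroup {
		return path.Join("controllers", group) + "/"
	}
	return "controllers/"
}

func main() {
	fmt.Println(apiDir("cache", "v1alpha1", false)) // api/v1alpha1/
	fmt.Println(apiDir("cache", "v1alpha1", true))  // apis/cache/v1alpha1/
	fmt.Println(controllerDir("cache", true))       // controllers/cache/
}
```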

This guide will cover the default case of a single group API.

Create a new API and Controller

Create a new Custom Resource Definition (CRD) API with group cache, version v1alpha1, and Kind Memcached. When prompted, enter y (yes) to create both the resource and the controller.

    $ operator-sdk create api --group=cache --version=v1alpha1 --kind=Memcached
    Create Resource [y/n]
    y
    Create Controller [y/n]
    y
    Writing scaffold for you to edit...
    api/v1alpha1/memcached_types.go
    controllers/memcached_controller.go
    ...

This will scaffold the Memcached resource API at api/v1alpha1/memcached_types.go and the controller at controllers/memcached_controller.go.

See the API terminology doc for details on the CRD API conventions.

To understand the API Go types and controller scaffolding see the Kubebuilder api doc and controller doc.

Define the API

Define the API for the Memcached Custom Resource (CR) by modifying the Go type definitions at api/v1alpha1/memcached_types.go to have the following spec and status:

    // MemcachedSpec defines the desired state of Memcached
    type MemcachedSpec struct {
        // +kubebuilder:validation:Minimum=0
        // Size is the size of the memcached deployment
        Size int32 `json:"size"`
    }

    // MemcachedStatus defines the observed state of Memcached
    type MemcachedStatus struct {
        // Nodes are the names of the memcached pods
        Nodes []string `json:"nodes"`
    }

Add the +kubebuilder:subresource:status marker to add a status subresource to the CRD manifest so that the controller can update the CR status without changing the rest of the CR object:

    // Memcached is the Schema for the memcacheds API
    // +kubebuilder:subresource:status
    type Memcached struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`

        Spec   MemcachedSpec   `json:"spec,omitempty"`
        Status MemcachedStatus `json:"status,omitempty"`
    }
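The json struct tags decide the field names that appear in the CR’s YAML (spec.size, status.nodes). The stdlib-only sketch below uses trimmed copies of the types above, with the metav1 embeds removed so it runs without Kubernetes dependencies; encode is a hypothetical helper added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed copies of the API types above; TypeMeta and ObjectMeta are
// omitted so this sketch depends only on the standard library.
type MemcachedSpec struct {
	Size int32 `json:"size"`
}

type MemcachedStatus struct {
	Nodes []string `json:"nodes"`
}

type Memcached struct {
	Spec   MemcachedSpec   `json:"spec,omitempty"`
	Status MemcachedStatus `json:"status,omitempty"`
}

// encode marshals a Memcached value to a JSON string.
func encode(m Memcached) string {
	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// omitempty has no effect on struct-typed fields, and Nodes has no
	// omitempty, so a nil slice is serialized as null.
	fmt.Println(encode(Memcached{Spec: MemcachedSpec{Size: 3}}))
	// {"spec":{"size":3},"status":{"nodes":null}}
}
```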

After modifying the *_types.go file, always run the following command to update the generated code for that resource type:

    $ make generate

This makefile target invokes the controller-gen utility to update the api/v1alpha1/zz_generated.deepcopy.go file, ensuring that our API’s Go type definitions implement the runtime.Object interface that all Kind types must implement.
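To make the generated code less opaque, here is a hand-written sketch of roughly what controller-gen emits for a type with a slice field such as MemcachedStatus. It is for illustration only: never edit zz_generated.deepcopy.go by hand; rerun make generate instead. The key point is that the slice gets a fresh backing array, so mutating a copy never alters the original object (for example, one held in the manager’s cache).

```go
package main

import "fmt"

type MemcachedStatus struct {
	Nodes []string `json:"nodes"`
}

// DeepCopyInto copies the receiver into out, allocating a new backing
// array for Nodes instead of sharing the original one.
func (in *MemcachedStatus) DeepCopyInto(out *MemcachedStatus) {
	*out = *in
	if in.Nodes != nil {
		in, out := &in.Nodes, &out.Nodes
		*out = make([]string, len(*in))
		copy(*out, *in)
	}
}

// DeepCopy returns a newly allocated deep copy of the receiver.
func (in *MemcachedStatus) DeepCopy() *MemcachedStatus {
	if in == nil {
		return nil
	}
	out := new(MemcachedStatus)
	in.DeepCopyInto(out)
	return out
}

func main() {
	orig := &MemcachedStatus{Nodes: []string{"pod-a", "pod-b"}}
	cp := orig.DeepCopy()
	cp.Nodes[0] = "changed"
	fmt.Println(orig.Nodes[0]) // pod-a
}
```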

Generating CRD manifests

Once the API is defined with spec/status fields and CRD validation markers, the CRD manifests can be generated and updated with the following command:

    $ make manifests

This makefile target will invoke controller-gen to generate the CRD manifests at config/crd/bases/cache.example.com_memcacheds.yaml.

OpenAPI validation

OpenAPIv3 schemas are added to CRD manifests in the spec.validation block when the manifests are generated. This validation block allows Kubernetes to validate the properties in a Memcached Custom Resource when it is created or updated.

Markers (annotations) are available to configure validations for your API. These markers will always have a +kubebuilder:validation prefix.

Usage of markers in API code is discussed in the kubebuilder CRD generation and marker documentation. A full list of OpenAPIv3 validation markers can be found here.
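As an example, the +kubebuilder:validation:Minimum=0 marker on Size becomes a minimum: 0 constraint on the size property of the generated schema, so the API server rejects a Memcached whose spec.size is negative. The check is equivalent to the sketch below; validateSpec is a hypothetical helper and its error text is invented, since the real validation runs in the API server, not in your operator.

```go
package main

import "fmt"

// MemcachedSpec is a trimmed copy of the spec type defined earlier.
type MemcachedSpec struct {
	// +kubebuilder:validation:Minimum=0
	Size int32 `json:"size"`
}

// validateSpec is a hypothetical stand-in for the schema check the API
// server derives from the Minimum=0 marker; operators do not call it.
func validateSpec(spec MemcachedSpec) error {
	if spec.Size < 0 {
		return fmt.Errorf("spec.size must be >= 0, got %d", spec.Size)
	}
	return nil
}

func main() {
	fmt.Println(validateSpec(MemcachedSpec{Size: 3}))  // <nil>
	fmt.Println(validateSpec(MemcachedSpec{Size: -1})) // spec.size must be >= 0, got -1
}
```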

To learn more about OpenAPI v3.0 validation schemas in CRDs, refer to the Kubernetes Documentation.

Implement the Controller

For this example replace the generated controller file controllers/memcached_controller.go with the example memcached_controller.go implementation.

The example controller executes the following reconciliation logic for each Memcached CR:

  • Create a memcached Deployment if it doesn’t exist
  • Ensure that the Deployment size is the same as specified by the Memcached CR spec
  • Update the Memcached CR status using the status writer with the names of the memcached pods
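Stripped of client calls, the first two steps reduce to comparing desired state (the CR spec) with observed state (the Deployment). The sketch below is a hypothetical stdlib model of that decision; deploymentState, nextAction and the action names are invented for illustration and are not part of the example controller.

```go
package main

import "fmt"

// deploymentState is a hypothetical summary of the Deployment the
// controller found (or did not find) for a Memcached CR.
type deploymentState struct {
	Exists   bool
	Replicas int32
}

// nextAction models the first two reconcile steps: create the Deployment
// if it is missing, otherwise scale it to match spec.size. Once the
// Deployment matches, the controller moves on to updating the CR status.
func nextAction(specSize int32, observed deploymentState) string {
	switch {
	case !observed.Exists:
		return "create"
	case observed.Replicas != specSize:
		return "scale"
	default:
		return "in-sync"
	}
}

func main() {
	fmt.Println(nextAction(3, deploymentState{}))                          // create
	fmt.Println(nextAction(5, deploymentState{Exists: true, Replicas: 3})) // scale
	fmt.Println(nextAction(3, deploymentState{Exists: true, Replicas: 3})) // in-sync
}
```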

The next two subsections explain how the controller watches resources and how the reconcile loop is triggered. Skip to the Build section to see how to build and run the operator.

Resources watched by the Controller

The SetupWithManager() function in controllers/memcached_controller.go specifies how the controller is built to watch a CR and other resources that are owned and managed by that controller.

    func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
        return ctrl.NewControllerManagedBy(mgr).
            For(&cachev1alpha1.Memcached{}).
            Owns(&appsv1.Deployment{}).
            Complete(r)
    }

NewControllerManagedBy() provides a controller builder that allows various controller configurations.

For(&cachev1alpha1.Memcached{}) specifies the Memcached type as the primary resource to watch. For each Add/Update/Delete event on a Memcached object, the reconcile loop is sent a reconcile Request (a namespace/name key) for that object.

Owns(&appsv1.Deployment{}) specifies the Deployment type as the secondary resource to watch. For each Deployment Add/Update/Delete event, the event handler maps the event to a reconcile Request for the Deployment’s owner, which in this case is the Memcached object for which the Deployment was created.
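The owner mapping relies on the owner reference that the controller sets on each Deployment it creates (typically via controllerutil.SetControllerReference). Roughly, the handler finds the controlling owner reference of the expected kind and enqueues that owner’s namespace/name. The stdlib model below uses simplified hypothetical types (ownerRef, request, ownerRequests) rather than the real metav1.OwnerReference and reconcile.Request.

```go
package main

import "fmt"

// ownerRef is a simplified, hypothetical stand-in for metav1.OwnerReference.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// request mirrors the namespace/name key carried by a reconcile Request.
type request struct {
	Namespace string
	Name      string
}

// ownerRequests maps an event on an owned object (here, a Deployment) to
// requests for its controlling owner of ownerKind. A namespaced owner lives
// in the same namespace as the object it owns. For brevity this matches on
// Kind only; the real handler also checks the owner's API group.
func ownerRequests(objNamespace string, refs []ownerRef, ownerKind string) []request {
	var reqs []request
	for _, ref := range refs {
		if ref.Controller && ref.Kind == ownerKind {
			reqs = append(reqs, request{Namespace: objNamespace, Name: ref.Name})
		}
	}
	return reqs
}

func main() {
	refs := []ownerRef{{Kind: "Memcached", Name: "memcached-sample", Controller: true}}
	fmt.Println(ownerRequests("default", refs, "Memcached")) // [{default memcached-sample}]
}
```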

Controller Configurations

There are a number of other useful configurations that can be made when initializing a controller. For more details on these configurations, consult the upstream builder and controller godocs.

  • Set the max number of concurrent Reconciles for the controller via the MaxConcurrentReconciles option. Defaults to 1.

        func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
            return ctrl.NewControllerManagedBy(mgr).
                For(&cachev1alpha1.Memcached{}).
                Owns(&appsv1.Deployment{}).
                WithOptions(controller.Options{
                    MaxConcurrentReconciles: 2,
                }).
                Complete(r)
        }
  • Filter watch events using predicates

  • Choose the type of EventHandler to change how a watch event will translate to reconcile requests for the reconcile loop. For operator relationships that are more complex than primary and secondary resources, the EnqueueRequestsFromMapFunc handler can be used to transform a watch event into an arbitrary set of reconcile requests.
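As an example of the predicate option, a common filter drops update events in which metadata.generation did not change; for a CRD with the status subresource enabled, that means only status or metadata changed, not the spec. controller-runtime ships predicate.GenerationChangedPredicate for this. The stdlib sketch below models the same filter with a hypothetical updateEvent type, not the real event.UpdateEvent.

```go
package main

import "fmt"

// updateEvent is a hypothetical, simplified update event carrying the
// metadata.generation of the object before and after the change.
type updateEvent struct {
	OldGeneration int64
	NewGeneration int64
}

// generationChanged lets an update through only when metadata.generation
// moved, i.e. when the spec changed.
func generationChanged(e updateEvent) bool {
	return e.OldGeneration != e.NewGeneration
}

// filterUpdates keeps only the events that pass the predicate.
func filterUpdates(events []updateEvent, pred func(updateEvent) bool) []updateEvent {
	var kept []updateEvent
	for _, e := range events {
		if pred(e) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	events := []updateEvent{
		{OldGeneration: 1, NewGeneration: 1}, // status-only update
		{OldGeneration: 1, NewGeneration: 2}, // spec change
	}
	fmt.Println(len(filterUpdates(events, generationChanged))) // 1
}
```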

Reconcile loop

Every Controller has a Reconciler object with a Reconcile() method that implements the reconcile loop. The reconcile loop is passed the Request argument, which is a Namespace/Name key used to look up the primary resource object, Memcached, from the cache:

    import (
        "context"

        ctrl "sigs.k8s.io/controller-runtime"

        cachev1alpha1 "github.com/example-inc/memcached-operator/api/v1alpha1"
        ...
    )

    func (r *MemcachedReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
        // This Reconcile signature receives no context, so create one
        // for the client calls below.
        ctx := context.Background()

        // Look up the Memcached instance for this reconcile request
        memcached := &cachev1alpha1.Memcached{}
        err := r.Get(ctx, req.NamespacedName, memcached)
        ...
    }
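The elided part of the snippet usually separates a missing object from a real error: if the Memcached was deleted after its request was queued, there is nothing left to do. Real code checks this with IsNotFound from k8s.io/apimachinery/pkg/api/errors; the stdlib sketch below models the branch with a hypothetical errNotFound sentinel and afterGet helper.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a Kubernetes
// NotFound API error.
var errNotFound = errors.New("not found")

// afterGet models how Reconcile reacts to the result of looking up the CR.
// It reports whether reconciliation should continue and which error, if
// any, Reconcile should return.
func afterGet(getErr error) (proceed bool, err error) {
	switch {
	case errors.Is(getErr, errNotFound):
		// The CR was deleted after the request was queued; owned objects are
		// garbage-collected via owner references, so stop without requeueing.
		return false, nil
	case getErr != nil:
		// Any other error is returned so the request is requeued.
		return false, getErr
	default:
		return true, nil
	}
}

func main() {
	fmt.Println(afterGet(nil))                                // true <nil>
	fmt.Println(afterGet(fmt.Errorf("get: %w", errNotFound))) // false <nil>
	fmt.Println(afterGet(errors.New("connection refused")))   // false connection refused
}
```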

Based on the return values, Result and error, the Request may be requeued and the reconcile loop may be triggered again:

    // Reconcile successful - don't requeue
    return ctrl.Result{}, nil

    // Reconcile failed due to error - requeue
    return ctrl.Result{}, err

    // Requeue for any reason other than an error
    return ctrl.Result{Requeue: true}, nil

You can set the Result.RequeueAfter to requeue the Request after a grace period as well:

    import "time"

    // Requeue after 5 seconds, for any reason other than an error
    return ctrl.Result{RequeueAfter: time.Second * 5}, nil

Note: Returning Result with RequeueAfter set is how you can periodically reconcile a CR.
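How the controller treats these return values can be summarized as a small decision function. This is a simplified, hypothetical model (result and queueAction are invented names; the real logic lives inside controller-runtime and uses a rate-limited work queue): an error or Requeue: true re-adds the request with backoff, a positive RequeueAfter re-adds it after that delay, and otherwise the request is dropped until the next watch event. An error takes precedence over the Result.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result mirrors the ctrl.Result fields discussed above.
type result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// queueAction describes what happens to the request after Reconcile returns.
func queueAction(res result, err error) string {
	switch {
	case err != nil:
		return "requeue with backoff" // an error wins over the Result
	case res.RequeueAfter > 0:
		return "requeue after " + res.RequeueAfter.String()
	case res.Requeue:
		return "requeue with backoff"
	default:
		return "done"
	}
}

func main() {
	fmt.Println(queueAction(result{}, nil))                              // done
	fmt.Println(queueAction(result{}, errors.New("boom")))               // requeue with backoff
	fmt.Println(queueAction(result{Requeue: true}, nil))                 // requeue with backoff
	fmt.Println(queueAction(result{RequeueAfter: 5 * time.Second}, nil)) // requeue after 5s
}
```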

For a guide on Reconcilers, Clients, and interacting with resource Events, see the Client API doc.

Specify permissions and generate RBAC manifests

The controller needs certain RBAC permissions to interact with the resources it manages. These are specified via RBAC markers like the following:

    // +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds,verbs=get;list;watch;create;update;patch;delete
    // +kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/status,verbs=get;update;patch
    // +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
    // +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list
    func (r *MemcachedReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
        ...
    }

The ClusterRole manifest at config/rbac/role.yaml is generated from the above markers via controller-gen with the following command:

    $ make manifests
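Each RBAC marker is a comma-separated list of key=value pairs, with list values separated by semicolons. The hypothetical parser below only sketches that syntax; it is not controller-gen’s implementation and handles just the groups, resources and verbs keys used in this guide. It also shows why a stray trailing semicolon such as verbs=get;list; is worth avoiding: a naive split yields an empty verb.

```go
package main

import (
	"fmt"
	"strings"
)

// rbacRule is a hypothetical, simplified view of one +kubebuilder:rbac marker.
type rbacRule struct {
	Groups    []string
	Resources []string
	Verbs     []string
}

// parseRBACMarker is a sketch of the marker syntax, not controller-gen's
// parser: comma-separated key=value pairs, with ';' separating list values.
func parseRBACMarker(marker string) (rbacRule, error) {
	const prefix = "+kubebuilder:rbac:"
	body := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(marker), "//"))
	if !strings.HasPrefix(body, prefix) {
		return rbacRule{}, fmt.Errorf("not an RBAC marker: %q", marker)
	}
	var rule rbacRule
	for _, pair := range strings.Split(strings.TrimPrefix(body, prefix), ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return rbacRule{}, fmt.Errorf("malformed pair %q", pair)
		}
		// A naive split keeps empty items, so "get;list;" yields "get", "list", "".
		values := strings.Split(kv[1], ";")
		switch kv[0] {
		case "groups":
			rule.Groups = values
		case "resources":
			rule.Resources = values
		case "verbs":
			rule.Verbs = values
		default:
			return rbacRule{}, fmt.Errorf("unsupported key %q", kv[0])
		}
	}
	return rule, nil
}

func main() {
	rule, err := parseRBACMarker("// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch")
	if err != nil {
		panic(err)
	}
	fmt.Println(rule.Groups, rule.Resources, rule.Verbs) // [apps] [deployments] [get list watch]
}
```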

Build and run the operator

Before running the operator, the CRD must be registered with the Kubernetes apiserver:

    $ make install

Once this is done, there are two ways to run the operator:

  • As a Go program outside a cluster
  • As a Deployment inside a Kubernetes cluster

Configuring your test environment

Projects are scaffolded with unit tests that use the envtest library, which requires certain Kubernetes server binaries to be present locally. Installation instructions can be found here.

1. Run locally outside the cluster

To run the operator locally execute the following command:

    $ make run ENABLE_WEBHOOKS=false

2. Run as a Deployment inside the cluster

Build and push the image

Before building the operator image, ensure the generated Dockerfile references the base image you want. You can change the default “runner” image gcr.io/distroless/static:nonroot by replacing its tag with another, for example alpine:latest, and removing the USER nonroot:nonroot directive.

To build and push the operator image, use the following make commands. Make sure to modify the IMG arg in the example below to reference a container repository that you have access to. You can obtain an account for storing containers at repository sites such as quay.io or hub.docker.com. This example uses Quay.

Build the image:

    $ export USERNAME=<quay-username>
    $ make docker-build IMG=quay.io/$USERNAME/memcached-operator:v0.0.1

Push the image to a repository:

    $ make docker-push IMG=quay.io/$USERNAME/memcached-operator:v0.0.1

Note: The name and tag of the image (IMG=<some-registry>/<project-name>:tag) in both the commands can also be set in the Makefile. Modify the line which has IMG ?= controller:latest to set your desired default image name.

Deploy the operator

For this example we will run the operator in the default namespace which can be specified for all resources in config/default/kustomization.yaml:

    $ cd config/default/ && kustomize edit set namespace "default" && cd ../..

Run the following to deploy the operator. This will also install the RBAC manifests from config/rbac.

    $ make deploy IMG=quay.io/$USERNAME/memcached-operator:v0.0.1

NOTE: If you have enabled webhooks in your deployments, you will need to have cert-manager already installed in the cluster, or make deploy will fail when creating the cert-manager resources.

Verify that the memcached-operator is up and running:

    $ kubectl get deployment
    NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
    memcached-operator-controller-manager   1/1     1            1           8m

3. Deploy your Operator with the Operator Lifecycle Manager (OLM)

OLM will manage creation of most if not all resources required to run your operator, using a bit of setup from other operator-sdk commands. Check out the docs for more information.

Create a Memcached CR

Update the sample Memcached CR manifest at config/samples/cache_v1alpha1_memcached.yaml and define the spec as the following:

    apiVersion: cache.example.com/v1alpha1
    kind: Memcached
    metadata:
      name: memcached-sample
    spec:
      size: 3

Create the CR:

    $ kubectl apply -f config/samples/cache_v1alpha1_memcached.yaml

Ensure that the memcached operator creates the deployment for the sample CR with the correct size:

    $ kubectl get deployment
    NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
    memcached-operator-controller-manager   1/1     1            1           8m
    memcached-sample                        3/3     3            3           1m

Check the pods and CR status to confirm the status is updated with the memcached pod names:

    $ kubectl get pods
    NAME                               READY   STATUS    RESTARTS   AGE
    memcached-sample-6fd7c98d8-7dqdr   1/1     Running   0          1m
    memcached-sample-6fd7c98d8-g5k7v   1/1     Running   0          1m
    memcached-sample-6fd7c98d8-m7vn7   1/1     Running   0          1m

    $ kubectl get memcached/memcached-sample -o yaml
    apiVersion: cache.example.com/v1alpha1
    kind: Memcached
    metadata:
      clusterName: ""
      creationTimestamp: 2018-03-31T22:51:08Z
      generation: 0
      name: memcached-sample
      namespace: default
      resourceVersion: "245453"
      selfLink: /apis/cache.example.com/v1alpha1/namespaces/default/memcacheds/memcached-sample
      uid: 0026cc97-3536-11e8-bd83-0800274106a1
    spec:
      size: 3
    status:
      nodes:
      - memcached-sample-6fd7c98d8-7dqdr
      - memcached-sample-6fd7c98d8-g5k7v
      - memcached-sample-6fd7c98d8-m7vn7
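The status shown above comes from the controller listing the pods that belong to the CR and copying their names into status.nodes. A stdlib sketch of that step follows; pod, podNames and needsStatusUpdate are hypothetical stand-ins (pod replaces corev1.Pod), and the comparison illustrates writing the status only when the name list actually changed, which avoids needless API calls.

```go
package main

import (
	"fmt"
	"reflect"
)

// pod is a hypothetical stand-in for corev1.Pod carrying only its name.
type pod struct {
	Name string
}

// podNames collects the names of the given pods, in list order.
func podNames(pods []pod) []string {
	var names []string
	for _, p := range pods {
		names = append(names, p.Name)
	}
	return names
}

// needsStatusUpdate reports whether status.nodes differs from the current pods.
func needsStatusUpdate(currentNodes []string, pods []pod) bool {
	return !reflect.DeepEqual(podNames(pods), currentNodes)
}

func main() {
	pods := []pod{{Name: "memcached-sample-a"}, {Name: "memcached-sample-b"}}
	fmt.Println(needsStatusUpdate([]string{"memcached-sample-a"}, pods)) // true
	fmt.Println(needsStatusUpdate(podNames(pods), pods))                 // false
}
```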

Update the size

Change the spec.size field in the Memcached CR from 3 to 5. You can edit config/samples/cache_v1alpha1_memcached.yaml and re-run kubectl apply, or patch the CR directly:

    $ kubectl patch memcached memcached-sample -p '{"spec":{"size": 5}}' --type=merge

Confirm that the operator changes the deployment size:

    $ kubectl get deployment
    NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
    memcached-operator-controller-manager   1/1     1            1           10m
    memcached-sample                        5/5     5            5           3m

Cleanup

    $ kubectl delete -f config/samples/cache_v1alpha1_memcached.yaml
    $ kubectl delete deployments,service -l control-plane=controller-manager
    $ kubectl delete role,rolebinding --all

Further steps

The following guides build off the operator created in this example, adding advanced features:

Also see the advanced topics doc for more use cases and under the hood details.

Last modified August 5, 2020: update docs to reflect call to NewManager (#3651) (c9348b5e)