Helm Operator Tutorial

An in-depth walkthrough of building and running a Helm-based operator.

NOTE: If your project was created with an operator-sdk version prior to v1.0.0, please migrate, or consult the legacy docs.

Prerequisites

  • Go through the installation guide.
  • User authorized with cluster-admin permissions.
  • An accessible image registry for the operator images (e.g. hub.docker.com, quay.io), with your command line environment logged in to it.
    • example.com is used as the Docker Hub namespace in these examples. Replace it with another value if you use a different registry or namespace.
    • Configure authentication and certificates if the registry is private or uses a custom CA.

Overview

We will create a sample project to show how it works. This sample will:

  • Create an Nginx Deployment if it doesn’t exist
  • Ensure that the Deployment size is the same as specified by the Nginx CR spec
  • Update the Nginx CR status with the state of the deployed Helm release

Create a new project

Use the CLI to create a new Helm-based nginx-operator project:

  mkdir nginx-operator
  cd nginx-operator
  operator-sdk init --plugins helm --domain example.com --group demo --version v1alpha1 --kind Nginx

This creates the nginx-operator project specifically for watching the Nginx resource with APIVersion demo.example.com/v1alpha1 and Kind Nginx.

For Helm-based projects, operator-sdk init also generates the RBAC rules in config/rbac/role.yaml based on the resources that would be deployed by the chart’s default manifest. Be sure to double-check that the rules generated in config/rbac/role.yaml meet the operator’s permission requirements.
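Once the operator is running in the cluster (see "Run the operator" below), one way to double-check these rules is to ask the API server what the operator's service account is allowed to do. The following is a minimal sketch built on kubectl auth can-i with impersonation (--as). It assumes the default scaffolded service account, nginx-operator-controller-manager in the nginx-operator-system namespace; the check_rbac name and the example list of actions are illustrative, so adjust both to your chart.

```shell
# Assumption: the default scaffolding names the operator's service account
# "nginx-operator-controller-manager" in the "nginx-operator-system" namespace.
SA="system:serviceaccount:nginx-operator-system:nginx-operator-controller-manager"

# check_rbac NAMESPACE "VERB RESOURCE"...
# Prints every verb/resource pair the service account is denied and
# returns 1 if any check fails.
check_rbac() {
  local ns status pair
  ns=$1
  shift
  status=0
  for pair in "$@"; do
    # $pair is intentionally unquoted so "create deployments" becomes two args.
    if ! kubectl auth can-i $pair --as="$SA" -n "$ns" >/dev/null 2>&1; then
      echo "denied: $pair"
      status=1
    fi
  done
  return "$status"
}

# Example (hypothetical list of actions the sample chart needs):
#   check_rbac default "create deployments" "create services" "create ingresses"
```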

To learn more about the project directory structure, see the project layout doc.

Use an existing chart

Instead of creating your project with a boilerplate Helm chart, you can also use --helm-chart, --helm-chart-repo, and --helm-chart-version to use an existing chart, either from your local filesystem or a remote chart repository.

If --helm-chart is specified, the --group, --version, and --kind flags become optional. If they are left unset, the defaults are:

  Flag      Value
  domain    my.domain
  group     charts
  kind      deduced from the specified chart
  version   v1alpha1

If --helm-chart is a local chart archive (e.g. example-chart-1.2.0.tgz) or directory, it will be validated and unpacked or copied into the project.

Otherwise, the SDK will attempt to fetch the specified Helm chart from a remote repository.

If a custom repository URL is not specified by --helm-chart-repo, the following chart reference formats are supported:

  • <repoName>/<chartName>: Fetch the Helm chart named chartName from the Helm chart repository named repoName, as configured in Helm's repositories file (Helm 3 reports its location as HELM_REPOSITORY_CONFIG in helm env). Use helm repo add to configure this file.

  • <url>: Fetch the Helm chart archive at the specified URL.

If a custom repository URL is specified by --helm-chart-repo, the only supported format for --helm-chart is:

  • <chartName>: Fetch the Helm chart named chartName in the Helm chart repository specified by the --helm-chart-repo URL.

If --helm-chart-version is not set, the SDK will fetch the latest available version of the Helm chart. Otherwise, it will fetch the specified version. The --helm-chart-version option is not used when --helm-chart itself refers to a specific version, for example when it is a local path or a URL.
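The resolution rules above can be condensed into a small decision function. This is a hypothetical sketch that mirrors the documented rules, not the SDK's implementation; resolve_chart and its output labels are made-up names. It takes the --helm-chart value and an optional --helm-chart-repo URL and reports how the chart would be located.

```shell
# resolve_chart CHART [REPO_URL]
# Hypothetical sketch of the documented --helm-chart rules. Prints one of:
#   local          a local archive or directory (--helm-chart-version ignored)
#   url            a chart archive URL (--helm-chart-version ignored)
#   repo/chart     <repoName>/<chartName> from a configured Helm repository
#   chart-in-repo  <chartName> looked up in REPO_URL
#   invalid        a form the documented rules do not allow
resolve_chart() {
  local chart="$1" repo="${2:-}"
  if [ -e "$chart" ]; then
    # Local charts are checked first: validated, then unpacked or copied.
    echo local
  elif [ -n "$repo" ]; then
    # With --helm-chart-repo, only a bare chart name is supported.
    case "$chart" in
      */*) echo invalid ;;
      *) echo chart-in-repo ;;
    esac
  else
    case "$chart" in
      http://*|https://*) echo url ;;
      */*) echo repo/chart ;;
      *) echo invalid ;;
    esac
  fi
}
```

For example, resolve_chart bitnami/nginx prints repo/chart, while resolve_chart nginx with no repository URL prints invalid, because a bare chart name needs --helm-chart-repo unless it is a local path.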

Note: For more details and examples run operator-sdk init --plugins helm --help.

Customize the operator logic

For this example the nginx-operator will execute the following reconciliation logic for each Nginx Custom Resource (CR):

  • Create an nginx Deployment if it doesn’t exist
  • Create an nginx Service if it doesn’t exist
  • Create an nginx Ingress if it is enabled and doesn’t exist
  • Ensure that the Deployment, Service, and optional Ingress match the desired configuration (e.g. replica count, image, service type, etc.) as specified by the Nginx CR

Watch the Nginx CR

By default, the nginx-operator watches Nginx resource events as shown in watches.yaml and executes Helm releases using the specified chart:

  # Use the 'create api' subcommand to add watches to this file.
  - group: demo
    version: v1alpha1
    kind: Nginx
    chart: helm-charts/nginx
  #+kubebuilder:scaffold:watch

Reviewing the Nginx Helm Chart

When a Helm operator project is created, the SDK creates an example Helm chart that contains a set of templates for a simple Nginx release.

For this example, we have templates for deployment, service, and ingress resources, along with a NOTES.txt template, which Helm chart developers use to convey helpful information about a release.

If you aren’t already familiar with Helm Charts, take a moment to review the Helm Chart developer documentation.

Understanding the Nginx CR spec

Helm uses a concept called values to provide customizations to a Helm chart’s defaults, which are defined in the Helm chart’s values.yaml file.

Overriding these defaults is as simple as setting the desired values in the CR spec. Let’s use the number of replicas as an example.

First, inspecting helm-charts/nginx/values.yaml, we see that the chart has a value called replicaCount and it is set to 1 by default. If we want to have 2 nginx instances in our deployment, we would need to make sure our CR spec contained replicaCount: 2.

Update config/samples/demo_v1alpha1_nginx.yaml to look like the following:

  apiVersion: demo.example.com/v1alpha1
  kind: Nginx
  metadata:
    name: nginx-sample
  spec:
    replicaCount: 2

Similarly, we see that the default service port is set to 80, but we would like to use 8080, so we’ll again update config/samples/demo_v1alpha1_nginx.yaml by adding the service port override:

  apiVersion: demo.example.com/v1alpha1
  kind: Nginx
  metadata:
    name: nginx-sample
  spec:
    replicaCount: 2
    service:
      port: 8080

As you may have noticed, the Helm operator simply applies the entire spec as if it were the contents of a values file, just as helm install -f ./overrides.yaml does.
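To see this equivalence in practice, you can lift the spec out of the sample CR into a standalone values file and render the chart with plain Helm. The spec_to_values helper below is a rough, hypothetical sketch: it assumes the CR file holds a single document with a top-level spec: key whose children are indented by two spaces, as in the sample above. For anything less regular, a proper YAML tool is the safer choice.

```shell
# spec_to_values FILE
# Prints the children of the top-level "spec:" key, de-indented by two
# spaces, so the output can be used as a Helm values file.
# Assumes two-space indentation and a single YAML document in FILE.
spec_to_values() {
  awk '
    /^spec:[ \t]*$/ { in_spec = 1; next }   # start of the spec block
    in_spec && /^[^ \t]/ { in_spec = 0 }    # next top-level key ends it
    in_spec { sub(/^  /, ""); print }       # drop one level of indentation
  ' "$1"
}

# Usage (requires Helm 3 and the scaffolded chart):
#   spec_to_values config/samples/demo_v1alpha1_nginx.yaml > overrides.yaml
#   helm template nginx-sample helm-charts/nginx -f overrides.yaml
```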

Configure the operator’s image registry

All that remains is to build and push the operator image to the desired image registry. Your Makefile composes image tags either from values written at project initialization or from the CLI. In particular, IMAGE_TAG_BASE lets you define a common image registry, namespace, and partial name for all your image tags. Update this to another registry and/or namespace if the current value is incorrect. Afterwards you can update the IMG variable definition like so:

  -IMG ?= controller:latest
  +IMG ?= $(IMAGE_TAG_BASE):$(VERSION)

Once done, you do not have to set IMG or any other image variable in the CLI. The following command will build and push an operator image tagged as example.com/nginx-operator:v0.0.1 to Docker Hub:

  make docker-build docker-push

Run the operator

There are three ways to run the operator:

1. Run locally outside the cluster

Execute the following command, which installs your CRDs and runs the manager locally:

  make install run

2. Run as a Deployment inside the cluster

By default, a new namespace named <project-name>-system (e.g. nginx-operator-system) is created and used for the deployment.

Run the following to deploy the operator. This will also install the RBAC manifests from config/rbac.

  make deploy

Verify that the nginx-operator is up and running:

  $ kubectl get deployment -n nginx-operator-system
  NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
  nginx-operator-controller-manager   1/1     1            1           8m
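If you script this check (for example in CI), the READY column is easy to test. The all_ready helper below is a sketch that reads kubectl get deployment output on stdin and succeeds only if every listed deployment has all of its replicas ready. It relies on the default table layout shown above; kubectl wait --for=condition=Available deployment/<name> is a built-in alternative that blocks until the condition holds.

```shell
# all_ready: read `kubectl get deployment` output on stdin and return 0 only
# if at least one deployment is listed and every READY value is "n/n", n > 0.
all_ready() {
  awk '
    NR == 1 { next }                  # skip the header row
    {
      seen = 1
      split($2, r, "/")               # READY column is "ready/desired"
      if (r[2] == 0 || r[1] != r[2]) bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Usage:
#   kubectl get deployment -n nginx-operator-system | all_ready && echo "operator is ready"
```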

3. Deploy your Operator with OLM

First, install OLM:

  operator-sdk olm install

Bundle your operator, then build and push the bundle image. The bundle target generates a bundle in the bundle directory containing manifests and metadata defining your operator. bundle-build and bundle-push build and push a bundle image defined by bundle.Dockerfile.

  make bundle bundle-build bundle-push

Finally, run your bundle. If your bundle image is hosted in a registry that is private and/or has a custom CA, these configuration steps must be completed first.

  operator-sdk run bundle example.com/nginx-operator-bundle:v0.0.1

Check out the docs for a deep dive into operator-sdk’s OLM integration.

Create an Nginx CR

Create the nginx CR that we modified earlier:

  kubectl apply -f config/samples/demo_v1alpha1_nginx.yaml

Ensure that the nginx-operator creates the deployment for the CR:

  $ kubectl get deployment
  NAME           READY   UP-TO-DATE   AVAILABLE   AGE
  nginx-sample   2/2     2            2           2m13s

Check the pods to confirm 2 replicas were created:

  $ kubectl get pods
  NAME                           READY   STATUS    RESTARTS   AGE
  nginx-sample-c786bfdcf-4g6md   1/1     Running   0          81s
  nginx-sample-c786bfdcf-6bhmx   1/1     Running   0          81s
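The replica check can also be expressed as a count of running pods. This sketch assumes the pods are named after the CR (nginx-sample-...), as in the output above, and counts only those whose STATUS is Running; running_pods is an illustrative name, and a label selector would be more robust if you know which labels the chart sets.

```shell
# running_pods PREFIX: count pods in `kubectl get pods` output (on stdin)
# whose NAME starts with "PREFIX-" and whose STATUS is Running.
running_pods() {
  awk -v prefix="$1-" '
    NR > 1 && index($1, prefix) == 1 && $3 == "Running" { n++ }
    END { print n + 0 }
  '
}

# Usage: compare the count with spec.replicaCount from the CR.
#   [ "$(kubectl get pods | running_pods nginx-sample)" -eq 2 ] && echo "2 replicas running"
```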

Check that the service port is set to 8080:

  $ kubectl get service
  NAME           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
  nginx-sample   ClusterIP   10.96.26.3   <none>        8080/TCP   1m
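Rather than reading the PORT(S) column by eye, you can query the port with a JSONPath expression. The expect_service_port function below is a sketch; it assumes the port of interest is the Service's first entry (.spec.ports[0]), which matches the single port shown above.

```shell
# expect_service_port NAME PORT
# Returns 0 if the first port of Service NAME equals PORT; otherwise prints
# the actual value to stderr and returns 1.
# Assumption: the relevant port is .spec.ports[0].
expect_service_port() {
  local actual
  actual=$(kubectl get service "$1" -o jsonpath='{.spec.ports[0].port}') || return 1
  if [ "$actual" != "$2" ]; then
    echo "service $1: expected port $2, got ${actual:-<none>}" >&2
    return 1
  fi
}

# Usage:
#   expect_service_port nginx-sample 8080
```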

Update the replicaCount and remove the port

Change the spec.replicaCount field from 2 to 3 and remove the spec.service field:

  $ cat config/samples/demo_v1alpha1_nginx.yaml
  apiVersion: demo.example.com/v1alpha1
  kind: Nginx
  metadata:
    name: nginx-sample
  spec:
    replicaCount: 3

And apply the change:

  kubectl apply -f config/samples/demo_v1alpha1_nginx.yaml
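Editing and re-applying the file is one approach. Alternatively, you can change the live object in one call with a JSON merge patch, in which (per RFC 7386) a key set to null is removed. The patch_nginx wrapper below is an illustrative sketch using the plural resource name nginxes.demo.example.com; keep in mind that re-applying the sample file later will set the fields it contains again.

```shell
# patch_nginx NAME REPLICAS
# Sets spec.replicaCount and removes spec.service on the live Nginx CR.
# In a JSON merge patch (RFC 7386), a null value deletes that key.
patch_nginx() {
  kubectl patch nginxes.demo.example.com "$1" --type merge \
    -p "{\"spec\":{\"replicaCount\":$2,\"service\":null}}"
}

# Usage:
#   patch_nginx nginx-sample 3
```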

Confirm that the operator changes the deployment size:

  $ kubectl get deployment
  NAME           READY   UP-TO-DATE   AVAILABLE   AGE
  nginx-sample   3/3     3            3           7m29s
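The operator reconciles asynchronously, so the new size can take a few seconds to appear. A polling loop like the following sketch waits for the Deployment's .status.readyReplicas to reach the expected count; wait_for_replicas is an illustrative name, and the default of 30 tries, 2 seconds apart, is an arbitrary choice.

```shell
# wait_for_replicas NAME WANT [TRIES] [INTERVAL]
# Polls Deployment NAME until .status.readyReplicas equals WANT.
# Returns 0 on success, or 1 after TRIES attempts (default 30, 2s apart).
wait_for_replicas() {
  local name="$1" want="$2" tries="${3:-30}" interval="${4:-2}" ready i=0
  while [ "$i" -lt "$tries" ]; do
    ready=$(kubectl get deployment "$name" -o jsonpath='{.status.readyReplicas}' 2>/dev/null)
    # readyReplicas is omitted while no pods are ready, so default to 0.
    [ "${ready:-0}" -eq "$want" ] && return 0
    i=$((i + 1))
    sleep "$interval"
  done
  echo "deployment $name: ${ready:-0}/$want ready after $tries tries" >&2
  return 1
}

# Usage:
#   wait_for_replicas nginx-sample 3
```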

Check that the service port is set to the default (80):

  $ kubectl get service
  NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
  nginx-sample   ClusterIP   10.96.152.76   <none>        80/TCP    7m54s

Troubleshooting

Use the following command to check the operator logs.

  kubectl logs deployment.apps/nginx-operator-controller-manager -n nginx-operator-system -c manager

Use the following command to check the CR status and events.

  kubectl describe nginxes.demo.example.com
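When investigating a failed reconcile or filing an issue, it can help to capture both outputs together. The collect_debug sketch below writes the manager logs and the CR descriptions into a directory; the directory and file names are arbitrary, and the namespace and deployment names are the defaults used throughout this tutorial.

```shell
# collect_debug [OUTDIR]
# Saves the operator's manager logs and the Nginx CR descriptions to OUTDIR
# (default: ./nginx-operator-debug), then prints any log lines mentioning "error".
collect_debug() {
  local out="${1:-nginx-operator-debug}"
  mkdir -p "$out" || return 1
  kubectl logs deployment.apps/nginx-operator-controller-manager \
    -n nginx-operator-system -c manager > "$out/manager.log" 2>&1
  kubectl describe nginxes.demo.example.com > "$out/nginx-cr.txt" 2>&1
  # grep exits 1 when nothing matches, so fall back to a short message.
  grep -i error "$out/manager.log" || echo "no lines containing 'error' in manager.log"
}
```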

Cleanup

Clean up the resources:

  kubectl delete -f config/samples/demo_v1alpha1_nginx.yaml

Note: Make sure the above custom resource has been deleted before proceeding to run make undeploy, as helm-operator’s controller adds finalizers to the custom resources. Otherwise your cluster may have dangling custom resource objects that cannot be deleted.

  make undeploy
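Because of those finalizers, the order of the two cleanup steps matters. The sketch below (cleanup_tutorial is an illustrative name) runs make undeploy only after the CR deletion has finished: kubectl delete waits for finalizers by default, --ignore-not-found keeps a re-run from failing when the CR is already gone, and the 120s timeout is an arbitrary choice.

```shell
# cleanup_tutorial: delete the sample CR first, then undeploy the operator.
# If the CR is not gone within the timeout, stop, so the operator stays
# running to process the CR's finalizer.
cleanup_tutorial() {
  if ! kubectl delete -f config/samples/demo_v1alpha1_nginx.yaml \
      --ignore-not-found --timeout=120s; then
    echo "CR deletion did not finish; not running make undeploy" >&2
    return 1
  fi
  make undeploy
}
```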

Next steps

Next, check out the following:

  1. Operator packaging and distribution with OLM.
  2. The advanced features doc for more use cases and under-the-hood details.

Last modified August 11, 2021: WIP (#5135) (a05f9668)