Writing E2E Tests

Using the Operator SDK’s Test Framework to Write E2E Tests

End-to-end tests are essential to ensure that an operator works as intended in real-world scenarios. The Operator SDK makes writing tests simpler and quicker by removing boilerplate code and providing common test utilities. The SDK ships the test framework as a library under pkg/test, and the e2e tests themselves are written as standard Go tests.

Components

The test framework includes a few components; the two most important are Framework and Context.

Framework

Framework contains all global variables, such as the kubeconfig, kubeclient, scheme, and dynamic client (provided via the controller-runtime project). It is initialized by the MainEntry function and can be used anywhere in the tests.
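
For example, once MainEntry has run, any test can read the global instance. A minimal sketch (KubeClient and Client also appear in the walkthrough below; KubeConfig and Scheme are assumed field names):

  // grab the global framework instance initialized by MainEntry
  f := framework.Global

  // f.KubeClient is a typed Kubernetes clientset, f.Client wraps the
  // controller-runtime dynamic client, and f.Scheme is the runtime scheme
  // that AddToFrameworkScheme registers types into (assumed field names)
  t.Logf("talking to the API server at %s", f.KubeConfig.Host)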

Note: several required arguments are initialized and added by MainEntry(). Do not attempt to use testing.M directly.

Context

Context is a local context that stores important information for each test, such as the namespace for that test and the cleanup functions. By handling namespace and resource initialization through Context, we can make sure that all resources are properly handled and removed after the test finishes.
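
For example, in addition to the automatic resource cleanup, a test can register extra cleanup work on its Context. A short sketch (AddCleanupFn is an assumed helper on Context):

  ctx := framework.NewContext(t)
  defer ctx.Cleanup()

  // register an additional cleanup function that runs when ctx.Cleanup is called
  ctx.AddCleanupFn(func() error {
      t.Log("removing extra test artifacts")
      return nil
  })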

Walkthrough: Writing Tests

In this section, we will be walking through writing the e2e tests of the sample memcached-operator.

Main Test

The first step to writing a test is to create the main_test.go file. The main_test.go file simply calls the test framework's MainEntry, which sets up the framework and then runs the tests. It should be nearly identical for all operators. This is what it looks like for the memcached-operator:

  package e2e

  import (
      "testing"

      f "github.com/operator-framework/operator-sdk/pkg/test"
  )

  func TestMain(m *testing.M) {
      f.MainEntry(m)
  }

Individual Tests

In this section, we will be designing a test based on the memcached_test.go file from the memcached-operator sample.

1. Import the framework

Once MainEntry sets up the framework, it runs the remainder of the tests. First, make sure to import testing, the operator-sdk test framework (pkg/test), and your operator's libraries:

  import (
      "testing"

      "github.com/operator-framework/operator-sdk-samples/go/memcached-operator/pkg/apis"
      cachev1alpha1 "github.com/operator-framework/operator-sdk-samples/go/memcached-operator/pkg/apis/cache/v1alpha1"
      framework "github.com/operator-framework/operator-sdk/pkg/test"
  )

2. Register types with framework scheme

The next step is to register your operator’s scheme with the framework’s dynamic client. To do this, pass the CRD’s AddToScheme function and its List type object to the framework’s AddToFrameworkScheme function. For our example memcached-operator, it looks like this:

  memcachedList := &cachev1alpha1.MemcachedList{}
  err := framework.AddToFrameworkScheme(apis.AddToScheme, memcachedList)
  if err != nil {
      t.Fatalf("failed to add custom resource scheme to framework: %v", err)
  }

We pass the CR List object memcachedList as an argument to AddToFrameworkScheme() because the framework needs to ensure that the dynamic client has the REST mappings to query the API server for the CR type. The framework will keep polling the API server for the mappings and will time out after 5 seconds, returning an error if the mappings were not discovered in that time.

3. Set up the test context and resources

The next step is to create a Context for the current test and defer its cleanup function:

  ctx := framework.NewContext(t)
  defer ctx.Cleanup()

Now that there is a Context, the test’s Kubernetes resources (specifically the test namespace, Service Account, RBAC, and Operator deployment in local testing; just the Operator deployment in cluster testing) can be initialized:

  err := ctx.InitializeClusterResources(&framework.CleanupOptions{TestContext: ctx, Timeout: cleanupTimeout, RetryInterval: cleanupRetryInterval})
  if err != nil {
      t.Fatalf("failed to initialize cluster resources: %v", err)
  }

The InitializeClusterResources function uses the framework client's custom Create function to create the resources provided in your namespaced manifest. The custom Create function uses the controller-runtime client to create each resource and then registers a cleanup function, called by ctx.Cleanup, which deletes the resource and waits for it to be fully deleted before returning. This is configurable with CleanupOptions; for details, see the section on CleanupOptions below.
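
The cleanupTimeout and cleanupRetryInterval values above are ordinary package-level durations. The memcached-operator sample defines them roughly as follows (the exact values are illustrative):

  var (
      retryInterval        = time.Second * 5
      timeout              = time.Second * 60
      cleanupRetryInterval = time.Second * 1
      cleanupTimeout       = time.Second * 5
  )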

If you want to make sure the operator's deployment is fully ready before moving on to the next part of the test, the WaitForOperatorDeployment function from e2eutil (in the SDK under pkg/test/e2eutil) can be used; make sure to import time and the e2eutil package as well:

  // get namespace
  namespace, err := ctx.GetOperatorNamespace()
  if err != nil {
      t.Fatal(err)
  }
  // get global framework variables
  f := framework.Global
  // wait for memcached-operator to be ready
  err = e2eutil.WaitForOperatorDeployment(t, f.KubeClient, namespace, "memcached-operator", 1, time.Second*5, time.Second*30)
  if err != nil {
      t.Fatal(err)
  }

4. Write the test-specific code

Since the controller-runtime's dynamic client uses Go contexts, make sure to import the Go context library. In this example, we import it as goctx:
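
  import (
      goctx "context"
  )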

How to use the Framework Client Create's CleanupOptions

The test framework provides Client, which exposes most of the controller-runtime client unmodified, but whose Create function has added functionality to register cleanup functions for the resources it creates. To manage how cleanup is handled, we use a CleanupOptions struct. Here are some examples of how to use it:

  // Create with no cleanup
  Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{})

  // Create with cleanup but no polling for the resources to be deleted
  Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{TestContext: ctx})

  // Create with cleanup and polling wait for the resources to be deleted
  Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{TestContext: ctx, Timeout: timeout, RetryInterval: retryInterval})

This is how we can create a Memcached custom resource with a size of 3 (metav1 here is k8s.io/apimachinery/pkg/apis/meta/v1):

  // create memcached custom resource
  exampleMemcached := &cachev1alpha1.Memcached{
      ObjectMeta: metav1.ObjectMeta{
          Name:      "example-memcached",
          Namespace: namespace,
      },
      Spec: cachev1alpha1.MemcachedSpec{
          Size: 3,
      },
  }
  err = f.Client.Create(goctx.TODO(), exampleMemcached, &framework.CleanupOptions{TestContext: ctx, Timeout: time.Second * 5, RetryInterval: time.Second * 1})
  if err != nil {
      return err
  }

Now we can check whether the operator worked. In the case of the memcached-operator, it should have created a deployment called "example-memcached" with 3 replicas. To check, we use the WaitForDeployment function. It is the same as WaitForOperatorDeployment, except that WaitForOperatorDeployment skips waiting for the deployment if the test is run locally with the --up-local flag set, whereas WaitForDeployment always waits:

  // wait for example-memcached to reach 3 replicas
  err = e2eutil.WaitForDeployment(t, f.KubeClient, namespace, "example-memcached", 3, time.Second*5, time.Second*30)
  if err != nil {
      return err
  }

We can also test that the deployment scales correctly when the CR is updated (types here is k8s.io/apimachinery/pkg/types):

  err = f.Client.Get(goctx.TODO(), types.NamespacedName{Name: "example-memcached", Namespace: namespace}, exampleMemcached)
  if err != nil {
      return err
  }
  exampleMemcached.Spec.Size = 4
  err = f.Client.Update(goctx.TODO(), exampleMemcached)
  if err != nil {
      return err
  }
  // wait for example-memcached to reach 4 replicas
  err = e2eutil.WaitForDeployment(t, f.KubeClient, namespace, "example-memcached", 4, time.Second*5, time.Second*30)
  if err != nil {
      return err
  }

Once the end of the function is reached, the Context’s cleanup functions will automatically be run since they were deferred when the Context was created.
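
Putting the pieces together, the overall shape of the test file looks roughly like this (a condensed sketch of the sample's memcached_test.go; the MemcachedCluster helper name and the elided body are illustrative):

  func TestMemcached(t *testing.T) {
      memcachedList := &cachev1alpha1.MemcachedList{}
      err := framework.AddToFrameworkScheme(apis.AddToScheme, memcachedList)
      if err != nil {
          t.Fatalf("failed to add custom resource scheme to framework: %v", err)
      }
      // run the cluster test as a subtest
      t.Run("memcached-group", func(t *testing.T) {
          t.Run("Cluster", MemcachedCluster)
      })
  }

  func MemcachedCluster(t *testing.T) {
      ctx := framework.NewContext(t)
      defer ctx.Cleanup()
      err := ctx.InitializeClusterResources(&framework.CleanupOptions{TestContext: ctx, Timeout: cleanupTimeout, RetryInterval: cleanupRetryInterval})
      if err != nil {
          t.Fatalf("failed to initialize cluster resources: %v", err)
      }
      // ... create the CR, wait for the deployment, and scale it as shown above ...
  }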

Running the Tests

To make running the tests simpler, the operator-sdk CLI has a test subcommand that configures default test settings, such as the location of your global resource manifest file (by default deploy/crd.yaml) and of your namespaced resource manifest file (by default deploy/service_account.yaml concatenated with deploy/rbac.yaml and deploy/operator.yaml), and lets you configure runtime options.

To run the tests, run the operator-sdk test local command in your project root and pass the location of the tests as an argument. You can use --help to view the other configuration options and use --go-test-flags to pass in arguments to go test. Here is an example command:

  $ operator-sdk test local ./test/e2e --go-test-flags "-v -parallel=2"

Image Flag

If you wish to specify a different operator image than the one specified in your operator.yaml file (or a user-specified namespaced manifest file), you can use the --image flag:

  $ operator-sdk test local ./test/e2e --image quay.io/example/my-operator:v0.0.2

Namespace Flag

If you wish to run all the tests in a single namespace (which also forces -parallel=1), you can use the --operator-namespace flag:

  $ kubectl create namespace operator-test
  $ operator-sdk test local ./test/e2e --operator-namespace operator-test

Up-Local Flag

To run the operator itself locally during the tests instead of starting a deployment in the cluster, you can use the --up-local flag. This mode will still create global resources, but by default it will not create any in-cluster namespaced resources unless you specify a manifest via the --namespaced-manifest flag.

NOTE: The --up-local flag requires the --operator-namespace flag, and the command will NOT create the namespace, so be sure to specify a valid, existing namespace.

  $ kubectl create namespace operator-test
  $ operator-sdk test local ./test/e2e --operator-namespace operator-test --up-local

No-Setup Flag

If you would prefer to create the resources yourself and have the framework skip resource creation, you can use the --no-setup flag:

  $ kubectl create namespace operator-test
  $ kubectl create -f deploy/crds/cache.example.com_memcacheds_crd.yaml
  $ kubectl create -f deploy/service_account.yaml --namespace operator-test
  $ kubectl create -f deploy/role.yaml --namespace operator-test
  $ kubectl create -f deploy/role_binding.yaml --namespace operator-test
  $ kubectl create -f deploy/operator.yaml --namespace operator-test
  $ operator-sdk test local ./test/e2e --operator-namespace operator-test --no-setup

Test Permissions

Executing e2e tests requires permission to access, create, and delete resources in your cluster. Depending on what kind of Kubernetes cluster you are using, this may require some manual setup. For example, OpenShift users are not created with cluster-admin access by default, so you would have to manually add permissions to access these resources.

The simplest way to accomplish this is to bind the cluster-admin Cluster Role to the Service Account you will run the tests under (see the kubectl example below). If you are unable or unwilling to grant such access, a more limited permission set can be created and bound to your Service Account. A good place to start is the Role bound to your operator itself, such as deploy/role.yaml in the memcached-operator example. In addition, you might have to create a Cluster Role that allows your tests to create namespaces, like so:

  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: testuser
  rules:
  - apiGroups:
    - ""
    resources:
    - namespaces
    verbs:
    - create
    - delete
    - get
    - list
    - watch
    - update

Note that this isn’t an exhaustive permission set, and the e2e tests you write might require more or less access.
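
For example, to take the cluster-admin shortcut described above, you can bind the role with kubectl (the Service Account name and namespace here are illustrative):

  $ kubectl create clusterrolebinding test-admin --clusterrole=cluster-admin --serviceaccount=operator-test:testuser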

For more documentation on the operator-sdk test local command, see the SDK CLI Reference doc.

Skip-Cleanup-Error Flag

If the tests encounter an error, it is possible to tell the framework not to delete the resources. This behavior is enabled with the --skip-cleanup-error flag:

  $ operator-sdk test local ./test/e2e --skip-cleanup-error

This is useful if, after the error happens, you need to inspect the resources that were created in the test, or if you have automated scripts that download all the logs from the pods at the end of the test run.

NOTE: The created resources will be deleted if the tests pass.

Running Go Test Directly (Not Recommended)

For advanced use cases, it is possible to run the tests via go test directly. As long as all flags defined in MainEntry are declared, the tests will run correctly; running the tests with missing flags results in undefined behavior. This is the go test equivalent of the operator-sdk test local example above:

  # Combine service_account, role, role_binding, and operator manifests into a namespaced manifest
  $ cp deploy/service_account.yaml deploy/namespace-init.yaml
  $ echo -e "\n---\n" >> deploy/namespace-init.yaml
  $ cat deploy/role.yaml >> deploy/namespace-init.yaml
  $ echo -e "\n---\n" >> deploy/namespace-init.yaml
  $ cat deploy/role_binding.yaml >> deploy/namespace-init.yaml
  $ echo -e "\n---\n" >> deploy/namespace-init.yaml
  $ cat deploy/operator.yaml >> deploy/namespace-init.yaml
  # Run tests
  $ go test ./test/e2e/... -root=$(pwd) -kubeconfig=$HOME/.kube/config -globalMan deploy/crds/cache.example.com_memcacheds_crd.yaml -namespacedMan deploy/namespace-init.yaml -v -parallel=2

Manual Cleanup

While the test framework provides utilities that allow a test to clean up automatically when done, an error in the test code could cause a panic, which stops the test without running the deferred cleanup. To clean up manually, first check which namespaces currently exist in your cluster. You can do this with kubectl:

  $ kubectl get namespaces
  NAME                                            STATUS    AGE
  default                                         Active    2h
  kube-public                                     Active    2h
  kube-system                                     Active    2h
  main-1534287036                                 Active    23s
  memcached-memcached-group-cluster-1534287037    Active    22s
  memcached-memcached-group-cluster2-1534287037   Active    22s

The namespace names will either start with main or with the name of the test, and the suffix will be a Unix timestamp (the number of seconds since January 1, 1970 00:00 UTC). kubectl can be used to delete these namespaces and the resources within them:

  $ kubectl delete namespace main-1534287036

Since the CRD is not namespaced, it must be deleted separately. Clean up the CRD created by the tests using its manifest (deploy/crds/cache.example.com_memcacheds_crd.yaml for the memcached-operator):

  $ kubectl delete -f deploy/crds/cache.example.com_memcacheds_crd.yaml
