Advanced Topics

Manage CR status conditions

An often-used pattern is to include Conditions in the status of custom resources. A Condition represents the latest available observations of an object’s state (see the Kubernetes API conventions documentation for more information).

The Conditions field added to the MemcachedStatus struct simplifies the management of your CR’s conditions. It:

  • Enables callers to add and remove conditions.
  • Ensures that there are no duplicates.
  • Sorts the conditions deterministically to avoid unnecessary repeated reconciliations.
  • Automatically handles each condition’s LastTransitionTime.
  • Provides helper methods to make it easy to determine the state of a condition.

To use conditions in your custom resource, add a Conditions field to the Status struct in your API’s _types.go file (for example, api/v1alpha1/memcached_types.go):

  import (
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  type MemcachedStatus struct {
      // Conditions represent the latest available observations of an object's state
      Conditions []metav1.Condition `json:"conditions"`
  }

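Then, in your controller, you can use the condition helper functions in the k8s.io/apimachinery/pkg/api/meta package, such as meta.SetStatusCondition(), meta.RemoveStatusCondition() and meta.IsStatusConditionTrue(), to set and remove conditions or check their current values.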

Adding 3rd Party Resources To Your Operator

The operator’s Manager supports the core Kubernetes resource types as found in the client-go scheme package and will also register the schemes of all custom resource types defined in your project.

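  import (
      "k8s.io/apimachinery/pkg/runtime"
      utilruntime "k8s.io/apimachinery/pkg/util/runtime"
      clientgoscheme "k8s.io/client-go/kubernetes/scheme"

      cachev1alpha1 "github.com/example/memcached-operator/api/v1alpha1"
      ...
  )

  var scheme = runtime.NewScheme()

  func init() {
      // Setup Scheme for all resources: core Kubernetes types and the project's own CRDs
      utilruntime.Must(clientgoscheme.AddToScheme(scheme))
      utilruntime.Must(cachev1alpha1.AddToScheme(scheme))
      //+kubebuilder:scaffold:scheme
  }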

To add a 3rd party resource to an operator, you must add it to the Manager’s scheme. You can do this by reusing the resource’s existing AddToScheme() function or by creating one. The usual pattern is to define a function that registers the resource’s types and then use the runtime package to create a SchemeBuilder from it.
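The registration pattern can be sketched with toy types to show what AddToScheme() and a SchemeBuilder do. Scheme, SchemeBuilder, Widget, WidgetList and addKnownTypes below are hypothetical stand-ins that only mirror the shape of the k8s.io/apimachinery runtime types: the builder is a list of registration functions, and AddToScheme applies each of them to a scheme that maps group/version/Kind keys to Go types.

```go
package main

import (
	"fmt"
	"reflect"
)

// Scheme is a toy stand-in for runtime.Scheme: it maps
// "group/version/Kind" keys to Go types.
type Scheme struct {
	types map[string]reflect.Type
}

func NewScheme() *Scheme { return &Scheme{types: map[string]reflect.Type{}} }

// AddKnownTypes registers each pointer-to-struct under groupVersion,
// using the Go struct name as the Kind.
func (s *Scheme) AddKnownTypes(groupVersion string, objs ...interface{}) {
	for _, o := range objs {
		t := reflect.TypeOf(o).Elem()
		s.types[groupVersion+"/"+t.Name()] = t
	}
}

// Recognizes reports whether a type is registered under the given key.
func (s *Scheme) Recognizes(gvk string) bool {
	_, ok := s.types[gvk]
	return ok
}

// SchemeBuilder collects registration functions, mirroring the shape
// of runtime.SchemeBuilder.
type SchemeBuilder []func(*Scheme) error

// AddToScheme applies every collected function to s, stopping at the first error.
func (sb SchemeBuilder) AddToScheme(s *Scheme) error {
	for _, f := range sb {
		if err := f(s); err != nil {
			return err
		}
	}
	return nil
}

// A hypothetical third-party API package would declare these.
type Widget struct{ Name string }
type WidgetList struct{ Items []Widget }

const groupVersion = "example.com/v1"

func addKnownTypes(s *Scheme) error {
	s.AddKnownTypes(groupVersion, &Widget{}, &WidgetList{})
	return nil
}

var (
	schemeBuilder = SchemeBuilder{addKnownTypes}
	// AddToScheme is what the package would export for operators to call.
	AddToScheme = schemeBuilder.AddToScheme
)

func main() {
	scheme := NewScheme()
	if err := AddToScheme(scheme); err != nil {
		panic(err)
	}
	fmt.Println(scheme.Recognizes("example.com/v1/Widget"))
	fmt.Println(scheme.Recognizes("example.com/v1/Gadget"))
}
```

Because the builder is only a list of functions, an API package can expose a single AddToScheme that operators call from main.go without knowing which types it registers.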

Register with the Manager’s scheme

Call the AddToScheme() function for your 3rd party resource and pass it the Manager’s scheme, either via mgr.GetScheme() or via the scheme variable in main.go. Example:

  import (
      utilruntime "k8s.io/apimachinery/pkg/util/runtime"
      clientgoscheme "k8s.io/client-go/kubernetes/scheme"

      routev1 "github.com/openshift/api/route/v1"
  )

  func init() {
      ...
      // Register the core Kubernetes types and the OpenShift Route types
      utilruntime.Must(clientgoscheme.AddToScheme(scheme))
      utilruntime.Must(routev1.AddToScheme(scheme))
      //+kubebuilder:scaffold:scheme
      ...
  }

If the 3rd party resource does not have an AddToScheme() function

Use the scheme package from controller-runtime (sigs.k8s.io/controller-runtime/pkg/scheme) to create a new scheme.Builder, which can then register the 3rd party resource with the Manager’s scheme.

Example of registering the DNSEndpoint 3rd party resource from external-dns:

  import (
      "os"
      ...
      "k8s.io/apimachinery/pkg/runtime/schema"
      // Aliased so it does not clash with the scheme variable declared in main.go
      crscheme "sigs.k8s.io/controller-runtime/pkg/scheme"
      ...
      // DNSEndpoints
      externaldns "github.com/kubernetes-incubator/external-dns/endpoint"
  )

  func main() {
      ...
      // mgr must already be created here, since its scheme is used below;
      // register the types before the controllers are set up.
      log.Info("Registering Components.")
      schemeBuilder := &crscheme.Builder{GroupVersion: schema.GroupVersion{Group: "externaldns.k8s.io", Version: "v1alpha1"}}
      schemeBuilder.Register(&externaldns.DNSEndpoint{}, &externaldns.DNSEndpointList{})
      if err := schemeBuilder.AddToScheme(mgr.GetScheme()); err != nil {
          log.Error(err, "Failed to register DNSEndpoint types")
          os.Exit(1)
      }
      ...
  }

NOTES:

  • After adding new import paths to your operator project, run go mod vendor if a vendor/ directory is present in the root of your project directory to fulfill these dependencies.
  • Your 3rd party resource must be registered with the Manager’s scheme before the controllers are added in the "Setup all Controllers" step.

Metrics

To learn how metrics work in the Operator SDK, read the metrics section of the Kubebuilder documentation.

Handle Cleanup on Deletion

To implement complex deletion logic, you can add a finalizer to your Custom Resource. The finalizer prevents your Custom Resource from being deleted until you remove it (i.e., after your cleanup logic has run successfully). For more information, see the official Kubernetes documentation on finalizers.

Example:

The following is a snippet from a hypothetical controller file, controllers/memcached_controller.go, that implements a finalizer handler:

  import (
      "context"

      "github.com/go-logr/logr"
      "k8s.io/apimachinery/pkg/api/errors"
      ctrl "sigs.k8s.io/controller-runtime"
      "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
      ...
  )

  const memcachedFinalizer = "cache.example.com/finalizer"

  func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
      reqLogger := r.Log.WithValues("memcached", req.NamespacedName)
      reqLogger.Info("Reconciling Memcached")

      // Fetch the Memcached instance
      memcached := &cachev1alpha1.Memcached{}
      err := r.Get(ctx, req.NamespacedName, memcached)
      if err != nil {
          if errors.IsNotFound(err) {
              // Request object not found, could have been deleted after reconcile request.
              // Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
              // Return and don't requeue
              reqLogger.Info("Memcached resource not found. Ignoring since object must be deleted.")
              return ctrl.Result{}, nil
          }
          // Error reading the object - requeue the request.
          reqLogger.Error(err, "Failed to get Memcached.")
          return ctrl.Result{}, err
      }

      ...

      // Check if the Memcached instance is marked to be deleted, which is
      // indicated by the deletion timestamp being set.
      isMemcachedMarkedToBeDeleted := memcached.GetDeletionTimestamp() != nil
      if isMemcachedMarkedToBeDeleted {
          if controllerutil.ContainsFinalizer(memcached, memcachedFinalizer) {
              // Run finalization logic for memcachedFinalizer. If the
              // finalization logic fails, don't remove the finalizer so
              // that we can retry during the next reconciliation.
              if err := r.finalizeMemcached(reqLogger, memcached); err != nil {
                  return ctrl.Result{}, err
              }

              // Remove memcachedFinalizer. Once all finalizers have been
              // removed, the object will be deleted.
              controllerutil.RemoveFinalizer(memcached, memcachedFinalizer)
              err := r.Update(ctx, memcached)
              if err != nil {
                  return ctrl.Result{}, err
              }
          }
          return ctrl.Result{}, nil
      }

      // Add finalizer for this CR
      if !controllerutil.ContainsFinalizer(memcached, memcachedFinalizer) {
          controllerutil.AddFinalizer(memcached, memcachedFinalizer)
          err = r.Update(ctx, memcached)
          if err != nil {
              return ctrl.Result{}, err
          }
      }

      ...

      return ctrl.Result{}, nil
  }

  func (r *MemcachedReconciler) finalizeMemcached(reqLogger logr.Logger, m *cachev1alpha1.Memcached) error {
      // TODO(user): Add the cleanup steps that the operator
      // needs to do before the CR can be deleted. Examples
      // of finalizers include performing backups and deleting
      // resources that are not owned by this CR, like a PVC.
      reqLogger.Info("Successfully finalized memcached")
      return nil
  }

Leader election

During the lifecycle of an operator, more than one instance may be running at the same time, for example while an upgrade of the operator is rolling out. In that situation, leader election prevents contention between the instances: only the leader handles reconciliation, while the other instances stay inactive but ready to take over when the leader steps down.

There are two different leader election implementations to choose from, each with its own tradeoff.

  • Leader-with-lease: The leader pod periodically renews the leader lease and gives up leadership when it can’t renew the lease. This implementation allows for a faster transition to a new leader when the existing leader is isolated, but there is a possibility of split brain in certain situations.
  • Leader-for-life: The leader pod only gives up leadership (via garbage collection) when it is deleted. This implementation precludes the possibility of two instances mistakenly running as leaders (split brain). However, it can delay the election of a new leader. For instance, when the leader pod is on an unresponsive or partitioned node, the pod-eviction-timeout (default 5m) dictates how long it takes for the leader pod to be deleted from the node and step down.

By default, the SDK enables the leader-with-lease implementation. However, consider the tradeoffs of both approaches described above before choosing the one that fits your use case.

The following examples illustrate how to use the two options:

Leader for life

A call to leader.Become() blocks the operator, retrying until it can become the leader by creating the ConfigMap named memcached-operator-lock.

  import (
      "context"
      "os"
      ...
      "github.com/operator-framework/operator-lib/leader"
  )

  func main() {
      ...
      err = leader.Become(context.TODO(), "memcached-operator-lock")
      if err != nil {
          log.Error(err, "Failed to retry for leader lock")
          os.Exit(1)
      }
      ...
  }

If the operator is not running inside a cluster, leader.Become() simply returns without error and skips leader election, since it can’t detect the operator’s namespace.

Leader with lease

The leader-with-lease approach can be enabled via the Manager Options for leader election.

  import (
      ...
      "sigs.k8s.io/controller-runtime/pkg/manager"
  )

  func main() {
      ...
      opts := manager.Options{
          ...
          LeaderElection:   true,
          LeaderElectionID: "memcached-operator-lock",
      }
      mgr, err := manager.New(cfg, opts)
      ...
  }

When the operator is not running in a cluster, the Manager returns an error when it starts, because it can’t detect the operator’s namespace in order to create the configmap for leader election. You can override this namespace by setting the Manager’s LeaderElectionNamespace option.

Last modified February 11, 2021: align the sdk with kb (#4402) (4fc8a17c)