Controller GrayScale Release

Tip:

If you are using KubeVela >= v1.8.0, controller sharding is supported. If you need to run multiple versions of KubeVela controllers (all >= v1.8.0), refer to controller sharding.

System upgrades are always a risky operation for system operators, and the KubeVela controller, as a control plane component, faces similar challenges. The introduction of new features or the refactoring of existing functions can bring potential risks when running a higher-version controller on applications created under a lower version.

To help system operators overcome such difficulties, KubeVela provides a controller grayscale release mechanism that allows multiple versions of the controller to run concurrently. When an application is annotated with the key app.oam.dev/controller-version-require, only the controller with the matching version number will handle it.
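Concretely, an application carrying the annotation below (the version value follows the upgrade example used in the rest of this page) will only be reconciled by a controller reporting that exact version:

```yaml
metadata:
  annotations:
    # only a controller whose version matches this value will reconcile the app
    app.oam.dev/controller-version-require: v1.7.0-beta.1
```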

Let's say you already have a controller at version v1.6.4 and you want to upgrade to v1.7.0-beta.1. To use the controller grayscale release, follow the steps below.

1. Deploy a new controller using the version v1.7.0-beta.1 (this can be achieved by duplicating the vela-core deployment in your cluster, as shown below). Add --ignore-app-without-controller-version to the args. This lets the controller handle only applications carrying the annotation app.oam.dev/controller-version-require=v1.7.0-beta.1.

Tip:

To duplicate your controller's deployment, you can apply a Deployment like the following YAML (if you enable specific feature flags such as AuthenticateApplication, it is recommended to copy your existing deployment configuration and modify the name and image fields):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubevela-vela-core-canary
  namespace: vela-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: kubevela
      app.kubernetes.io/name: vela-core-canary
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/instance: kubevela
        app.kubernetes.io/name: vela-core-canary
    spec:
      containers:
        - args:
            - --ignore-app-without-controller-version
            - --metrics-addr=:8080
            - --enable-leader-election
            - --use-webhook=true
            - --webhook-port=9443
            - --webhook-cert-dir=/etc/k8s-webhook-certs
            - --optimize-mark-with-prob=0.1
            - --optimize-disable-component-revision
            - --health-addr=:9440
            - --disable-caps=rollout
            - --system-definition-namespace=vela-system
            - --application-revision-limit=2
            - --definition-revision-limit=2
            - --oam-spec-ver=v0.3
            - --enable-cluster-gateway
            - --application-re-sync-period=5m
            - --concurrent-reconciles=4
            - --kube-api-qps=100
            - --kube-api-burst=200
            - --max-workflow-wait-backoff-time=60
            - --max-workflow-failed-backoff-time=300
            - --max-workflow-step-error-retry-times=10
          image: oamdev/vela-core:v1.7.0-beta.1
          imagePullPolicy: Always
          name: kubevela
          ports:
            - containerPort: 9443
              name: webhook-server
              protocol: TCP
            - containerPort: 9440
              name: healthz
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: healthz
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 50m
              memory: 20Mi
          volumeMounts:
            - mountPath: /etc/k8s-webhook-certs
              name: tls-cert-vol
              readOnly: true
      serviceAccount: kubevela-vela-core
      serviceAccountName: kubevela-vela-core
      volumes:
        - name: tls-cert-vol
          secret:
            defaultMode: 420
            secretName: kubevela-vela-core-admission
```
After setting up the two controllers, you can check them through CLI commands like:

```shell
kubectl get deployment -n vela-system -owide
```

```
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS                           IMAGES                                  SELECTOR
kubevela-cluster-gateway    1/1     1            1           4d2h   kubevela-vela-core-cluster-gateway   oamdev/cluster-gateway:v1.7.0-alpha.3   app.kubernetes.io/instance=kubevela-cluster-gateway,app.kubernetes.io/name=vela-core-cluster-gateway
kubevela-vela-core          1/1     1            1           4d2h   kubevela                             oamdev/vela-core:v1.6.4                 app.kubernetes.io/instance=kubevela,app.kubernetes.io/name=vela-core
kubevela-vela-core-canary   1/1     1            1           63m    kubevela                             oamdev/vela-core:v1.7.0-beta.1          app.kubernetes.io/instance=kubevela,app.kubernetes.io/name=vela-core-canary
```
2. Choose one application and add the annotation app.oam.dev/controller-version-require: v1.7.0-beta.1 to it. The application will then be handled by the new controller. One way to add the annotation is sketched below.
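For an existing application, you can attach the annotation with kubectl annotate; the name my-app and the namespace default below are placeholders for the application you choose:

```shell
# let only the v1.7.0-beta.1 controller reconcile this existing application
kubectl annotate applications.core.oam.dev my-app -n default \
  app.oam.dev/controller-version-require=v1.7.0-beta.1 --overwrite
```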

Tip:

You can also deploy new applications with the annotation mentioned above, for example:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: test
  annotations:
    app.oam.dev/controller-version-require: v1.7.0-beta.1
spec:
  components:
    - type: webservice
      name: test
      properties:
        image: nginx
```
If you view the logs of the old controller, you will find entries like the one below, which means the old controller skips the control loop of the target app. If you look into the logs of the new controller, you will find that the target app is handled there. One way to filter for these entries follows the log line.

```
I0110 10:12:30.034066       1 application_controller.go:128] "skip app: not match the controller requirement of app" application="default/test" controller="application" spanID="i-p8enedq6"
```
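Assuming the deployment names from the listing above, you can pull these entries out of each controller's logs with a pipe like this sketch:

```shell
# old controller: expect "skip app" entries for annotated applications
kubectl logs -n vela-system deploy/kubevela-vela-core | grep "not match the controller requirement"

# canary controller: expect normal reconcile activity for the same applications
kubectl logs -n vela-system deploy/kubevela-vela-core-canary | grep 'application="default/test"'
```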
3. After you have confirmed that the target application works as expected, you can add the annotation to more applications and gradually let more of them be handled by the new controller.

4. Finally, after all applications are handled by the new controller, you can upgrade the original controller, e.g. using helm upgrade or vela install, as sketched below. Then remove the canary deployment and let the upgraded controller handle all applications normally.
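As a sketch of this final step, assuming the controller was installed as a Helm release named kubevela from the kubevela/vela-core chart (adjust the release name, chart source, and version to your installation):

```shell
# upgrade the original controller in place
helm upgrade kubevela kubevela/vela-core -n vela-system --version 1.7.0-beta.1

# remove the canary controller once the upgraded one is running
kubectl delete deployment kubevela-vela-core-canary -n vela-system
```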

During this procedure, if any unexpected error happens, you can stop the canary controller by scaling its replicas to 0. This won't affect the applications that are still handled by the old version of the controller.
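For example, with the canary deployment name used above:

```shell
# pause the canary controller; apps served by the old controller are unaffected
kubectl scale deployment kubevela-vela-core-canary -n vela-system --replicas=0
```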

Warning:

Note that the grayscale release of the controller has some limitations:

1. CRDs cannot be grayscale released. If different controller versions rely on different CRDs, this solution might not work properly.
2. Although each controller handles only part of the applications during the upgrade, both still consume all the resources required to handle applications. This means memory consumption roughly doubles while two controllers run concurrently.
