Progressive Delivery

Linkerd’s dynamic request routing allows you to dynamically shift traffic between services. This can be used to implement lower-risk deployment strategies like blue-green deploys and canaries.

But simply shifting traffic from one version of a service to the next is just the beginning. We can combine traffic splitting with Linkerd’s automatic golden metrics telemetry and drive traffic decisions based on the observed metrics. For example, we can gradually shift traffic from an old deployment to a new one while continually monitoring its success rate. If at any point the success rate drops, we can shift traffic back to the original deployment and back out of the release. Ideally, our users remain happy throughout, not noticing a thing!

In this tutorial, we’ll show you how to use two progressive delivery tools, Flagger and Argo Rollouts, and how to tie Linkerd’s metrics and request routing together in a control loop, allowing for fully automated, metrics-aware canary deployments.

Linkerd Production Tip

This page contains best-effort instructions by the open source community. Production users with mission-critical applications should familiarize themselves with Linkerd production resources and/or connect with a commercial Linkerd provider.

Prerequisites

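To use this guide, you’ll need a Kubernetes cluster running Linkerd and the Linkerd Viz extension, which provides the Prometheus instance and the linkerd viz commands used below.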

Flagger

Install Flagger

While Linkerd will be managing the actual traffic routing, Flagger automates the process of creating new Kubernetes resources, watching metrics and incrementally sending users over to the new version. To add Flagger to your cluster and have it configured to work with Linkerd, run:

    kubectl apply -k github.com/fluxcd/flagger/kustomize/linkerd

This command adds:

  • The canary CRD that enables configuring how a rollout should occur.
  • RBAC which grants Flagger permissions to modify all the resources that it needs to, such as deployments and services.
  • A Flagger controller configured to interact with the Linkerd control plane.

To watch until everything is up and running, you can use kubectl:

    kubectl -n flagger-system rollout status deploy/flagger

Set up the demo

This demo consists of three components: a load generator, a deployment and a frontend. The deployment creates a pod that returns some information such as name. You can use the responses to watch the incremental rollout as Flagger orchestrates it. A load generator simply makes it easier to execute the rollout as there needs to be some kind of active traffic to complete the operation. Together, these components have a topology that looks like:

Topology

To add these components to your cluster and include them in the Linkerd data plane, run:

    kubectl create ns test && \
      kubectl apply -f https://run.linkerd.io/flagger.yml

Verify that everything has started up successfully by running:

    kubectl -n test rollout status deploy podinfo

Check it out by forwarding the frontend service to your machine and then opening http://localhost:8080 in a browser. To forward the service, run:

    kubectl -n test port-forward svc/frontend 8080

Note

Request routing occurs on the client side of the connection and not the server side. Any requests coming from outside the mesh will not be shifted and will always be directed to the primary backend. A service of type LoadBalancer will exhibit this behavior as the source is not part of the mesh. To shift external traffic, add your ingress controller to the mesh.
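
As a rough sketch of that last step, the function below pipes an existing Deployment through linkerd inject and re-applies it, which adds the Linkerd proxy to its pods. The function name and the names in the usage line are hypothetical placeholders; substitute those of your own ingress controller, and check the Linkerd ingress documentation, since some controllers need additional configuration.

```shell
# Add an existing Deployment to the mesh by injecting the Linkerd proxy.
# Arguments: $1 = namespace, $2 = deployment name (both supplied by you).
mesh_deployment() {
  kubectl -n "$1" get deploy "$2" -o yaml \
    | linkerd inject - \
    | kubectl apply -f -
}

# Hypothetical usage; these names are assumptions, not part of this demo:
# mesh_deployment ingress-nginx ingress-nginx-controller
```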

Configure the release

Before changing anything, you need to configure how a release should be rolled out on the cluster. The configuration is contained in a Canary and MetricTemplate definition. To apply to your cluster, run:

    kubectl apply -f - <<EOF
    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      service:
        # service port number
        port: 9898
        # container port number or name (optional)
        targetPort: 9898
        # Reference to the Service that the generated HTTPRoute would attach to.
        gatewayRefs:
          - name: podinfo
            namespace: test
            group: core
            kind: Service
            port: 9898
      analysis:
        interval: 10s
        threshold: 5
        stepWeight: 10
        maxWeight: 100
        metrics:
          - name: success-rate
            templateRef:
              name: success-rate
              namespace: test
            thresholdRange:
              min: 99
            interval: 1m
    ---
    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: success-rate
      namespace: test
    spec:
      provider:
        type: prometheus
        address: http://prometheus.linkerd-viz:9090
      query: |
        sum(
          rate(
            response_total{
              namespace="{{ namespace }}",
              deployment=~"{{ target }}",
              classification!="failure",
              direction="inbound"
            }[{{ interval }}]
          )
        )
        /
        sum(
          rate(
            response_total{
              namespace="{{ namespace }}",
              deployment=~"{{ target }}",
              direction="inbound"
            }[{{ interval }}]
          )
        )
        * 100
    EOF
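
The query in the MetricTemplate divides the rate of inbound responses not classified as failures by the rate of all inbound responses, then scales the result to a percentage that is checked against thresholdRange.min. The same arithmetic can be sketched in shell. The success_rate helper and the request counts here are illustrative assumptions, not something Flagger or Linkerd provides.

```shell
# success_rate OK TOTAL: percentage of responses that were not failures.
success_rate() {
  awk -v ok="$1" -v total="$2" 'BEGIN {
    if (total == 0) {
      # No traffic at all, so the ratio is undefined.
      print "NaN"
      exit
    }
    printf "%.2f\n", ok / total * 100
  }'
}

success_rate 995 1000   # prints 99.50, which satisfies a minimum of 99
success_rate 980 1000   # prints 98.00, which falls below a minimum of 99
```

The zero-traffic case is one reason the demo includes a load generator: without requests there is no meaningful success rate to evaluate.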

If you are using ServiceProfiles (for features not currently supported with HTTPRoutes), you will need to change your Canary to use the SMI interface as its provider:

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      provider: "smi:v1alpha2"
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      service:
        # service port number
        port: 9898
        # container port number or name (optional)
        targetPort: 9898
      analysis:
        interval: 10s
        threshold: 5
        stepWeight: 10
        maxWeight: 100
        metrics:
          - name: success-rate
            templateRef:
              name: success-rate
              namespace: test
            thresholdRange:
              min: 99
            interval: 1m

This overrides Flagger’s default mesh provider, changing it from linkerd to smi:v1alpha2. Flagger will then create TrafficSplits instead of HTTPRoutes. The Linkerd SMI extension monitors these TrafficSplits and updates the dstOverrides on your ServiceProfiles while a rollout is in progress.

The Flagger controller is watching these definitions and will create some new resources on your cluster. To watch as this happens, run:

    kubectl -n test get ev --watch

A new deployment named podinfo-primary will be created with the same number of replicas that podinfo has. Once the new pods are ready, the original deployment is scaled down to zero. This provides a deployment that is managed by Flagger as an implementation detail while preserving your original configuration files and workflows. Once you see the following line, everything is set up:

    0s Normal Synced canary/podinfo Initialization done! podinfo.test

In addition to a managed deployment, there are also services created to orchestrate routing traffic between the new and old versions of your application. These can be viewed with kubectl -n test get svc and should look like:

    NAME              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
    frontend          ClusterIP   10.7.251.33   <none>        8080/TCP   96m
    podinfo           ClusterIP   10.7.252.86   <none>        9898/TCP   96m
    podinfo-canary    ClusterIP   10.7.245.17   <none>        9898/TCP   23m
    podinfo-primary   ClusterIP   10.7.249.63   <none>        9898/TCP   23m

At this point, the topology looks a little like:

Initialized

Note

This guide barely touches all the functionality provided by Flagger. Make sure to read the documentation if you’re interested in combining canary releases with HPA, working off custom metrics or doing other types of releases such as A/B testing.

Start the rollout

As a system, Kubernetes resources have two major sections: the spec and status. When a controller sees a spec, it tries as hard as it can to make the status of the current system match the spec. With a deployment, if any of the pod spec configuration is changed, a controller will kick off a rollout. By default, the deployment controller will orchestrate a rolling update.
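
One way to see the two sections side by side is to print a field from each. The helper below is a small sketch that reads the desired replica count from the spec and the ready replica count from the status of a deployment. The function name is made up for illustration.

```shell
# Print desired (spec) and ready (status) replica counts for a deployment.
# Arguments: $1 = namespace, $2 = deployment name.
spec_vs_status() {
  kubectl -n "$1" get deploy "$2" \
    -o jsonpath='desired={.spec.replicas} ready={.status.readyReplicas}{"\n"}'
}

# Usage against the demo deployment:
# spec_vs_status test podinfo
```

When no replicas are ready, the status omits readyReplicas and the second value prints as empty.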

In this example, Flagger will notice that a deployment’s spec changed and start orchestrating the canary rollout. To kick this process off, you can update the image to a new version by running:

    kubectl -n test set image deployment/podinfo \
      podinfod=quay.io/stefanprodan/podinfo:1.7.1

Any kind of modification to the pod’s spec such as updating an environment variable or annotation would result in the same behavior as updating the image.

On update, the canary deployment (podinfo) will be scaled up. Once it is ready, Flagger will begin to update the HTTPRoute incrementally. With a configured stepWeight of 10, each increment will increase the weight of the canary backend by 10. At each interval, the success rate is observed, and as long as it does not drop below the 99% minimum, Flagger will continue the rollout. To watch this entire process, run:

    kubectl -n test get ev --watch
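
To make the loop concrete, here is a deliberately simplified model of it in shell. It is not Flagger's implementation: it assumes whole-number success rates, one check per interval, and a running count of failed checks that triggers a rollback when it reaches the threshold. The simulate_canary function and its arguments are hypothetical; the values in the usage line mirror stepWeight, maxWeight, thresholdRange.min and threshold from the Canary above.

```shell
# Simplified model of a metrics-driven canary loop (not Flagger's actual code).
# Usage: simulate_canary STEP MAX MIN_SUCCESS THRESHOLD RATE...
# Each RATE is the whole-number success rate (%) observed in one interval.
simulate_canary() {
  local step=$1 max=$2 min=$3 threshold=$4
  shift 4
  local weight=0 failed=0 rate
  for rate in "$@"; do
    if [ "$rate" -ge "$min" ]; then
      # Healthy interval: send another STEP percent to the canary.
      weight=$((weight + step))
      if [ "$weight" -gt "$max" ]; then weight=$max; fi
    else
      # Unhealthy interval: hold the weight and count the failed check.
      failed=$((failed + 1))
    fi
    echo "rate=${rate}% canary=${weight}% failed=${failed}"
    if [ "$failed" -ge "$threshold" ]; then
      echo "rollback"
      return 1
    fi
    if [ "$weight" -ge "$max" ]; then
      echo "promote"
      return 0
    fi
  done
  echo "in-progress"
}

# Four intervals, one of them below the 99% minimum; ends with "in-progress".
simulate_canary 10 100 99 5 100 100 98 100
```

Feeding it ten healthy intervals ends in promote, while five unhealthy ones end in rollback.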

While an update is occurring, the resources and traffic will look like this at a high level:

Ongoing

After the update is complete, this picture will go back to looking just like the figure from the previous section.

Note

You can toggle the image tag between 1.7.1 and 1.7.0 to start the rollout again.

Resource

The canary resource updates with the current status and progress. You can watch by running:

    watch kubectl -n test get canary
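
For scripts or CI jobs, where an interactive watch is not practical, a blocking wait can be used instead. The sketch below relies on the Promoted status condition that Flagger's documentation describes for the Canary resource; treat the condition name as an assumption to verify against your Flagger version. The wait_for_canary function name is hypothetical.

```shell
# Block until the canary reports the Promoted condition, then print its status.
# Arguments: $1 = namespace, $2 = canary name, $3 = timeout (default 5m).
wait_for_canary() {
  local status
  kubectl -n "$1" wait "canary/$2" --for=condition=promoted --timeout="${3:-5m}"
  status=$?
  # Show the final phase whether or not the wait succeeded.
  kubectl -n "$1" get "canary/$2"
  return "$status"
}

# Usage for this demo:
# wait_for_canary test podinfo 10m
```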

Behind the scenes, Flagger is splitting traffic between the primary and canary backends by updating the HTTPRoute resource. To watch how this configuration changes over the rollout, run:

    kubectl -n test get httproute.gateway.networking.k8s.io podinfo -o yaml

Each increment will increase the weight of podinfo-canary and decrease the weight of podinfo-primary. Once the rollout is successful, the weight of podinfo-primary will be set back to 100 and the underlying canary deployment (podinfo) will be scaled down.
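
If the full YAML is more than you need, a JSONPath template can reduce the output to backend names and weights. This is a small sketch; the show_route_weights function name is made up for illustration, and it assumes the weights are set on the route's backendRefs.

```shell
# Print each backend of an HTTPRoute with its current weight, one per line.
# Arguments: $1 = namespace, $2 = HTTPRoute name.
show_route_weights() {
  kubectl -n "$1" get httproute.gateway.networking.k8s.io "$2" \
    -o jsonpath='{range .spec.rules[*].backendRefs[*]}{.name}{"\t"}{.weight}{"\n"}{end}'
}

# Usage for the route that Flagger manages in this demo:
# show_route_weights test podinfo
```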

Metrics

As traffic shifts from the primary deployment to the canary one, Linkerd provides visibility into what is happening to the destination of requests. The metrics show the backends receiving traffic in real time and measure the success rate, latencies and throughput. From the CLI, you can watch this by running:

    watch linkerd viz -n test stat deploy --from deploy/load

Browser

Visit http://localhost:8080 again. Refreshing the page will toggle between the old version and the new one, which has a different header color. Alternatively, running curl http://localhost:8080 will return a JSON response that looks something like:

    {
      "hostname": "podinfo-primary-74459c7db8-lbtxf",
      "version": "1.7.0",
      "revision": "4fc593f42c7cd2e7319c83f6bfd3743c05523883",
      "color": "blue",
      "message": "greetings from podinfo v1.7.0",
      "goos": "linux",
      "goarch": "amd64",
      "runtime": "go1.11.2",
      "num_goroutine": "6",
      "num_cpu": "8"
    }

This response will slowly change as the rollout continues.

Cleanup

To clean up, remove the Flagger controller from your cluster and delete the test namespace by running:

    kubectl delete -k github.com/fluxcd/flagger/kustomize/linkerd && \
      kubectl delete ns test

Argo Rollouts

Argo Rollouts is another tool which can use Linkerd to perform incremental canary rollouts based on traffic metrics.

Install Argo Rollouts

Similarly to Flagger, Argo Rollouts automates the process of creating new Kubernetes resources and watching metrics, and uses Linkerd to incrementally shift traffic to the new version. To install Argo Rollouts, run:

    kubectl create namespace argo-rollouts && \
      kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

To use Argo Rollouts with Linkerd, you will also need to enable the GatewayAPI routing plugin and grant it the necessary RBAC to read and modify HTTPRoutes:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: argo-rollouts-config # must have exactly this name
      namespace: argo-rollouts # must be in this namespace
    data:
      trafficRouterPlugins: |-
        - name: "argoproj-labs/gatewayAPI"
          location: "https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/releases/download/v0.0.0-rc1/gateway-api-plugin-linux-amd64"
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: argo-controller-role
      namespace: argo-rollouts
    rules:
      - apiGroups:
          - gateway.networking.k8s.io
        resources:
          - httproutes
        verbs:
          - "*"
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: argo-controller
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: argo-controller-role
    subjects:
      - namespace: argo-rollouts
        kind: ServiceAccount
        name: argo-rollouts
    EOF

Finally, we’ll also need the Argo Rollouts plugin for kubectl so that we can control rollouts from the command line. Install it by following these instructions.

Set up the demo

We can use the same demo application that we used to demonstrate Flagger. Deploy it by running:

    kubectl create ns test && \
      kubectl apply -f https://run.linkerd.io/flagger.yml

Configure the rollout

To set up rollouts for this application, we will create a few resources: Services for the stable and canary versions, an HTTPRoute to control routing between these two Services, and a Rollout resource to configure how rollouts should be performed:

    kubectl apply -f - <<EOF
    apiVersion: gateway.networking.k8s.io/v1beta1
    kind: HTTPRoute
    metadata:
      name: argo-rollouts-http-route
      namespace: test
    spec:
      parentRefs:
        - name: podinfo
          namespace: test
          kind: Service
          group: core
          port: 9898
      rules:
        - backendRefs:
            - name: podinfo-stable
              namespace: test
              port: 9898
            - name: podinfo-canary
              namespace: test
              port: 9898
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo-canary
      namespace: test
    spec:
      ports:
        # matches the backendRefs port above and the containerPort below
        - port: 9898
          targetPort: 9898
          protocol: TCP
          name: http
      selector:
        app: podinfo
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo-stable
      namespace: test
    spec:
      ports:
        # matches the backendRefs port above and the containerPort below
        - port: 9898
          targetPort: 9898
          protocol: TCP
          name: http
      selector:
        app: podinfo
    ---
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: rollouts-demo
      namespace: test
    spec:
      replicas: 1
      strategy:
        canary:
          canaryService: podinfo-canary # our created canary service
          stableService: podinfo-stable # our created stable service
          trafficRouting:
            plugins:
              argoproj-labs/gatewayAPI:
                httpRoute: argo-rollouts-http-route # our created httproute
                namespace: test
          steps:
            - setWeight: 30
            - pause: {}
            - setWeight: 40
            - pause: { duration: 10 }
            - setWeight: 60
            - pause: { duration: 10 }
            - setWeight: 80
            - pause: { duration: 10 }
      revisionHistoryLimit: 2
      selector:
        matchLabels:
          app: podinfo
      template:
        metadata:
          labels:
            app: podinfo
        spec:
          containers:
            - name: podinfod
              image: quay.io/stefanprodan/podinfo:1.7.0
              ports:
                - containerPort: 9898
                  protocol: TCP
    EOF

Start the rollout

We can trigger a rollout to a new version of podinfo by running:

    kubectl argo rollouts -n test set image rollouts-demo \
      podinfod=quay.io/stefanprodan/podinfo:1.7.1

We can watch the rollout progress by running:

    kubectl argo rollouts -n test get rollout rollouts-demo --watch
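
Note that the first pause step in the Rollout has no duration, so the rollout holds at a canary weight of 30 until it is resumed by hand. A minimal sketch of the two commands involved, using the promote and abort subcommands of the kubectl plugin; the wrapper function names are only for illustration:

```shell
# Resume a paused rollout so that it proceeds to the next step.
# Arguments: $1 = namespace, $2 = rollout name.
promote_rollout() {
  kubectl argo rollouts -n "$1" promote "$2"
}

# Abort an in-progress rollout and shift traffic back to the stable version.
# Arguments: $1 = namespace, $2 = rollout name.
abort_rollout() {
  kubectl argo rollouts -n "$1" abort "$2"
}

# Usage for this demo:
# promote_rollout test rollouts-demo
```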

Behind the scenes, Argo Rollouts is splitting traffic between the stable and canary backends by updating the HTTPRoute resource. To watch how this configuration changes over the rollout, run:

    kubectl -n test get httproute.gateway.networking.k8s.io argo-rollouts-http-route -o yaml

We can also use the Linkerd CLI to observe which pods the traffic is being routed to in real time:

    watch linkerd viz -n test stat po --from deploy/load

Cleanup

To clean up, remove the Argo Rollouts controller from your cluster and delete the test namespace by running:

    kubectl delete ns argo-rollouts && \
      kubectl delete ns test