Automated Canary Releases

Linkerd’s traffic split feature allows you to dynamically shift traffic between services. This can be used to implement lower-risk deployment strategies like blue-green deploys and canaries.

But simply shifting traffic from one version of a service to the next is just the beginning. We can combine traffic splitting with Linkerd’s automatic golden metrics telemetry and drive traffic decisions based on the observed metrics. For example, we can gradually shift traffic from an old deployment to a new one while continually monitoring its success rate. If at any point the success rate drops, we can shift traffic back to the original deployment and back out of the release. Ideally, our users remain happy throughout, not noticing a thing!

In this tutorial, we’ll walk you through how to combine Linkerd with Flagger, a progressive delivery tool that ties Linkerd’s metrics and traffic splitting together in a control loop, allowing for fully automated, metrics-aware canary deployments.

Prerequisites

To use this guide, you’ll need to have Linkerd installed on your cluster. Follow the Installing Linkerd Guide if you haven’t already done this.
The installation of Flagger depends on kubectl 1.14 or newer, because the install command uses kubectl’s built-in Kustomize support (kubectl apply -k).

Install Flagger

While Linkerd will be managing the actual traffic routing, Flagger automates the process of creating new Kubernetes resources, watching metrics and incrementally sending users over to the new version. To add Flagger to your cluster and have it configured to work with Linkerd, run:

kubectl apply -k github.com/weaveworks/flagger/kustomize/linkerd

This command adds:

The Canary CRD, which enables configuring how a rollout should occur.
RBAC, which grants Flagger permission to modify all the resources it needs to, such as deployments and services.
A controller configured to interact with the Linkerd control plane.

To watch until everything is up and running, you can use kubectl:

kubectl -n linkerd rollout status deploy/flagger --watch

Set up the demo

This demo consists of two components: a load generator and a deployment. The deployment creates a pod that returns some information, such as its name. You can use the responses to watch the incremental rollout as Flagger orchestrates it. The load generator supplies the active traffic the rollout needs in order to complete, since Flagger judges the canary from metrics on live requests. Together, these components have a topology that looks like:

[Figure: Topology]

To add these components to your cluster and include them in the Linkerd data plane, run:

kubectl create ns test && \
  kubectl apply -f https://run.linkerd.io/flagger.yml

Verify that everything has started up successfully by running:

kubectl -n test rollout status deploy podinfo --watch

To check it out, forward the service locally by running the following command, then open http://localhost:9898:

kubectl -n test port-forward svc/podinfo 9898

Note: Traffic shifting occurs on the client side of the connection and not the server side. Any requests coming from outside the mesh will not be shifted and will always be directed to the primary backend. A service of type LoadBalancer will exhibit this behavior, as the source is not part of the mesh. To shift external traffic, add your ingress controller to the mesh.
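Conceptually, the shift works like this: for each request, the caller’s proxy picks one backend at random in proportion to the traffic split weights. The sketch below is a hypothetical illustration of that weighted choice only; the function name pick_backend and its "service weight" input lines are assumptions, not a Linkerd interface. Because the choice is made by the caller’s proxy, a caller outside the mesh never performs it.

```shell
# Hypothetical illustration of weighted backend selection; not Linkerd code.
# Usage: pick_backend ROLL < backends
#   ROLL is a number in [0, total weight); backends are "service weight" lines.
# Prints the backend whose cumulative weight range contains ROLL.
pick_backend() {
  awk -v roll="$1" '
    {
      cumulative += $2
      if (roll < cumulative) { print $1; exit }
    }
  '
}

# Example with a random roll, for weights that sum to 100:
# roll=$(awk 'BEGIN { srand(); print int(rand() * 100) }')
# printf 'podinfo-primary 90\npodinfo-canary 10\n' | pick_backend "$roll"
```

With weights of 90 and 10, rolls 0 to 89 land on podinfo-primary and rolls 90 to 99 land on podinfo-canary, so roughly one request in ten reaches the canary.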

Configure the release

Before changing anything, you need to configure how a release should be rolled out on the cluster. The configuration is contained in a Canary definition. To apply it to your cluster, run:

cat <<EOF | kubectl apply -f -
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
  canaryAnalysis:
    interval: 10s
    threshold: 5
    stepWeight: 10
    metrics:
    - name: request-success-rate
      threshold: 99
      interval: 1m
EOF
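For reference, Flagger’s documentation describes these canaryAnalysis fields as follows: interval is how often the analysis runs, threshold is the maximum number of failed metric checks before a rollback, stepWeight is the traffic increment per step, and the request-success-rate metric’s threshold (99) is the minimum percentage of non-5xx responses, measured over that metric’s own 1m interval. The hypothetical helper below turns those numbers into rough timings. It assumes the canary is promoted once its weight reaches 100 (Flagger’s maxWeight setting, not used in this guide, can lower that ceiling) and ignores pod startup and promotion time, so treat its results as lower bounds.

```shell
# Hypothetical back-of-the-envelope helper; not part of Flagger.
# Usage: rollout_timing INTERVAL_SECONDS STEP_WEIGHT THRESHOLD [MAX_WEIGHT]
# Assumes MAX_WEIGHT defaults to 100; ignores pod startup and promotion time.
rollout_timing() {
  interval=$1
  step=$2
  threshold=$3
  max_weight=${4:-100}
  # Integer ceiling division: weight increments needed to reach MAX_WEIGHT.
  steps=$(( (max_weight + step - 1) / step ))
  echo "steps=$steps"
  echo "min_seconds=$((steps * interval))"
  # Earliest rollback: THRESHOLD failed checks, one per interval.
  echo "rollback_after_seconds=$((threshold * interval))"
}
```

For the Canary above, rollout_timing 10 10 5 reports 10 steps, so a healthy rollout takes at least about 100 seconds, while a consistently failing canary is rolled back after roughly 50 seconds.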

The Flagger controller is watching these definitions and will create some new resources on your cluster. To watch as this happens, run:

kubectl -n test get ev --watch

A new deployment named podinfo-primary will be created with the same number of replicas that podinfo has. Once the new pods are ready, the original deployment is scaled down to zero. This gives Flagger a deployment it manages as an implementation detail, while your original configuration files and workflows stay unchanged. Once you see the following line, everything is set up:

0s          Normal    Synced                   canary/podinfo                          Initialization done! podinfo.test
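If you want to script this check instead of reading events, one hedged option is to compare the replica counts described above: after initialization, podinfo should be at zero replicas and podinfo-primary above zero. The helper name check_initialized is hypothetical; it reads "name replicas" lines such as those produced by the kubectl JSONPath query in the usage comment.

```shell
# Hypothetical helper: checks the replica layout Flagger sets up on init.
# Reads "deployment-name replicas" lines; succeeds only if podinfo has 0
# replicas and podinfo-primary has at least one.
check_initialized() {
  awk '
    $1 == "podinfo"         { original = $2; seen_original = 1 }
    $1 == "podinfo-primary" { primary = $2; seen_primary = 1 }
    END { exit !(seen_original && seen_primary && original == 0 && primary > 0) }
  '
}

# Usage against the cluster (requires access to the test namespace):
# kubectl -n test get deploy podinfo podinfo-primary \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{"\n"}{end}' \
#   | check_initialized && echo "initialized"
```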

In addition to a managed deployment, there are also services created to orchestrate routing traffic between the new and old versions of your application. These can be viewed with kubectl -n test get svc and should look like:

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
podinfo              ClusterIP   10.7.252.86   <none>        9898/TCP   96m
podinfo-canary       ClusterIP   10.7.245.17   <none>        9898/TCP   23m
podinfo-primary      ClusterIP   10.7.249.63   <none>        9898/TCP   23m

At this point, the topology looks a little like:

[Figure: Initialized]

Note: This guide only scratches the surface of the functionality provided by Flagger. Make sure to read the documentation if you’re interested in combining canary releases with HPA, working off custom metrics, or doing other types of releases such as A/B testing.

Start the rollout

Kubernetes resources have two major sections: the spec and the status. When a controller sees a spec, it tries as hard as it can to make the status of the current system match the spec. With a deployment, if any of the pod spec configuration is changed, a controller will kick off a rollout. By default, the deployment controller will orchestrate a rolling update.
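The spec/status split is visible directly on a deployment: spec.replicas is what you asked for, and status.readyReplicas is what the controller has achieved so far. The helper below is a hypothetical illustration (the name converged is an assumption) that reports whether the two agree. Kubernetes omits readyReplicas when no pods are ready, so an empty value is treated as zero.

```shell
# Hypothetical illustration of comparing desired (spec) and observed (status).
# Usage: converged DESIRED [READY]; READY may be missing when no pods are ready.
converged() {
  desired=$1
  ready=${2:-0}
  if [ "$ready" -eq "$desired" ]; then
    echo converged
  else
    echo "reconciling ($ready/$desired ready)"
  fi
}

# Usage against the cluster (word splitting of the output is intentional):
# converged $(kubectl -n test get deploy podinfo-primary \
#   -o jsonpath='{.spec.replicas} {.status.readyReplicas}')
```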

In this example, Flagger will notice that a deployment’s spec changed and start orchestrating the canary rollout. To kick this process off, you can update the image to a new version by running:

kubectl -n test set image deployment/podinfo \
  podinfod=quay.io/stefanprodan/podinfo:1.7.1

Any kind of modification to the pod’s spec, such as updating an environment variable or annotation, would result in the same behavior as updating the image.
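One way to picture this change detection is to fingerprint the pod template before and after an edit: any change to it, image or not, yields a different fingerprint. This is a hypothetical sketch using the POSIX cksum utility; the helper name template_hash is an assumption, it assumes kubectl prints the template identically on repeated calls, and it illustrates the idea rather than how Flagger itself tracks the last applied spec.

```shell
# Hypothetical sketch: fingerprint a pod template read from stdin.
# cksum is POSIX; only its first field (the CRC) is kept.
template_hash() {
  cksum | awk '{ print $1 }'
}

# Usage against the cluster:
# before=$(kubectl -n test get deploy podinfo -o jsonpath='{.spec.template}' | template_hash)
# ... change the image, an environment variable, or a pod annotation ...
# after=$(kubectl -n test get deploy podinfo -o jsonpath='{.spec.template}' | template_hash)
# [ "$before" != "$after" ] && echo "pod template changed"
```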

On update, the canary deployment (podinfo) will be scaled up. Once it is ready, Flagger will begin to update the TrafficSplit CRD incrementally. With a configured stepWeight of 10, each increment will increase the weight of the canary backend (podinfo-canary) by 10. For each interval, the success rate will be observed, and as long as it is over the threshold of 99%, Flagger will continue the rollout. To watch this entire process, run:

kubectl -n test get ev --watch
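The per-interval decision can be modeled with the Canary settings from this guide: each check compares the observed success rate with the metric threshold (99), and a rollback happens once the number of failed checks reaches the analysis threshold (5). This is a simplified, hypothetical model (the function name analyze is an assumption) based on Flagger’s documented settings; in it, a failed check holds the weight where it is, and Flagger’s real controller tracks more state than shown here.

```shell
# Hypothetical model of Flagger-style analysis; not Flagger's implementation.
# Usage: analyze STEP_WEIGHT MIN_SUCCESS_RATE MAX_FAILED_CHECKS < success_rates
# For each observed success rate: a passing check advances the canary weight,
# a failing check leaves the weight unchanged and counts toward a rollback.
analyze() {
  awk -v step="$1" -v min_sr="$2" -v max_failed="$3" '
    {
      if ($1 < min_sr) {
        failed++
        if (failed >= max_failed) { print "rollback"; exit }
        print "hold", weight + 0, "failed=" failed
      } else {
        weight += step
        if (weight >= 100) { print "promote"; exit }
        print "advance", weight
      }
    }
  '
}
```

For example, printf '100\n98\n100\n' | analyze 10 99 5 advances to 10, holds at 10 after one failed check, then advances to 20; a single dip below 99% delays the rollout rather than aborting it.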

While an update is occurring, the resources and traffic will look like this at a high level:

[Figure: Ongoing]

After the update is complete, this picture will go back to looking just like the figure from the previous section.

Note: You can toggle the image tag between 1.7.1 and 1.7.0 to start the rollout again.
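To toggle the tag without retyping it, a small helper can read the current image and swap 1.7.0 and 1.7.1. The function name toggle_image is hypothetical, and it only knows about the two tags used in this guide.

```shell
# Hypothetical helper: swap the podinfo tag between the two versions used here.
# Usage: toggle_image IMAGE   (e.g. quay.io/stefanprodan/podinfo:1.7.1)
toggle_image() {
  repo=${1%:*}   # everything before the last ":"
  tag=${1##*:}   # everything after the last ":"
  case $tag in
    1.7.0) echo "$repo:1.7.1" ;;
    1.7.1) echo "$repo:1.7.0" ;;
    *) echo "unexpected tag: $tag" >&2; return 1 ;;
  esac
}

# Usage against the cluster:
# current=$(kubectl -n test get deploy podinfo \
#   -o jsonpath='{.spec.template.spec.containers[0].image}')
# kubectl -n test set image deployment/podinfo podinfod="$(toggle_image "$current")"
```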

Resource

The canary resource updates with the current status and progress. You can watch it by running:

watch kubectl -n test get canary
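For scripts, polling the canary’s status.phase field is an alternative to watching the table. Flagger’s documentation lists Succeeded and Failed as the end states of an analysis; the helper name is_terminal and the polling loop are illustrative assumptions. Right after changing the image, the phase can still read Succeeded from the previous rollout until Flagger picks up the change, so start polling once the phase has moved to Progressing.

```shell
# Hypothetical helper: succeed when a canary phase marks the end of an analysis.
is_terminal() {
  case $1 in
    Succeeded|Failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against the cluster: poll every 5 seconds until the analysis ends.
# until is_terminal "$(kubectl -n test get canary podinfo -o jsonpath='{.status.phase}')"; do
#   sleep 5
# done
```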

Behind the scenes, Flagger is splitting traffic between the primary and canary backends by updating the traffic split resource. To watch how this configuration changes over the rollout, run:

kubectl -n test get trafficsplit podinfo -o yaml

Each increment will increase the weight of podinfo-canary and decrease the weight of podinfo-primary. Once the rollout is successful, the weight of podinfo-primary will be set back to 100 and the underlying canary deployment (podinfo) will be scaled down.
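To read the split as a single number, a hypothetical helper can sum the backend weights and report one backend’s share. It assumes plain integer weights in "service weight" lines, as produced by the JSONPath query in the usage comment; SMI weights can also be written as Kubernetes quantities (for example 500m), which this sketch does not handle.

```shell
# Hypothetical helper: percentage of total traffic weight assigned to a backend.
# Usage: canary_share SERVICE < backends   (lines of "service weight")
canary_share() {
  awk -v svc="$1" '
    { total += $2; if ($1 == svc) mine += $2 }
    END { if (total > 0) printf "%d\n", 100 * mine / total }
  '
}

# Usage against the cluster:
# kubectl -n test get trafficsplit podinfo \
#   -o jsonpath='{range .spec.backends[*]}{.service}{" "}{.weight}{"\n"}{end}' \
#   | canary_share podinfo-canary
```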

Metrics

As traffic shifts from the primary deployment to the canary one, Linkerd provides visibility into what is happening to the destination of requests. The metrics show the backends receiving traffic in real time and measure the success rate, latencies, and throughput. From the CLI, you can watch this by running:

watch linkerd -n test stat deploy --from deploy/load
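Success rate is the share of requests that succeeded during the measurement window, so the 99% threshold from the Canary resource translates into a small failure budget. As a worked example, at a hypothetical steady 10 requests per second over the 1m window, 600 requests leave room for at most 6 failures, while at low volume a single failure can already fail a check. The helper below (the name failure_budget is an assumption) computes this for integer thresholds, assuming a check fails only when the success rate is strictly below the threshold.

```shell
# Hypothetical helper: most failures a window can absorb and still pass.
# Passing means (total - failures) / total * 100 >= THRESHOLD, i.e.
# failures <= total * (100 - THRESHOLD) / 100. Integer THRESHOLD only.
# Usage: failure_budget TOTAL_REQUESTS THRESHOLD
failure_budget() {
  echo $(( $1 * (100 - $2) / 100 ))
}
```

For example, failure_budget 600 99 gives 6, but failure_budget 50 99 gives 0, which is one reason the load generator’s steady traffic matters.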

For something a little more visual, you can use the dashboard. Start it by running linkerd dashboard and then look at the detail page for the podinfo traffic split.

[Figure: Dashboard]

Browser

To see the landing page served by podinfo, run:

kubectl -n test port-forward svc/frontend 8080

This will make the podinfo landing page available at http://localhost:8080. Refreshing the page will toggle between the old and new versions, which have different header colors. Alternatively, running curl http://localhost:8080 will return a JSON response that looks something like:

{
  "hostname": "podinfo-primary-74459c7db8-lbtxf",
  "version": "1.7.0",
  "revision": "4fc593f42c7cd2e7319c83f6bfd3743c05523883",
  "color": "blue",
  "message": "greetings from podinfo v1.7.0",
  "goos": "linux",
  "goarch": "amd64",
  "runtime": "go1.11.2",
  "num_goroutine": "6",
  "num_cpu": "8"
}

This response will slowly change as the rollout continues.
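To quantify the change instead of eyeballing it, you can sample the endpoint repeatedly and tally the version field. This is a sketch under a few assumptions: the port-forward to svc/frontend is running, responses keep a "version": "<tag>" pair like the one shown, and the helper name version_counts is hypothetical. It uses sed rather than a JSON parser, so it is only suitable for a simple response like this one.

```shell
# Hypothetical helper: tally podinfo versions from JSON responses on stdin.
# Prints "version count" lines sorted by version.
version_counts() {
  sed -n 's/.*"version": *"\([^"]*\)".*/\1/p' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (with the port-forward to svc/frontend running):
# i=0
# while [ "$i" -lt 20 ]; do curl -s http://localhost:8080; echo; i=$((i + 1)); done \
#   | version_counts
```

Running the sampling loop a few times during the rollout should show the count for the new version growing as its weight increases.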

Cleanup

To clean up, remove the Flagger controller from your cluster and delete the test namespace by running:

kubectl delete -k github.com/weaveworks/flagger/kustomize/linkerd && \
  kubectl delete ns test