Distributed tracing with Linkerd
Using distributed tracing in practice can be complex. For a high-level explanation of what you get and how it is done, we've assembled a list of myths.
This guide will walk you through configuring and enabling tracing for emojivoto. Jump to the end for some recommendations on the best way to make use of distributed tracing with Linkerd.
To use distributed tracing, you'll need to:
- Add a collector which receives spans from your application and Linkerd.
- Add a tracing backend to explore traces.
- Modify your application to emit spans.
- Configure Linkerd's proxies to emit spans.
In the case of emojivoto, once all these steps are complete there will be a topology that looks like:
Prerequisites
- To use this guide, you'll need to have Linkerd installed on your cluster. Follow the Installing Linkerd Guide if you haven't already done this.
Install the collector
The first step of getting distributed tracing set up is installing a collector onto your cluster. This component consists of “receivers” that consume spans emitted from the mesh and your applications as well as “exporters” that convert spans and forward them to a backend. To add the OpenCensus Collector to your cluster, run:
kubectl apply -f https://run.linkerd.io/tracing/collector.yml
You will now have a tracing namespace that contains the collector running as part of the mesh. It has been configured to:
- Receive spans from OpenCensus clients
- Export spans to a Jaeger backend
The collector is extremely configurable and can use the receiver or exporter of your choice.
Before moving on to the next step, make sure everything is up and running with kubectl:
kubectl -n tracing rollout status deploy/oc-collector
Install Jaeger
With a running collector, it is now time to add Jaeger to your cluster. The all-in-one configuration will store the traces, make them searchable, and provide visualization of all the data being emitted. To install it on your cluster, run:
kubectl apply -f https://run.linkerd.io/tracing/backend.yml
Jaeger itself is made up of many components. The all-in-one image bundles all these components into a single container to make demos and showing off tracing a little bit easier.
Before moving on to the next step, make sure everything is up and running with kubectl:
kubectl -n tracing rollout status deploy/jaeger
Install Emojivoto
Add emojivoto to your cluster with:
kubectl apply -f https://run.linkerd.io/emojivoto.yml
It is possible to use linkerd inject to add the proxy to emojivoto as outlined in getting started. Alternatively, annotations can do the same thing. You can patch these onto the running application with:
kubectl -n emojivoto patch -f https://run.linkerd.io/emojivoto.yml -p '
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/trace-collector: oc-collector.tracing:55678
'
Before moving on to the next step, make sure everything is up and running with kubectl:
kubectl -n emojivoto rollout status deploy/web
Modify the application
Unlike most features of a service mesh, distributed tracing requires modifying the source of your application. Tracing needs some way to tie incoming requests to your application together with outgoing requests to dependent services. To do this, some headers are added to each request that contain a unique ID for the trace. Linkerd uses the b3 propagation format to tie these things together.
We've already modified emojivoto to instrument its requests with this information; this commit shows how this was done. For most programming languages, it simply requires the addition of a client library to take care of this. Emojivoto uses the OpenCensus client, but others can be used.
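To make the header mechanism concrete, here is a minimal Python sketch of the least an application has to do: copy the b3 headers from an incoming request onto each outgoing one. This is not how emojivoto does it (emojivoto is written in Go and uses the OpenCensus client); the function name forward_b3_headers is hypothetical, and the header names are those of the b3 multi-header format.

```python
# Header names from the b3 multi-header propagation format (lowercased).
B3_HEADERS = (
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
)


def forward_b3_headers(incoming, outgoing=None):
    """Copy any b3 headers from an incoming request onto an outgoing one.

    Header names are matched case-insensitively. Hypothetical helper for
    illustration; it is not part of Linkerd or OpenCensus.
    """
    result = dict(outgoing or {})
    for name, value in incoming.items():
        if name.lower() in B3_HEADERS:
            result[name] = value
    return result


if __name__ == "__main__":
    incoming = {
        "Host": "web-svc.emojivoto",
        "X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124",
        "X-B3-SpanId": "a2fb4a1d1a96d312",
        "X-B3-Sampled": "1",
    }
    print(forward_b3_headers(incoming, {"Content-Type": "application/json"}))
```

Forwarding the headers unchanged keeps the outgoing request in the same trace. A full client library goes further and records spans of its own.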
To enable tracing in emojivoto, run:
kubectl -n emojivoto set env --all deploy OC_AGENT_HOST=oc-collector.tracing:55678
This command will add an environment variable that enables the applications to propagate context and emit spans.
Explore Jaeger
With vote-bot starting traces for every request, spans should now be showing up in Jaeger. To get to the UI, start a port forward and send your browser to http://localhost:16686.
kubectl -n tracing port-forward svc/jaeger 16686
You can search for any service in the dropdown and click Find Traces. vote-bot is a great way to get started.
Clicking on a specific trace will provide all the details; you'll be able to see the spans for every proxy!
There sure are a lot of linkerd-proxy spans in that output. Internally, the proxy has a server and client side. When a request goes through the proxy, it is received by the server and then issued by the client. For a single request that goes between two meshed pods, there will be a total of 4 spans. Two will be on the source side as the request traverses that proxy and two will be on the destination side as the request is received by the remote proxy.
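The span count above can be spelled out with a small Python sketch. It only enumerates the spans described in the previous paragraph; the function name proxy_spans is hypothetical, and the "server"/"client" labels are illustrative rather than the exact names shown in Jaeger.

```python
def proxy_spans(source, destination):
    """List the linkerd-proxy spans for one request between two meshed pods.

    Each proxy contributes a server-side span (request received) and a
    client-side span (request issued), in the order the request travels.
    """
    spans = []
    for pod in (source, destination):
        spans.append((pod, "server"))
        spans.append((pod, "client"))
    return spans


if __name__ == "__main__":
    for pod, side in proxy_spans("vote-bot", "web"):
        print(f"linkerd-proxy span: pod={pod} side={side}")
```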
Cleanup
To clean up, remove the tracing components along with emojivoto by running:
kubectl delete ns tracing emojivoto
Troubleshooting
I don't see any spans for the proxies
The Linkerd proxy uses the b3 propagation format. Some client libraries, such as Jaeger, use different formats by default. You'll want to configure your client library to use the b3 format to have the proxies participate in traces.
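One way to check which format your application is actually emitting is to look at the headers on one of its outgoing requests. The Python sketch below is a hypothetical debugging helper, not a Linkerd API. It assumes the commonly used header names: X-B3-* (or the single b3 header) for b3, uber-trace-id for the Jaeger client default, and traceparent for W3C trace context.

```python
def detect_propagation_format(headers):
    """Guess which trace context format a set of request headers uses.

    Returns "b3", "jaeger", "w3c", or None when no known trace headers
    are present. Header names are compared case-insensitively.
    """
    names = {name.lower() for name in headers}
    if "b3" in names or "x-b3-traceid" in names:
        return "b3"
    if "uber-trace-id" in names:
        return "jaeger"
    if "traceparent" in names:
        return "w3c"
    return None


if __name__ == "__main__":
    captured = {"Host": "web-svc.emojivoto", "uber-trace-id": "..."}
    # Anything other than "b3" means the proxies will not join the trace.
    print(detect_propagation_format(captured))
```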
I don't see any traces
Instead of requiring complex client configuration to ensure spans are encrypted in transit, Linkerd relies on its mTLS implementation. This means that the collector must be part of the mesh. If you are using a service account other than default for the collector, the proxies must be configured to use this as well with the config.alpha.linkerd.io/trace-collector-service-account annotation.
Recommendations
Ingress
The ingress is an especially important component for distributed tracing because it creates the root span of each trace and is responsible for deciding if that trace should be sampled or not. Having the ingress make all sampling decisions ensures that either an entire trace is sampled or none of it is, and avoids creating “partial traces”.
Distributed tracing systems all rely on services to propagate metadata about the current trace from requests that they receive to requests that they send. This metadata, called the trace context, is usually encoded in one or more request headers. There are many different trace context header formats and while we hope that the ecosystem will eventually converge on open standards like W3C tracecontext, we only use the b3 format today. Being one of the earliest widely used formats, it has the widest support, especially among ingresses like Nginx.
This reference architecture includes a simple Nginx config that samples 50% of traces and emits trace data to the collector (using the Zipkin protocol). Any ingress controller can be used here in place of Nginx as long as it:
- Supports probabilistic sampling
- Encodes trace context in the b3 format
- Emits spans in a protocol supported by the OpenCensus collector
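The first two requirements can be sketched in a few lines of Python. This is only an illustration of the idea, not what Nginx or any particular ingress does internally; start_root_trace and is_sampled are hypothetical names. The root makes one probabilistic decision per trace and records it in the X-B3-Sampled header, and everything downstream reads that header instead of deciding again.

```python
import random
import secrets


def start_root_trace(sample_rate, rng=random):
    """Make the one sampling decision for a new trace and encode it as b3.

    sample_rate is the probability (0.0 to 1.0) that a trace is sampled.
    """
    sampled = rng.random() < sample_rate
    return {
        "X-B3-TraceId": secrets.token_hex(16),  # 128-bit trace ID
        "X-B3-SpanId": secrets.token_hex(8),    # 64-bit span ID
        "X-B3-Sampled": "1" if sampled else "0",
    }


def is_sampled(headers):
    """Downstream services honor the root's decision rather than re-deciding."""
    return headers.get("X-B3-Sampled") == "1"


if __name__ == "__main__":
    headers = start_root_trace(0.5)  # the 50% rate mentioned above
    print(headers, "sampled:", is_sampled(headers))
```

Because the decision travels with the request, a trace is either recorded by every hop or by none of them, which is what avoids partial traces.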
If using helm to install ingress-nginx, you can configure tracing by using:
controller:
  config:
    enable-opentracing: "true"
    zipkin-collector-host: oc-collector.tracing
Client Library
While it is possible for services to manually propagate trace headers, it's usually much easier to use a library which does three things:
- Propagates the trace context from incoming request headers to outgoing request headers
- Modifies the trace context (i.e. starts a new span)
- Transmits this data to a trace collector
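As a rough picture of those three responsibilities, here is a stdlib-only Python sketch. It is not the OpenCensus API: handle_request and EXPORT_QUEUE are hypothetical, the list stands in for a real exporter, and defaulting to "sampled" when no decision is present is an arbitrary choice made for this example.

```python
import secrets
import time

# Stand-in for a real exporter, which would send spans to the collector.
EXPORT_QUEUE = []


def handle_request(incoming, name):
    """Start a span for an incoming request and return outgoing b3 headers.

    Assumes canonical header casing for brevity.
    """
    trace_id = incoming.get("X-B3-TraceId") or secrets.token_hex(16)
    parent_id = incoming.get("X-B3-SpanId")
    sampled = incoming.get("X-B3-Sampled", "1")  # assumption: sample if undecided
    span_id = secrets.token_hex(8)

    # 1. Propagate: trace ID and sampling decision carry over unchanged.
    outgoing = {"X-B3-TraceId": trace_id, "X-B3-Sampled": sampled}

    # 2. Modify: a new span becomes current; the caller's span is its parent.
    outgoing["X-B3-SpanId"] = span_id
    if parent_id:
        outgoing["X-B3-ParentSpanId"] = parent_id

    # 3. Transmit: queue sampled spans for export to the collector.
    if sampled == "1":
        EXPORT_QUEUE.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "start": time.time(),
        })
    return outgoing
```

A real library also records when the span ends, batches spans, and sends them to the collector in the background.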
We recommend using OpenCensus in your service and configuring it with:
- b3 propagation (this is the default)
- the OpenCensus agent exporter
The OpenCensus agent exporter will export trace data to the OpenCensus collector over a gRPC API. The details of how to configure OpenCensus will vary language by language, but there are guides for many popular languages. You can also see an end-to-end example of this in Go with our example application, Emojivoto.
You may notice that the OpenCensus project is in maintenance mode and will become part of OpenTelemetry. Unfortunately, OpenTelemetry is not yet production ready and so OpenCensus remains our recommendation for the moment.
It is possible to use many other tracing client libraries as well. Just make sure the b3 propagation format is being used and the client library can export its spans in a format the collector has been configured to receive.
Collector: OpenCensus
The OpenCensus collector receives trace data from the OpenCensus agent exporter and potentially does translation and filtering before sending that data to Jaeger. Having the OpenCensus exporter send to the OpenCensus collector gives us a lot of flexibility: we can switch to any backend that OpenCensus supports without needing to interrupt the application.
Backend: Jaeger
Jaeger is one of the most widely used tracing backends and for good reason: it is easy to use and does a great job of visualizing traces. However, any backend supported by OpenCensus can be used instead.
Linkerd
If your application is injected with Linkerd, the Linkerd proxy will participate in the traces and will also emit trace data to the OpenCensus collector. This enriches the trace data and allows you to see exactly how much time requests are spending in the proxy and on the wire. To enable Linkerd's participation:
- Set the config.linkerd.io/trace-collector annotation on the namespace or pod specs that you want to participate in traces. This should be set to the address of the OpenCensus collector service.
- Set the config.alpha.linkerd.io/trace-collector-service-account annotation on the namespace or pod specs that you want to participate in traces. This should be set to the name of the service account of the collector and is used to ensure secure communication between the proxy and the collector. This can be omitted if the collector is running as the default service account.
- Ensure the OpenCensus collector is injected with the Linkerd proxy.
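The annotation rules above can be captured in a short Python sketch that builds a pod template patch. The helper names trace_annotations and pod_template_patch are hypothetical, as is the service account name collector used in the example; the annotation keys and the collector address are the ones used in this guide.

```python
import json

COLLECTOR_KEY = "config.linkerd.io/trace-collector"
SERVICE_ACCOUNT_KEY = "config.alpha.linkerd.io/trace-collector-service-account"


def trace_annotations(collector_addr, service_account=None):
    """Build the annotations that enable tracing in the Linkerd proxy.

    The service account annotation is only added when the collector does
    not run as the default service account.
    """
    annotations = {COLLECTOR_KEY: collector_addr}
    if service_account and service_account != "default":
        annotations[SERVICE_ACCOUNT_KEY] = service_account
    return annotations


def pod_template_patch(annotations):
    """Wrap annotations in the pod template structure used by workloads."""
    return {"spec": {"template": {"metadata": {"annotations": annotations}}}}


if __name__ == "__main__":
    patch = pod_template_patch(
        trace_annotations("oc-collector.tracing:55678", "collector")
    )
    print(json.dumps(patch, indent=2))
```

The printed JSON has the same shape as the YAML patch applied to emojivoto earlier in this guide, so it should be usable with kubectl patch -p in the same way.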
While Linkerd can only actively participate in traces that use the b3 propagation format, Linkerd will always forward unknown request headers transparently, which means it will never interfere with traces that use other propagation formats.