Tracing
Overview
Distributed tracing allows developers to obtain visualizations of call flows in large service-oriented architectures. It can be invaluable in understanding serialization, parallelism, and sources of latency. Envoy supports three features related to system-wide tracing:
- Request ID generation: Envoy will generate UUIDs when needed and populate the x-request-id HTTP header. Applications can forward the x-request-id header for unified logging as well as tracing.
- External trace service integration: Envoy supports pluggable external trace visualization providers. Currently Envoy supports LightStep, Zipkin, or any Zipkin-compatible backend (e.g. Jaeger). Support for other tracing providers would not be difficult to add.
- Client trace ID joining: The x-client-trace-id header can be used to join untrusted request IDs to the trusted internal x-request-id.
How to initiate a trace
The HTTP connection manager that handles the request must have the tracing object set. There are several ways tracing can be initiated:
- By an external client via the x-client-trace-id header.
- By an internal service via the x-envoy-force-trace header.
- Randomly sampled via the random_sampling runtime setting.
The router filter is also capable of creating a child span for egress calls via the start_child_span option.
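The three triggers listed above can be pictured as a small decision function. This is a minimal sketch for illustration only, not Envoy's actual implementation: the function name, the return values, the order in which the triggers are checked, and the idea of passing the sampling rate as a percentage are all assumptions, and header names are assumed to be lower-cased already.

```python
import random


def trace_reason(headers, sampling_percent, rng=random.random):
    """Return why a request would be traced, or None if it would not.

    Hypothetical helper: the precedence used here (client, then forced,
    then random sampling) is an assumption made for this sketch.
    """
    # An external client asked for tracing via x-client-trace-id.
    if "x-client-trace-id" in headers:
        return "client"
    # An internal service forced tracing via x-envoy-force-trace.
    if "x-envoy-force-trace" in headers:
        return "forced"
    # Otherwise fall back to random sampling (stand-in for the
    # random_sampling runtime setting).
    if rng() * 100 < sampling_percent:
        return "sampled"
    return None
```

Passing `rng` as a parameter keeps the sketch deterministic under test; a caller would normally leave the default in place.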
Trace context propagation
Envoy can report tracing information about communication between services in the mesh. However, to correlate the pieces of tracing information generated by the various proxies within a call flow, the services must propagate certain trace context between inbound and outbound requests.
Whichever tracing provider is being used, the service should propagate the x-request-id to enable logging across the invoked services to be correlated.
The tracing providers also require additional context, to enable the parent/child relationships between the spans (logical units of work) to be understood. This can be achieved by using the LightStep (via OpenTracing API) or Zipkin tracer directly within the service itself, to extract the trace context from the inbound request and inject it into any subsequent outbound requests. This approach would also enable the service to create additional spans, describing work being done internally within the service, that may be useful when examining the end-to-end trace.
Alternatively the trace context can be manually propagated by the service:
- When using the LightStep tracer, Envoy relies on the service to propagate the x-ot-span-context HTTP header while sending HTTP requests to other services.
- When using the Zipkin tracer, Envoy relies on the service to propagate the B3 HTTP headers ( x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, and x-b3-flags). The x-b3-sampled header can also be supplied by an external client to either enable or disable tracing for a particular request.
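Manual propagation amounts to copying a fixed set of headers from the inbound request onto every outbound request made while handling it. The following is a minimal sketch under stated assumptions: the helper name `propagate_trace_headers` and the constant `TRACE_HEADERS` are hypothetical, headers are assumed to arrive as a plain mapping, and the header list is simply the union of the names mentioned above.

```python
# Headers named in this section: x-request-id for log correlation,
# x-ot-span-context for LightStep, and the B3 headers for Zipkin.
TRACE_HEADERS = (
    "x-request-id",
    "x-ot-span-context",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
)


def propagate_trace_headers(inbound_headers):
    """Return only the trace-related headers found on the inbound request.

    HTTP header names are case-insensitive, so names are lower-cased
    before matching. Headers that are absent are skipped, never invented.
    """
    lowered = {name.lower(): value for name, value in inbound_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
```

A service would merge the returned dictionary into the headers of each outbound HTTP call it makes on behalf of the inbound request. Copying the full list regardless of the configured tracer keeps the service independent of which provider Envoy is using.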
What data each trace contains
An end-to-end trace consists of one or more spans. A span represents a logical unit of work that has a start time and duration and can contain metadata associated with it. Each span generated by Envoy contains the following data:
- Originating service cluster set via --service-cluster.
- Start time and duration of the request.
- Originating host set via --service-node.
- Downstream cluster set via the x-envoy-downstream-service-cluster header.
- HTTP URL.
- HTTP method.
- HTTP response code.
- Tracing system-specific metadata.
The span also includes a name (or operation), which by default is the host of the invoked service. This can be customized using a Decorator on the route, and the name can also be overridden using the x-envoy-decorator-operation header.
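To make the span contents and the naming rule concrete, here is a minimal sketch of the data as a record plus a function that picks the span name. The `SpanRecord` type, its field names, and the `operation_name` helper are hypothetical and do not correspond to Envoy's internal types or to any tracer's wire format; the only behavior taken from the text is that the header overrides the route decorator, which overrides the default host.

```python
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Illustrative container for the per-span data listed above."""
    operation: str            # span name
    service_cluster: str      # from --service-cluster
    service_node: str         # from --service-node
    downstream_cluster: str   # from x-envoy-downstream-service-cluster
    start_time: float         # seconds since the epoch (assumed unit)
    duration: float           # seconds (assumed unit)
    http_url: str
    http_method: str
    http_response_code: int
    metadata: dict = field(default_factory=dict)  # tracer-specific data


def operation_name(host, route_decorator=None, headers=None):
    """Pick the span name: header override, then decorator, then host."""
    headers = headers or {}
    return headers.get("x-envoy-decorator-operation") or route_decorator or host
```

For example, `operation_name("svc.local", "checkout")` yields the decorator value, while supplying the x-envoy-decorator-operation header replaces it.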
Envoy automatically sends spans to tracing collectors. Depending on the tracing collector, multiple spans are stitched together using common information such as the globally unique request ID x-request-id (LightStep) or the trace ID configuration (Zipkin). See the tracing configuration reference for more information on how to set up tracing in Envoy.