How to trace GreptimeDB

GreptimeDB uses Rust’s tracing framework for code instrumentation. For details on tracing and its usage, please refer to the official tracing documentation.

By propagating trace_id and related information transparently across the entire distributed system, we can record the function call chain along a whole distributed request path and learn how long each traced function takes, along with other related information, in order to monitor the entire system.

Define tracing context in RPC

Because the tracing framework does not natively support distributed tracing, we need to pass information such as trace_id manually in RPC messages so that the calling relationships between functions are identified correctly. We encode this information into tracing_context following a W3C-based standard and attach it to the RPC header. tracing_context is mainly defined in:

  • Frontend interacts with datanode: tracing_context is defined in RegionRequestHeader
  • Frontend interacts with metasrv: tracing_context is defined in RequestHeader
  • Client interacts with frontend: tracing_context is defined in RequestHeader
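
As a rough illustration of what a W3C-compliant encoding looks like, the sketch below assumes the standard in question is W3C Trace Context and its traceparent header (version, trace id, parent span id and flags, all in lowercase hex). TraceParent and its methods are hypothetical names made up for this example and use only the standard library; they are not GreptimeDB's TracingContext, whose encoded fields may differ.

```rust
/// Hypothetical, simplified trace context (NOT GreptimeDB's `TracingContext`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    /// 16 bytes identifying the whole distributed trace.
    pub trace_id: u128,
    /// 8 bytes identifying the calling (parent) span.
    pub span_id: u64,
    /// Lowest bit of the trace-flags field.
    pub sampled: bool,
}

impl TraceParent {
    /// Encode as `version-traceid-parentid-flags`, all lowercase hex.
    pub fn encode(&self) -> String {
        format!(
            "00-{:032x}-{:016x}-{:02x}",
            self.trace_id,
            self.span_id,
            u8::from(self.sampled)
        )
    }

    /// Parse a version-00 header; returns `None` on malformed input.
    pub fn parse(header: &str) -> Option<Self> {
        let parts: Vec<&str> = header.split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        // Fixed field widths: 32 hex chars, 16 hex chars, 2 hex chars.
        if parts[1].len() != 32 || parts[2].len() != 16 || parts[3].len() != 2 {
            return None;
        }
        // Only lowercase hex digits are accepted.
        let is_lower_hex =
            |s: &str| s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if !parts[1..].iter().all(|p| is_lower_hex(p)) {
            return None;
        }
        let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
        let span_id = u64::from_str_radix(parts[2], 16).ok()?;
        let flags = u8::from_str_radix(parts[3], 16).ok()?;
        // All-zero ids are treated as invalid.
        if trace_id == 0 || span_id == 0 {
            return None;
        }
        Some(Self {
            trace_id,
            span_id,
            sampled: flags & 1 == 1,
        })
    }
}
```

For example, trace id 0x4bf92f3577b34da6a3ce929d0e0e4736 with span id 0x00f067aa0ba902b7 and the sampled flag set encodes to 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, and parsing that string gives back the same value.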

Pass tracing context in RPC call

We build a TracingContext structure that encapsulates the operations related to the tracing context.

GreptimeDB uses TracingContext::from_current_span() to obtain the current tracing context, encodes it into a W3C-compliant format with the to_w3c() method, and attaches the result to the RPC message, so that the tracing context is propagated correctly between components.

The following example illustrates how to obtain the current tracing context and pass the parameters correctly when constructing the RPC message, so that the tracing context is correctly passed among the distributed components.

rust

// Encode the current span's context in W3C format and put it in the request header.
let request = RegionRequest {
    header: Some(RegionRequestHeader {
        tracing_context: TracingContext::from_current_span().to_w3c(),
        ..Default::default()
    }),
    body: Some(region_request::Body::Alter(request)),
};

On the receiving side of the RPC, the tracing context must be decoded correctly and used to build the first span that traces the function call. For example, the following code decodes the tracing_context in the received RPC message with the TracingContext::from_w3c method, then uses the attach method to attach the context to the newly created info_span!("RegionServer::handle_read"), so that the call can be traced across distributed components.

rust

// ...
// Decode the caller's context; use the default context if the header is missing.
let tracing_context = request
    .header
    .as_ref()
    .map(|h| TracingContext::from_w3c(&h.tracing_context))
    .unwrap_or_default();
// Run the handler inside a new span attached to the decoded context.
let result = self
    .handle_read(request)
    .trace(tracing_context.attach(info_span!("RegionServer::handle_read")))
    .await?;
// ...
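
The decode-with-fallback step above can be sketched in isolation with the standard library only. Everything in this sketch is a hypothetical stand-in: Header, Request, Context and extract_context are invented for illustration, and the assumption that the encoded context is a string-to-string map containing a traceparent entry is ours, not something the code above confirms.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an RPC header carrying the encoded context.
#[derive(Default)]
pub struct Header {
    /// Assumed shape: W3C fields as a string-to-string map.
    pub tracing_context: HashMap<String, String>,
}

/// Hypothetical stand-in for an RPC request; the header may be absent.
pub struct Request {
    pub header: Option<Header>,
}

/// Hypothetical decoded context: only the raw `traceparent` value, if any.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Context {
    pub traceparent: Option<String>,
}

impl Context {
    /// Decode the context from the W3C fields found in the header.
    pub fn from_w3c(fields: &HashMap<String, String>) -> Self {
        Self {
            traceparent: fields.get("traceparent").cloned(),
        }
    }

    /// A context without a parent would start a new, unrelated trace.
    pub fn is_root(&self) -> bool {
        self.traceparent.is_none()
    }
}

/// Same shape as the receiver code: decode if a header exists,
/// otherwise fall back to the default (empty) context.
pub fn extract_context(request: &Request) -> Context {
    request
        .header
        .as_ref()
        .map(|h| Context::from_w3c(&h.tracing_context))
        .unwrap_or_default()
}
```

In this model, a request without a header yields a root context, so the receiver would start a fresh trace instead of failing. A request whose header carries a traceparent entry continues the caller's trace.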

Use tracing::instrument to instrument the code

We use the instrument macro provided by tracing to instrument the code: annotate each function that needs to be traced with it. By default, the macro records every argument of the function into the span on each call, formatted with Debug. This does not work for arguments that do not implement the Debug trait, and large structures or long argument lists make the span too large. To avoid both problems, use skip_all to skip recording all arguments.

rust

// skip_all: do not record any of the function's arguments in the span.
#[tracing::instrument(skip_all)]
async fn instrument_function(...) {
    // ...
}
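
To see why recording every argument can make a span too large, here is a rough, standard-library-only model of Debug-based field recording. It does not use the tracing crate. Batch, debug_field, fields_recording_all and fields_with_skip_all are hypothetical names that only imitate the effect described above.

```rust
use std::fmt::Debug;

/// Hypothetical large argument, e.g. a batch of rows for one region.
#[derive(Debug)]
pub struct Batch {
    pub region_id: u64,
    pub rows: Vec<u64>,
}

/// Render one span field the way a Debug-based recorder would: `name=value`.
pub fn debug_field(name: &str, value: &dyn Debug) -> String {
    format!("{}={:?}", name, value)
}

/// Model of the default behavior: every argument is recorded with Debug.
pub fn fields_recording_all(batch: &Batch) -> Vec<String> {
    vec![debug_field("batch", batch)]
}

/// Model of `skip_all`: no argument is recorded, so the span carries no fields.
pub fn fields_with_skip_all(_batch: &Batch) -> Vec<String> {
    Vec::new()
}
```

With a batch of 1000 rows, the single recorded field in this model is several thousand characters long, while the skip_all variant records nothing. The trade-off is that skipped arguments no longer appear in the span at all.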

Code instrumentation across runtimes

Rust’s tracing library automatically handles the nesting of instrumented functions within the same runtime. When a function call crosses runtimes, however, the library cannot trace it automatically, and we need to pass the context across the runtime boundary manually.

rust

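// Capture the context of the current span before leaving this runtime.
let tracing_context = TracingContext::from_current_span();
let handle = runtime.spawn(async move {
    handler
        .handle(query)
        // Create a span in the target runtime and attach the captured context to it.
        .trace(tracing_context.attach(info_span!("xxxxx")))
        ...
});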

In the code above, tracing must cross a runtime boundary. We first obtain the current tracing context with TracingContext::from_current_span(), then create a span in the other runtime and attach the captured context to it. This closes the gap in the trace at the runtime boundary, so the call chain is traced correctly.
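
The capture-then-attach pattern can be illustrated without any tracing library. The sketch below is only an analogy under stated assumptions: plain OS threads stand in for runtimes, a thread-local value stands in for the current span, and current_trace, with_trace, spawn_untraced and spawn_traced are hypothetical helpers, not GreptimeDB or tracing APIs.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    /// Hypothetical "current trace id", kept per thread. It stands in for
    /// the ambient span state that automatic nesting relies on.
    static CURRENT_TRACE: Cell<Option<u64>> = Cell::new(None);
}

/// Read the trace id that is current on this thread, if any.
pub fn current_trace() -> Option<u64> {
    CURRENT_TRACE.with(|c| c.get())
}

/// Run `f` with `trace` installed as the current trace id,
/// then restore whatever was current before.
pub fn with_trace<T>(trace: Option<u64>, f: impl FnOnce() -> T) -> T {
    let previous = CURRENT_TRACE.with(|c| c.replace(trace));
    let out = f();
    CURRENT_TRACE.with(|c| c.set(previous));
    out
}

/// Spawn work without passing the context: the worker sees no trace.
pub fn spawn_untraced() -> Option<u64> {
    thread::spawn(current_trace)
        .join()
        .expect("worker thread panicked")
}

/// Capture the context before spawning and re-attach it in the worker.
pub fn spawn_traced() -> Option<u64> {
    // Analogous to capturing the context from the current span.
    let captured = current_trace();
    // Analogous to attaching the captured context on the other side.
    thread::spawn(move || with_trace(captured, current_trace))
        .join()
        .expect("worker thread panicked")
}
```

When both helpers are called from inside with_trace(Some(42), ...), spawn_untraced returns None because the worker starts with an empty thread-local, while spawn_traced returns Some(42) because the captured value travels with the closure.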