Metrics tracking

Overview

1. Metrics Access Instructions

2. Metrics System Design

The metrics system of Dubbo involves three aspects: metrics collection, local aggregation, and metrics pushing.

  • Metrics Collection: Push metrics that need to be monitored internally in Dubbo to a unified Collector for storage.
  • Local Aggregation: Basic metrics are obtained from metrics collection, while some quantile metrics need to be calculated through local aggregation.
  • Metrics Pushing: Collected and aggregated metrics are pushed to a third-party server in a specific manner, currently only involving Prometheus.

3. Structural Design

  • Remove the original classes related to Metrics
  • Create new modules dubbo-metrics/dubbo-metrics-api, dubbo-metrics/dubbo-metrics-prometheus, with MetricsConfig as the configuration class for these modules
  • Use Micrometer; in the Collector, represent metrics with basic types such as Long and Double, and introduce Micrometer in dubbo-metrics-api to convert the internal metrics.

4. Data Flow

(Figure: metrics data-flow diagram)

5. Objectives

The metrics interface will provide a MetricsService, which not only supplies interface-level data for service flexibility but also offers a way to query all metrics. The method-level metrics query interface can be declared as follows:

```java
public interface MetricsService {

    /**
     * Default {@link MetricsService} extension name.
     */
    String DEFAULT_EXTENSION_NAME = "default";

    /**
     * The contract version of {@link MetricsService}; future updates must ensure compatibility.
     */
    String VERSION = "1.0.0";

    /**
     * Get metrics by categories.
     *
     * @param categories categories
     * @return metrics - key=MetricsCategory value=MetricsEntity list
     */
    Map<MetricsCategory, List<MetricsEntity>> getMetricsByCategories(List<MetricsCategory> categories);

    /**
     * Get metrics by service and categories.
     *
     * @param serviceUniqueName serviceUniqueName (e.g. group/interfaceName:version)
     * @param categories categories
     * @return metrics - key=MetricsCategory value=MetricsEntity list
     */
    Map<MetricsCategory, List<MetricsEntity>> getMetricsByCategories(String serviceUniqueName, List<MetricsCategory> categories);

    /**
     * Get metrics by service, method, and categories.
     *
     * @param serviceUniqueName serviceUniqueName (e.g. group/interfaceName:version)
     * @param methodName methodName
     * @param parameterTypes method parameter types
     * @param categories categories
     * @return metrics - key=MetricsCategory value=MetricsEntity list
     */
    Map<MetricsCategory, List<MetricsEntity>> getMetricsByCategories(String serviceUniqueName, String methodName, Class<?>[] parameterTypes, List<MetricsCategory> categories);
}
```

Where MetricsCategory is designed as follows:

```java
public enum MetricsCategory {
    RT,
    QPS,
    REQUESTS,
}
```

MetricsEntity is designed as follows:

```java
public class MetricsEntity {
    private String name;
    private Map<String, String> tags;
    private MetricsCategory category;
    private Object value;
}
```
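
To illustrate how a caller might consume these types, here is a self-contained sketch. The constructor and getters on MetricsEntity, and the canned data standing in for a real MetricsService implementation, are assumptions for illustration only:

```java
import java.util.*;

public class MetricsQueryDemo {
    enum MetricsCategory { RT, QPS, REQUESTS }

    // Mirrors the MetricsEntity design above; constructor and getters are assumed.
    static class MetricsEntity {
        private final String name;
        private final Map<String, String> tags;
        private final MetricsCategory category;
        private final Object value;

        MetricsEntity(String name, Map<String, String> tags, MetricsCategory category, Object value) {
            this.name = name;
            this.tags = tags;
            this.category = category;
            this.value = value;
        }

        String getName() { return name; }
        Object getValue() { return value; }
    }

    // Stand-in for MetricsService#getMetricsByCategories(List) returning canned data.
    static Map<MetricsCategory, List<MetricsEntity>> getMetricsByCategories(List<MetricsCategory> categories) {
        Map<MetricsCategory, List<MetricsEntity>> result = new HashMap<>();
        for (MetricsCategory c : categories) {
            result.put(c, Collections.singletonList(
                new MetricsEntity("rt_p99", Collections.singletonMap("method", "sayHello"), c, 1.0)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<MetricsCategory, List<MetricsEntity>> metrics =
            getMetricsByCategories(Arrays.asList(MetricsCategory.RT));
        for (MetricsEntity e : metrics.get(MetricsCategory.RT)) {
            System.out.println(e.getName() + " = " + e.getValue());
        }
    }
}
```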

Metrics Collection

1. Insertion Location

The architecture diagram of Dubbo is as follows: (Figure: Dubbo architecture diagram)

Add a MetricsFilter layer on the provider side that overrides the invoke method, embedding metrics collection into the call chain with try-catch-finally. The core code is as follows:

```java
@Activate(group = PROVIDER, order = -1)
public class MetricsFilter implements Filter, ScopeModelAware {

    @Override
    public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        collector.increaseTotalRequests(interfaceName, methodName, group, version);
        collector.increaseProcessingRequests(interfaceName, methodName, group, version);
        long startTime = System.currentTimeMillis();
        try {
            Result invoke = invoker.invoke(invocation);
            collector.increaseSucceedRequests(interfaceName, methodName, group, version);
            return invoke;
        } catch (RpcException e) {
            collector.increaseFailedRequests(interfaceName, methodName, group, version);
            throw e;
        } finally {
            long rt = System.currentTimeMillis() - startTime;
            collector.addRT(interfaceName, methodName, group, version, rt);
            collector.decreaseProcessingRequests(interfaceName, methodName, group, version);
        }
    }
}
```

2. Metric Identification

Use the following five attributes as isolation levels to distinguish different methods, which are also the keys of various ConcurrentHashMap.

```java
public class MethodMetric {
    private String applicationName;
    private String interfaceName;
    private String methodName;
    private String group;
    private String version;
}
```
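
Because MethodMetric instances serve as ConcurrentHashMap keys, they must implement equals and hashCode over these five attributes (elided in the listing above). A minimal self-contained sketch (class name and fields follow the design above; the equals/hashCode bodies are an assumption):

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class MethodMetricSketch {
    final String applicationName, interfaceName, methodName, group, version;

    MethodMetricSketch(String app, String itf, String method, String group, String version) {
        this.applicationName = app;
        this.interfaceName = itf;
        this.methodName = method;
        this.group = group;
        this.version = version;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MethodMetricSketch)) return false;
        MethodMetricSketch m = (MethodMetricSketch) o;
        return Objects.equals(applicationName, m.applicationName)
            && Objects.equals(interfaceName, m.interfaceName)
            && Objects.equals(methodName, m.methodName)
            && Objects.equals(group, m.group)
            && Objects.equals(version, m.version);
    }

    @Override
    public int hashCode() {
        return Objects.hash(applicationName, interfaceName, methodName, group, version);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<MethodMetricSketch, AtomicLong> counters = new ConcurrentHashMap<>();
        MethodMetricSketch a = new MethodMetricSketch("app", "DemoService", "sayHello", "", "1.0");
        MethodMetricSketch b = new MethodMetricSketch("app", "DemoService", "sayHello", "", "1.0");
        // Two logically identical keys must land in the same map entry
        counters.computeIfAbsent(a, k -> new AtomicLong()).incrementAndGet();
        counters.computeIfAbsent(b, k -> new AtomicLong()).incrementAndGet();
        System.out.println(counters.size() + " key, count=" + counters.get(a).get());
    }
}
```

Without these overrides, every filter invocation would create a fresh map entry instead of incrementing the existing counter.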

3. Basic Metrics

Metrics are stored in the MetricsCollector under the common module.

```java
public class DefaultMetricsCollector implements MetricsCollector {

    private Boolean collectEnabled = false;
    private final List<MetricsListener> listeners = new ArrayList<>();
    private final ApplicationModel applicationModel;
    private final String applicationName;

    private final Map<MethodMetric, AtomicLong> totalRequests = new ConcurrentHashMap<>();
    private final Map<MethodMetric, AtomicLong> succeedRequests = new ConcurrentHashMap<>();
    private final Map<MethodMetric, AtomicLong> failedRequests = new ConcurrentHashMap<>();
    private final Map<MethodMetric, AtomicLong> processingRequests = new ConcurrentHashMap<>();

    private final Map<MethodMetric, AtomicLong> lastRT = new ConcurrentHashMap<>();
    private final Map<MethodMetric, LongAccumulator> minRT = new ConcurrentHashMap<>();
    private final Map<MethodMetric, LongAccumulator> maxRT = new ConcurrentHashMap<>();
    private final Map<MethodMetric, AtomicLong> avgRT = new ConcurrentHashMap<>();
    private final Map<MethodMetric, AtomicLong> totalRT = new ConcurrentHashMap<>();
    private final Map<MethodMetric, AtomicLong> rtCount = new ConcurrentHashMap<>();
}
```
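
The increase/decrease methods called by MetricsFilter can be implemented atomically with computeIfAbsent; a simplified sketch, not the actual Dubbo source (keys reduced to plain Strings and signatures trimmed for brevity):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;

public class CollectorSketch {
    private final Map<String, AtomicLong> totalRequests = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> processingRequests = new ConcurrentHashMap<>();
    private final Map<String, LongAccumulator> minRT = new ConcurrentHashMap<>();
    private final Map<String, LongAccumulator> maxRT = new ConcurrentHashMap<>();

    public void increaseTotalRequests(String key) {
        totalRequests.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    public void increaseProcessingRequests(String key) {
        processingRequests.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    public void decreaseProcessingRequests(String key) {
        processingRequests.computeIfAbsent(key, k -> new AtomicLong()).decrementAndGet();
    }

    public void addRT(String key, long rt) {
        // LongAccumulator keeps a running min/max without locking
        minRT.computeIfAbsent(key, k -> new LongAccumulator(Long::min, Long.MAX_VALUE)).accumulate(rt);
        maxRT.computeIfAbsent(key, k -> new LongAccumulator(Long::max, Long.MIN_VALUE)).accumulate(rt);
    }

    public long total(String key) { return totalRequests.getOrDefault(key, new AtomicLong()).get(); }
    public long min(String key) { return minRT.get(key).longValue(); }
    public long max(String key) { return maxRT.get(key).longValue(); }
}
```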

Local Aggregation

Local aggregation refers to the process of obtaining quantile metrics through calculations based on some simple metrics.

1. Parameter Design

By default, only basic metrics are collected. Single-machine aggregated metrics are computed in a separate thread only when service flexibility or local aggregation is enabled; if service flexibility is enabled, local aggregation is enabled by default.

1.1 Local Aggregation Enablement

```xml
<dubbo:metrics>
    <dubbo:aggregation enable="true" />
</dubbo:metrics>
```

1.2 Metrics Aggregation Parameters

```xml
<dubbo:metrics>
    <dubbo:aggregation enable="true" bucket-num="5" time-window-seconds="10"/>
</dubbo:metrics>
```

2. Specific Metrics

The metrics module of Dubbo helps users observe the internal service status of the running system from the outside. Dubbo refers to the “Four Golden Signals”, RED Method, USE Method, and other theories combined with practical enterprise application scenarios to provide a rich set of key metrics from different dimensions. Focusing on these core metrics is crucial for providing usable services.

Key metrics of Dubbo include: Latency, Traffic, Errors, and Saturation. To better monitor service operation status, Dubbo also provides monitoring for core component states, such as Dubbo application information, thread pool information, and metrics data for interaction with the three major centers.

Key monitoring metrics in Dubbo mainly include:

|            | Infrastructure | Business Monitoring |
|------------|----------------|---------------------|
| Latency    | IO wait; network latency | Average time consumed by interfaces and services; TP90, TP99, TP999, etc. |
| Traffic    | Network and disk IO | QPS at the service level |
| Errors     | Downtime; disk errors (bad disk or file system error); process or port crash; network packet loss | Error logs; business status codes, trends of error codes |
| Saturation | System resource utilization (CPU, memory, disk, network); saturation (number of waiting threads, queue backlog length) | Mainly JVM, thread pools, etc. |

  • QPS: obtained dynamically based on a sliding window
  • RT: obtained dynamically based on a sliding window
  • Number of failed requests in recent time: obtained dynamically based on a sliding window
  • Number of successful requests in recent time: obtained dynamically based on a sliding window
  • Number of requests being processed: simple counting in the filter before and after the invocation
  • The window-based metrics rely on a sliding window and additionally use the AggregateMetricsCollector for collection

Metrics output to Prometheus can be referenced as follows:

```text
# HELP jvm_gc_live_data_size_bytes Size of long-lived heap memory pool after reclamation
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes 1.6086528E7
# HELP requests_succeed_aggregate Aggregated Succeed Requests
# TYPE requests_succeed_aggregate gauge
requests_succeed_aggregate{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 39.0
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{id="direct",} 1.679975E7
jvm_buffer_memory_used_bytes{id="mapped",} 0.0
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total 2.9884416E9
# HELP requests_total_aggregate Aggregated Total Requests
# TYPE requests_total_aggregate gauge
requests_total_aggregate{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 39.0
# HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
# TYPE system_load_average_1m gauge
system_load_average_1m 0.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.015802269043760128
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads 40.0
# HELP requests_processing Processing Requests
# TYPE requests_processing gauge
requests_processing{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.22912768E8
jvm_memory_max_bytes{area="heap",id="G1 Survivor Space",} -1.0
jvm_memory_max_bytes{area="heap",id="G1 Old Gen",} 9.52107008E8
jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{area="heap",id="G1 Eden Space",} -1.0
jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 5828608.0
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1.22916864E8
# HELP jvm_threads_states_threads The current number of threads having BLOCKED state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threads{state="blocked",} 0.0
jvm_threads_states_threads{state="runnable",} 10.0
jvm_threads_states_threads{state="waiting",} 16.0
jvm_threads_states_threads{state="timed-waiting",} 13.0
jvm_threads_states_threads{state="new",} 0.0
jvm_threads_states_threads{state="terminated",} 0.0
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{id="direct",} 1.6799749E7
jvm_buffer_total_capacity_bytes{id="mapped",} 0.0
# HELP rt_p99 Response Time P99
# TYPE rt_p99 gauge
rt_p99{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 1.0
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Survivor Space",} 1048576.0
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.462464E7
jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 1.6098728E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 4.0126952E7
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 8.2837504E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 1372032.0
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 4519248.0
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 5697408.0
# HELP qps Query Per Seconds
# TYPE qps gauge
qps{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.3333333333333333
# HELP rt_min Min Response Time
# TYPE rt_min gauge
rt_min{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{id="mapped",} 0.0
jvm_buffer_count_buffers{id="direct",} 10.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count 2.0
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes 7325.0
# HELP rt_total Total Response Time
# TYPE rt_total gauge
rt_total{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 2783.0
# HELP rt_last Last Response Time
# TYPE rt_last gauge
rt_last{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total 1.4450952E7
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of minor GC",cause="Metadata GC Threshold",} 2.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Metadata GC Threshold",} 0.026
jvm_gc_pause_seconds_count{action="end of minor GC",cause="G1 Evacuation Pause",} 37.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="G1 Evacuation Pause",} 0.156
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of minor GC",cause="Metadata GC Threshold",} 0.0
jvm_gc_pause_seconds_max{action="end of minor GC",cause="G1 Evacuation Pause",} 0.0
# HELP rt_p95 Response Time P95
# TYPE rt_p95 gauge
rt_p95{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
# HELP requests_total Total Requests
# TYPE requests_total gauge
requests_total{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 27738.0
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage 8.103727714748784E-4
# HELP rt_max Max Response Time
# TYPE rt_max gauge
rt_max{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 4.0
# HELP jvm_gc_max_data_size_bytes Max size of long-lived heap memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes 9.52107008E8
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads 39.0
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads 36.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.4680064E7
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space",} 1048576.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen",} 5.24288E7
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 4.1623552E7
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space",} 9.0177536E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 5111808.0
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 5701632.0
# HELP requests_succeed Succeed Requests
# TYPE requests_succeed gauge
requests_succeed{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 27738.0
# HELP rt_avg Average Response Time
# TYPE rt_avg gauge
rt_avg{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
```

Aggregation Collector

```java
public class AggregateMetricsCollector implements MetricsCollector, MetricsListener {

    private int bucketNum;
    private int timeWindowSeconds;

    private final Map<MethodMetric, TimeWindowCounter> totalRequests = new ConcurrentHashMap<>();
    private final Map<MethodMetric, TimeWindowCounter> succeedRequests = new ConcurrentHashMap<>();
    private final Map<MethodMetric, TimeWindowCounter> failedRequests = new ConcurrentHashMap<>();
    private final Map<MethodMetric, TimeWindowCounter> qps = new ConcurrentHashMap<>();
    private final Map<MethodMetric, TimeWindowQuantile> rt = new ConcurrentHashMap<>();

    private final ApplicationModel applicationModel;

    private static final Integer DEFAULT_COMPRESSION = 100;
    private static final Integer DEFAULT_BUCKET_NUM = 10;
    private static final Integer DEFAULT_TIME_WINDOW_SECONDS = 120;

    // In the constructor, parse the configuration information
    public AggregateMetricsCollector(ApplicationModel applicationModel) {
        this.applicationModel = applicationModel;
        ConfigManager configManager = applicationModel.getApplicationConfigManager();
        MetricsConfig config = configManager.getMetrics().orElse(null);
        if (config != null && config.getAggregation() != null && Boolean.TRUE.equals(config.getAggregation().getEnabled())) {
            // only registered when aggregation is enabled
            registerListener();
            AggregationConfig aggregation = config.getAggregation();
            this.bucketNum = aggregation.getBucketNum() == null ? DEFAULT_BUCKET_NUM : aggregation.getBucketNum();
            this.timeWindowSeconds = aggregation.getTimeWindowSeconds() == null ? DEFAULT_TIME_WINDOW_SECONDS : aggregation.getTimeWindowSeconds();
        }
    }
}
```

If local aggregation is enabled, listeners are registered through the BeanFactory, binding the AggregateMetricsCollector to the DefaultMetricsCollector and implementing a producer-consumer model. The DefaultMetricsCollector keeps a list of listeners for easy extension.

```java
private void registerListener() {
    applicationModel.getBeanFactory().getBean(DefaultMetricsCollector.class).addListener(this);
}
```
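
The producer-consumer wiring can be sketched as follows; the MetricsListener signature and class shapes here are assumptions for illustration, not the actual Dubbo interfaces:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerWiringSketch {
    // Hypothetical event/listener shape for illustration only
    interface MetricsListener {
        void onEvent(String metricName, long value);
    }

    // Producer: the default collector notifies every registered listener
    static class DefaultCollectorSketch {
        private final List<MetricsListener> listeners = new CopyOnWriteArrayList<>();

        void addListener(MetricsListener listener) { listeners.add(listener); }

        void publish(String metricName, long value) {
            for (MetricsListener l : listeners) {
                l.onEvent(metricName, value);
            }
        }
    }

    // Consumer: an aggregating collector that subscribes to the default one,
    // mirroring registerListener() above
    static class AggregateCollectorSketch implements MetricsListener {
        long received;

        AggregateCollectorSketch(DefaultCollectorSketch defaultCollector) {
            defaultCollector.addListener(this);
        }

        @Override
        public void onEvent(String metricName, long value) { received += value; }
    }
}
```

This decouples raw collection from aggregation: the default collector only produces events, and any number of consumers can subscribe without it knowing about them.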

3. Metrics Aggregation

Sliding Window

Assume there are initially 6 buckets, each with a window time of 2 minutes. Every time metrics data is written, it is written into all 6 buckets. Every two minutes, the current pointer moves to the next bucket and that bucket's data is cleared. When reading metrics, the bucket pointed to by current is read, achieving the sliding-window effect: the current bucket always holds data written within the configured bucket lifecycle, i.e., recent data. (Figure: sliding-window bucket rotation)

In each bucket, the TDigest algorithm is used to calculate quantile metrics.

The TDigest algorithm offers high accuracy for extreme quantiles (such as p1 and p99) and lower accuracy for central quantiles (such as p50).

The code implementation is as follows. In addition to TimeWindowQuantile for calculating quantile metrics, TimeWindowCounter is provided to collect the count of metrics within the time interval.

```java
public class TimeWindowQuantile {

    private final double compression;
    private final TDigest[] ringBuffer;
    private int currentBucket;
    private long lastRotateTimestampMillis;
    private final long durationBetweenRotatesMillis;

    public TimeWindowQuantile(double compression, int bucketNum, int timeWindowSeconds) {
        this.compression = compression;
        this.ringBuffer = new TDigest[bucketNum];
        for (int i = 0; i < bucketNum; i++) {
            this.ringBuffer[i] = TDigest.createDigest(compression);
        }
        this.currentBucket = 0;
        this.lastRotateTimestampMillis = System.currentTimeMillis();
        this.durationBetweenRotatesMillis = TimeUnit.SECONDS.toMillis(timeWindowSeconds) / bucketNum;
    }

    public synchronized double quantile(double q) {
        TDigest currentBucket = rotate();
        return currentBucket.quantile(q);
    }

    public synchronized void add(double value) {
        rotate();
        for (TDigest bucket : ringBuffer) {
            bucket.add(value);
        }
    }

    private TDigest rotate() {
        long timeSinceLastRotateMillis = System.currentTimeMillis() - lastRotateTimestampMillis;
        while (timeSinceLastRotateMillis > durationBetweenRotatesMillis) {
            ringBuffer[currentBucket] = TDigest.createDigest(compression);
            if (++currentBucket >= ringBuffer.length) {
                currentBucket = 0;
            }
            timeSinceLastRotateMillis -= durationBetweenRotatesMillis;
            lastRotateTimestampMillis += durationBetweenRotatesMillis;
        }
        return ringBuffer[currentBucket];
    }
}
```
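
TimeWindowCounter itself is not listed in this document. Following the same ring-buffer rotation as TimeWindowQuantile, a self-contained sketch (an assumption for illustration, not the actual Dubbo source) could look like:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class TimeWindowCounterSketch {
    private final LongAdder[] ringBuffer;
    private int currentBucket;
    private long lastRotateTimestampMillis;
    private final long durationBetweenRotatesMillis;

    public TimeWindowCounterSketch(int bucketNum, int timeWindowSeconds) {
        this.ringBuffer = new LongAdder[bucketNum];
        for (int i = 0; i < bucketNum; i++) {
            this.ringBuffer[i] = new LongAdder();
        }
        this.currentBucket = 0;
        this.lastRotateTimestampMillis = System.currentTimeMillis();
        this.durationBetweenRotatesMillis = TimeUnit.SECONDS.toMillis(timeWindowSeconds) / bucketNum;
    }

    // Increment in every bucket, mirroring TimeWindowQuantile#add
    public synchronized void increment() {
        rotate();
        for (LongAdder bucket : ringBuffer) {
            bucket.increment();
        }
    }

    // Read the current bucket, i.e. the count of events within the recent window
    public synchronized long get() {
        return rotate().sum();
    }

    private LongAdder rotate() {
        long timeSinceLastRotateMillis = System.currentTimeMillis() - lastRotateTimestampMillis;
        while (timeSinceLastRotateMillis > durationBetweenRotatesMillis) {
            ringBuffer[currentBucket].reset();
            if (++currentBucket >= ringBuffer.length) {
                currentBucket = 0;
            }
            timeSinceLastRotateMillis -= durationBetweenRotatesMillis;
            lastRotateTimestampMillis += durationBetweenRotatesMillis;
        }
        return ringBuffer[currentBucket];
    }
}
```

A windowed QPS can then be derived as `counter.get() / timeWindowSeconds`, matching the qps values seen in the Prometheus output above.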

Metrics Push

Metrics pushing is only enabled when the user has set the <dubbo:metrics /> configuration and set the protocol parameter. If only metrics aggregation is enabled, no metrics will be pushed by default.

1. Prometheus Pull ServiceDiscovery

Using an intermediate layer such as dubbo-admin, at startup the local IP, port, and MetricsURL push address are pushed to dubbo-admin (or any intermediate layer) according to the configuration, which then exposes HTTP ServiceDiscovery for Prometheus to read. The configuration looks like <dubbo:metrics protocol="prometheus" mode="pull" address="${dubbo-admin.address}" port="20888" url="/metrics"/>. In pull mode, address is an optional parameter; if it is not set, the user must manually configure the scrape address in the Prometheus configuration file.

```java
private void exportHttpServer() {
    boolean exporterEnabled = url.getParameter(PROMETHEUS_EXPORTER_ENABLED_KEY, false);
    if (exporterEnabled) {
        int port = url.getParameter(PROMETHEUS_EXPORTER_METRICS_PORT_KEY, PROMETHEUS_DEFAULT_METRICS_PORT);
        String path = url.getParameter(PROMETHEUS_EXPORTER_METRICS_PATH_KEY, PROMETHEUS_DEFAULT_METRICS_PATH);
        if (!path.startsWith("/")) {
            path = "/" + path;
        }
        try {
            prometheusExporterHttpServer = HttpServer.create(new InetSocketAddress(port), 0);
            prometheusExporterHttpServer.createContext(path, httpExchange -> {
                String response = prometheusRegistry.scrape();
                httpExchange.sendResponseHeaders(200, response.getBytes().length);
                try (OutputStream os = httpExchange.getResponseBody()) {
                    os.write(response.getBytes());
                }
            });
            httpServerThread = new Thread(prometheusExporterHttpServer::start);
            httpServerThread.start();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```
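
The exporter above uses the JDK's built-in com.sun.net.httpserver.HttpServer, so the pattern can be reproduced standalone. In this sketch the Prometheus registry is replaced by a fixed scrape string, and the class/method names are illustrative only:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MetricsHttpExporterSketch {
    private HttpServer server;

    // Stub standing in for prometheusRegistry.scrape()
    String scrape() {
        return "# TYPE requests_total gauge\nrequests_total 42.0\n";
    }

    // Bind to the given port (0 picks an ephemeral port) and serve /metrics
    public int start(int port) throws Exception {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = scrape().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server.getAddress().getPort();
    }

    public void stop() {
        server.stop(0);
    }
}
```

After start, a Prometheus server (or a plain HTTP GET) can scrape http://host:port/metrics and receive the exposition-format text.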

2. Prometheus Push Pushgateway

Users can directly configure the address of the Prometheus Pushgateway in the Dubbo configuration file, e.g. <dubbo:metrics protocol="prometheus" mode="push" address="${prometheus.pushgateway-url}" interval="5" />, where interval represents the push interval in seconds.

```java
private void schedulePushJob() {
    boolean pushEnabled = url.getParameter(PROMETHEUS_PUSHGATEWAY_ENABLED_KEY, false);
    if (pushEnabled) {
        String baseUrl = url.getParameter(PROMETHEUS_PUSHGATEWAY_BASE_URL_KEY);
        String job = url.getParameter(PROMETHEUS_PUSHGATEWAY_JOB_KEY, PROMETHEUS_DEFAULT_JOB_NAME);
        int pushInterval = url.getParameter(PROMETHEUS_PUSHGATEWAY_PUSH_INTERVAL_KEY, PROMETHEUS_DEFAULT_PUSH_INTERVAL);
        String username = url.getParameter(PROMETHEUS_PUSHGATEWAY_USERNAME_KEY);
        String password = url.getParameter(PROMETHEUS_PUSHGATEWAY_PASSWORD_KEY);
        NamedThreadFactory threadFactory = new NamedThreadFactory("prometheus-push-job", true);
        pushJobExecutor = Executors.newScheduledThreadPool(1, threadFactory);
        PushGateway pushGateway = new PushGateway(baseUrl);
        if (!StringUtils.isBlank(username)) {
            pushGateway.setConnectionFactory(new BasicAuthHttpConnectionFactory(username, password));
        }
        pushJobExecutor.scheduleWithFixedDelay(() -> push(pushGateway, job), pushInterval, pushInterval, TimeUnit.SECONDS);
    }
}

protected void push(PushGateway pushGateway, String job) {
    try {
        pushGateway.pushAdd(prometheusRegistry.getPrometheusRegistry(), job);
    } catch (IOException e) {
        logger.error("Error occurred when pushing metrics to prometheus: ", e);
    }
}
```

Visualization

Currently, it is recommended to use Prometheus for service monitoring and Grafana to display metrics data. You can get started quickly with the Dubbo visualization monitoring sample.


Last modified September 30, 2024: Update & Translate Overview Docs (#3040) (d37ebceaea7)