Metrics

RocketMQ exposes the following metrics in Prometheus format. You can monitor your clusters with those metrics.

  • Broker metrics
  • Producer metrics
  • Consumer metrics

Version support: The following metrics were introduced in RocketMQ 5.1.0 and are supported only by the broker.

Details of metrics

Metric types

RocketMQ defines its metrics according to the conventions of open source Prometheus. The metric types that RocketMQ offers are counters, gauges, and histograms. For more information, see METRIC TYPES.

Broker metrics

The following table describes the labels of the metrics that are related to the Message Queue for Apache RocketMQ broker.

  • cluster: RocketMQ cluster name.
  • node_type: the type of service node, which is one of the following: proxy, broker, nameserver.
  • node_id: the ID of the service node.
  • topic: the topic of RocketMQ.
  • message_type: the type of a message, which is one of the following:
    normal: normal messages;
    fifo: ordered messages;
    transaction: transactional messages;
    delay: scheduled or delayed messages.
  • consumer_group: the ID of the consumer group.
  • invocation_status: the result of the API call for creating a topic or consumer group, which is either success or failure.

| Type | Name | Unit | Description | Label |
|---|---|---|---|---|
| counter | rocketmq_messages_in_total | count | The number of messages that are produced. | cluster, node_type, node_id, topic, message_type |
| counter | rocketmq_messages_out_total | count | The number of messages that are consumed. | cluster, node_type, node_id, topic, consumer_group |
| counter | rocketmq_throughput_in_total | byte | The write throughput of messages that are produced. | cluster, node_type, node_id, topic, message_type |
| counter | rocketmq_throughput_out_total | byte | The read throughput of messages that are consumed. | cluster, node_type, node_id, topic, consumer_group |
| histogram | rocketmq_message_size | byte | The distribution of message sizes. This metric is counted only when messages are sent. Distribution ranges: le_1_kb (≤ 1 KB), le_4_kb (≤ 4 KB), le_512_kb (≤ 512 KB), le_1_mb (≤ 1 MB), le_2_mb (≤ 2 MB), le_4_mb (≤ 4 MB), le_overflow (> 4 MB). | cluster, node_type, node_id, topic, message_type |
| gauge | rocketmq_consumer_ready_messages | count | The number of ready messages. | cluster, node_type, node_id, topic, consumer_group |
| gauge | rocketmq_consumer_inflight_messages | count | The number of inflight messages. | cluster, node_type, node_id, topic, consumer_group |
| gauge | rocketmq_consumer_queueing_latency | millisecond | The queueing delay of ready messages. | cluster, node_type, node_id, topic, consumer_group |
| gauge | rocketmq_consumer_lag_latency | millisecond | The delay before messages are consumed. | cluster, node_type, node_id, topic, consumer_group |
| counter | rocketmq_send_to_dlq_messages_total | count | The number of messages that are sent to the dead-letter queue. | cluster, node_type, node_id, topic, consumer_group |
| histogram | rocketmq_rpc_latency | millisecond | The RPC call latency. | cluster, node_type, node_id, protocol_type, request_code, response_code |
| gauge | rocketmq_storage_message_reserve_time | millisecond | The message retention time. | cluster, node_type, node_id |
| gauge | rocketmq_storage_dispatch_behind_bytes | byte | The size of undispatched messages. | cluster, node_type, node_id |
| gauge | rocketmq_storage_flush_behind_bytes | byte | The size of unflushed messages. | cluster, node_type, node_id |
| gauge | rocketmq_thread_pool_wartermark | count | The number of tasks queued in the thread pool. | cluster, node_type, node_id, name |
| histogram | rocketmq_topic_create_execution_time | millisecond | The execution time for creating a topic. Distribution ranges: le_10_ms, le_100_ms, le_1_s, le_3_s, le_5_s, le_overflow. | cluster, node_type, node_id, invocation_status, is_system |
| histogram | rocketmq_consumer_group_create_execution_time | millisecond | The execution time for creating a consumer group. Distribution ranges: le_10_ms, le_100_ms, le_1_s, le_3_s, le_5_s, le_overflow. | cluster, node_type, node_id, invocation_status |
| gauge | rocketmq_topic_number | count | The number of topics. | cluster, node_type, node_id |
| gauge | rocketmq_consumer_group_number | count | The number of consumer groups. | cluster, node_type, node_id |
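
Counters such as rocketmq_messages_in_total only ever increase, so dashboards usually show their rate of change rather than the raw value. The following is a minimal sketch of that calculation from two samples. The function name counter_rate is hypothetical, and treating a decrease as a counter reset is an assumption borrowed from common Prometheus practice, not something this page specifies.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Return the per-second increase between two counter samples.

    Timestamps are in seconds. All names here are illustrative.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a smaller value means the counter restarted from zero
        # (for example after a broker restart), so count only the new value.
        delta = curr_value
    return delta / elapsed


# Two hypothetical samples of rocketmq_messages_in_total taken 60 s apart.
messages_per_second = counter_rate(1000, 0, 1600, 60)
```

In a Prometheus deployment the same idea is normally expressed with a rate query on the server side; the sketch only illustrates the arithmetic.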

Producer metrics

The following table describes the labels of the metrics that are related to the producers in Message Queue for Apache RocketMQ.

  • cluster: RocketMQ cluster name.
  • node_type: the type of service node, which is one of the following: proxy, broker, nameserver.
  • node_id: the ID of the service node.
  • topic: the topic of Message Queue for Apache RocketMQ.
  • message_type: the type of a message, which is one of the following:
    normal: normal messages;
    fifo: ordered messages;
    transaction: transactional messages;
    delay: scheduled or delayed messages.
  • client_id: the ID of the client.
  • invocation_status: the result of the API call for sending messages, which includes success and failure.

| Type | Name | Unit | Description | Label |
|---|---|---|---|---|
| histogram | rocketmq_send_cost_time | millisecond | The distribution of production API call time. Distribution ranges: le_1_ms, le_5_ms, le_10_ms, le_20_ms, le_50_ms, le_200_ms, le_500_ms, le_overflow. | topic, client_id, invocation_status |
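
To make the distribution ranges concrete, the following sketch maps one send latency to the range it would fall into. It assumes each value is counted in the smallest range whose upper bound it does not exceed; that reading of "distribution ranges" and the names send_cost_bucket, SEND_COST_BOUNDS_MS, and SEND_COST_LABELS are illustrative assumptions, not taken from the RocketMQ client.

```python
import bisect

# Upper bounds, in milliseconds, of the ranges listed for rocketmq_send_cost_time.
SEND_COST_BOUNDS_MS = [1, 5, 10, 20, 50, 200, 500]
SEND_COST_LABELS = [
    "le_1_ms", "le_5_ms", "le_10_ms", "le_20_ms",
    "le_50_ms", "le_200_ms", "le_500_ms", "le_overflow",
]


def send_cost_bucket(cost_ms):
    """Return the label of the smallest range that contains cost_ms."""
    # bisect_left finds the first bound that is >= cost_ms, so a value equal
    # to a bound lands in that bound's range; values above 500 ms overflow.
    index = bisect.bisect_left(SEND_COST_BOUNDS_MS, cost_ms)
    return SEND_COST_LABELS[index]
```

For example, under this assumption a send call that takes 7 ms is counted in le_10_ms, and one that takes 800 ms is counted in le_overflow.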

Consumer metrics

The following table describes the labels of the metrics that are related to the consumers in Message Queue for Apache RocketMQ.

  • topic: the topic of Message Queue for Apache RocketMQ.
  • consumer_group: the ID of the consumer group.
  • client_id: the ID of the client.
  • invocation_status: the result of the API call for consuming messages, which includes success and failure.

| Type | Name | Unit | Description | Label |
|---|---|---|---|---|
| histogram | rocketmq_process_time | millisecond | The distribution of message process time. Distribution ranges: le_1_ms, le_5_ms, le_10_ms, le_100_ms, le_10000_ms, le_60000_ms, le_overflow. | topic, consumer_group, client_id, invocation_status |
| gauge | rocketmq_consumer_cached_messages | message | The number of messages in the local buffer queue of PushConsumer. | topic, consumer_group, client_id |
| gauge | rocketmq_consumer_cached_bytes | byte | The total size of messages in the local buffer queue of PushConsumer. | topic, consumer_group, client_id |
| histogram | rocketmq_await_time | millisecond | The distribution of queuing time for messages in the local buffer queue of PushConsumer. Distribution ranges: le_1_ms, le_5_ms, le_20_ms, le_100_ms, le_1000_ms, le_5000_ms, le_10000_ms, le_overflow. | topic, consumer_group, client_id |

Background information

RocketMQ defines metrics based on the following business scenarios.

Message accumulation scenarios

Figure: RocketMQ queue message status
The above figure shows the number and duration of messages in different stages. By monitoring these metrics, you can determine whether the business consumption is abnormal. The following table describes the meaning of these metrics and the formulas that are used to calculate these metrics.

| Name | Description | Formula |
|---|---|---|
| Inflight messages | The number of messages that are being processed by consumers but are not yet acked. | Offset of the latest pulled message - Offset of the latest committed message |
| Ready messages | The number of messages that are ready for consumption. | Maximum offset - Offset of the latest pulled message |
| Ready time | Normal or ordered message: the time when the message is stored to the broker. Scheduled message: the timing end time. Transactional message: the transaction commit time. | - |
| Ready message queue time | The time interval between the ready time of the earliest ready message and the current time. This time reflects the timeliness of consumers pulling messages. | Current time - Ready time of the earliest ready message |
| Consumer lag time | The time difference between the ready time of the earliest unacked message and the current time. This time reflects the timeliness of consumers completing message processing. | Current time - Ready time of the earliest unacked message |
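
The formulas above can be sketched in a few lines of code. The QueueSnapshot type and its field names are hypothetical and exist only for illustration; the broker tracks these values internally and exposes the results as the gauges listed under Broker metrics.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Hypothetical view of one queue for one consumer group."""
    max_offset: int                # maximum offset in the queue
    pulled_offset: int             # offset of the latest pulled message
    committed_offset: int          # offset of the latest committed message
    earliest_ready_time_ms: int    # ready time of the earliest ready message
    earliest_unacked_time_ms: int  # ready time of the earliest unacked message


def inflight_messages(s):
    # Pulled by consumers but not yet acked.
    return s.pulled_offset - s.committed_offset


def ready_messages(s):
    # Stored and ready, but not yet pulled.
    return s.max_offset - s.pulled_offset


def ready_message_queue_time_ms(s, now_ms):
    return now_ms - s.earliest_ready_time_ms


def consumer_lag_time_ms(s, now_ms):
    return now_ms - s.earliest_unacked_time_ms
```

With a maximum offset of 100, a pulled offset of 80, and a committed offset of 70, this gives 10 inflight messages and 20 ready messages.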

PushConsumer consumption scenarios

In PushConsumer, real-time message processing is implemented with a typical Reactor thread model inside the SDK. As shown in the following figure, the SDK has a built-in long polling thread that asynchronously pulls messages into the SDK's built-in buffer queue. Messages are then submitted from the buffer queue to the consumer threads, which trigger the listener to execute the local consumption logic.
Figure: PushConsumer client
The metrics of local buffer queues in the PushConsumer scenario are as follows:

  • Number of messages in the local buffer queue: Total number of messages in the local buffer queue.
  • Message size in the local buffer queue: The sum of all message sizes in the local buffer queue.
  • Message waiting time: the time that the message is temporarily cached in the local buffer queue waiting to be processed.
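
As a rough illustration of how these three values relate to the buffer queue, the following sketch computes them for an in-memory queue. The CachedMessage type, its fields, and the function names are hypothetical; the real SDK keeps its own internal structures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CachedMessage:
    """Hypothetical message held in the local buffer queue."""
    body: bytes
    cached_at_ms: int  # time the message entered the buffer queue


def cached_message_count(queue):
    # Number of messages in the local buffer queue.
    return len(queue)


def cached_bytes(queue):
    # Sum of all message sizes in the local buffer queue.
    return sum(len(message.body) for message in queue)


def await_time_ms(message, handed_off_at_ms):
    # Time the message waited in the buffer before a consumer thread took it.
    return handed_off_at_ms - message.cached_at_ms


buffer_queue = deque([
    CachedMessage(b"hello", cached_at_ms=1000),
    CachedMessage(b"rocketmq", cached_at_ms=1200),
])
```

A steadily growing message count or waiting time in such a queue suggests that the listener is processing messages more slowly than the long polling thread pulls them.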

How to Obtain Metrics

Currently, two exporters are supported: gRPC OTLP and Prometheus.

gRPC OTLP Exporter

The gRPC OTLP exporter periodically reports metrics to the specified OpenTelemetry Collector.

Prerequisites: Deploy an OpenTelemetry Collector that supports the gRPC OpenTelemetry Protocol (OTLP).

To enable the gRPC OTLP exporter of Broker metrics, do the following:

  1. Set metricsExporterType to OTLP_GRPC.
  2. Set metricsGrpcExporterTarget to the endpoint provided by the OpenTelemetry Collector.

Optional configurations:

  1. metricsGrpcExporterHeader: Attach request headers to the gRPC OTLP exporter in the format of key1:value1,key2:value2.
  2. metricGrpcExporterTimeOutInMills: Set the request timeout for the gRPC OTLP exporter.
  3. metricGrpcExporterIntervalInMills: Set the reporting interval for the gRPC OTLP exporter.
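
The settings above can be assembled as in the following sketch, which renders them as key=value lines. It assumes the broker reads these settings from a properties-style configuration file; the function name otlp_exporter_properties is hypothetical, and the endpoint, header, timeout, and interval values in the usage example are placeholders rather than recommended settings.

```python
def otlp_exporter_properties(target, headers=None, timeout_ms=None, interval_ms=None):
    """Render gRPC OTLP exporter settings as key=value lines.

    Only the property names come from this page; everything else is illustrative.
    """
    props = {
        "metricsExporterType": "OTLP_GRPC",
        "metricsGrpcExporterTarget": target,
    }
    if headers:
        # Header format described above: key1:value1,key2:value2
        props["metricsGrpcExporterHeader"] = ",".join(
            f"{key}:{value}" for key, value in headers.items()
        )
    if timeout_ms is not None:
        props["metricGrpcExporterTimeOutInMills"] = str(timeout_ms)
    if interval_ms is not None:
        props["metricGrpcExporterIntervalInMills"] = str(interval_ms)
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Placeholder values; replace them with your collector endpoint and settings.
print(otlp_exporter_properties(
    "collector.example.com:4317",
    headers={"key1": "value1", "key2": "value2"},
    timeout_ms=3000,
    interval_ms=60000,
))
```

The two required settings always appear in the output; the optional ones are emitted only when a value is passed.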

Prometheus Exporter

The Prometheus exporter only supports Pull mode and Cumulative aggregation. See OpenTelemetry Metrics Exporter - Prometheus for more information.

To enable the Prometheus exporter of Broker metrics, do the following:

  1. Set metricsExporterType to PROM.

Visit http://<broker-ip>:5557/metrics to view metrics. Configure service discovery or manually configure a pull task in Prometheus to collect metrics.

Optional configurations:

  1. metricsPromExporterPort: The port number on which Broker exposes the metrics service. The default is 5557.
  2. metricsPromExporterHost: The hostname for the exposed metrics service. The default is brokerIP1, the IP address that Broker registers with NameServer.
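
For a quick check without a Prometheus server, the metrics endpoint can be read directly. The following sketch fetches the page with the Python standard library and parses the simple "name{labels} value" lines of the Prometheus text format. The function names are hypothetical, the parser ignores comment lines and does not unescape label values, and the exact metric names and labels that a broker returns may differ from the sample shown in the usage note.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(broker_ip, port=5557):
    """Download the raw metrics text from http://<broker-ip>:<port>/metrics."""
    url = f"http://{broker_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

A call such as parse_metrics(fetch_metrics("192.0.2.10")) would then return one tuple per sample, for example a rocketmq_messages_in_total entry with its topic label; the address here is a placeholder.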