Paimon Metrics

Paimon provides a metrics system to measure the behaviour of reading and writing, for example: how many manifest files were scanned during the last planning, how long the last commit operation took, and how many files were deleted by the last compact operation.

In Paimon’s metrics system, metrics are updated and reported at table granularity.

The Paimon metrics system provides three types of metrics: Gauge, Counter and Histogram (illustrated in the sketch after this list).

  • Gauge: Provides a value of any type at a point in time.
  • Counter: Used to count values by incrementing and decrementing.
  • Histogram: Measures the statistical distribution of a set of values, including the min, max, mean, standard deviation and percentiles.
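
Purely as an illustration of what each type reports (these are hypothetical classes, not Paimon's actual API or method signatures), a minimal self-contained Java sketch might look like this:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class MetricTypesSketch {

    // Gauge: reports an arbitrary value at the moment it is read.
    static <T> Supplier<T> gauge(Supplier<T> valueProvider) {
        return valueProvider;
    }

    // Counter: a value changed only by incrementing and decrementing.
    static class Counter {
        private final AtomicLong count = new AtomicLong();
        void inc() { count.incrementAndGet(); }
        void dec() { count.decrementAndGet(); }
        long get() { return count.get(); }
    }

    // Histogram: records samples over a sliding window and summarizes their distribution.
    static class Histogram {
        private final long[] window;
        private int next, size;

        Histogram(int windowSize) { this.window = new long[windowSize]; }

        void update(long value) {
            window[next] = value;
            next = (next + 1) % window.length;
            size = Math.min(size + 1, window.length);
        }

        double mean() {
            return size == 0 ? 0.0 : Arrays.stream(window, 0, size).average().orElse(0.0);
        }
    }

    public static void main(String[] args) {
        Counter scannedManifests = new Counter();    // e.g. counting scanned manifest files
        scannedManifests.inc();

        Histogram scanDuration = new Histogram(100); // e.g. scan durations in milliseconds
        scanDuration.update(250);

        Supplier<Long> lastScanDuration = gauge(() -> 250L);

        System.out.printf("counter=%d, histogram mean=%.1f, gauge=%d%n",
                scannedManifests.get(), scanDuration.mean(), lastScanDuration.get());
    }
}
```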

Paimon supports built-in metrics that measure commit, scan, write and compaction operations, and these metrics can be bridged to any computing engine that supports a metrics system, such as Flink and Spark.

Metrics List

Below are the lists of Paimon's built-in metrics. They are grouped into scan metrics, commit metrics, write metrics, write buffer metrics and compaction metrics.

Scan Metrics

| Metrics Name | Type | Description |
| --- | --- | --- |
| lastScanDuration | Gauge | The time it took to complete the last scan. |
| scanDuration | Histogram | Distributions of the time taken by the last few scans. |
| lastScannedManifests | Gauge | Number of scanned manifest files in the last scan. |
| lastSkippedByPartitionAndStats | Gauge | Skipped table files by partition filter and value / key stats information in the last scan. |
| lastSkippedByBucketAndLevelFilter | Gauge | Skipped table files by bucket, bucket key and level filter in the last scan. |
| lastSkippedByWholeBucketFilesFilter | Gauge | Skipped table files by bucket level value filter (only primary key table) in the last scan. |
| lastScanSkippedTableFiles | Gauge | Total skipped table files in the last scan. |
| lastScanResultedTableFiles | Gauge | Resulted table files in the last scan. |

Commit Metrics

| Metrics Name | Type | Description |
| --- | --- | --- |
| lastCommitDuration | Gauge | The time it took to complete the last commit. |
| commitDuration | Histogram | Distributions of the time taken by the last few commits. |
| lastCommitAttempts | Gauge | The number of attempts the last commit made. |
| lastTableFilesAdded | Gauge | Number of added table files in the last commit, including newly created data files and files added by compaction (compact after). |
| lastTableFilesDeleted | Gauge | Number of deleted table files in the last commit, which come from files replaced by compaction (compact before). |
| lastTableFilesAppended | Gauge | Number of appended table files in the last commit, i.e. the newly created data files. |
| lastTableFilesCommitCompacted | Gauge | Number of compacted table files in the last commit, including both compact before and compact after files. |
| lastChangelogFilesAppended | Gauge | Number of appended changelog files in the last commit. |
| lastChangelogFileCommitCompacted | Gauge | Number of compacted changelog files in the last commit. |
| lastGeneratedSnapshots | Gauge | Number of snapshot files generated in the last commit, either 1 or 2 snapshots. |
| lastDeltaRecordsAppended | Gauge | Delta records count in the last commit with APPEND commit kind. |
| lastChangelogRecordsAppended | Gauge | Changelog records count in the last commit with APPEND commit kind. |
| lastDeltaRecordsCommitCompacted | Gauge | Delta records count in the last commit with COMPACT commit kind. |
| lastChangelogRecordsCommitCompacted | Gauge | Changelog records count in the last commit with COMPACT commit kind. |
| lastPartitionsWritten | Gauge | Number of partitions written in the last commit. |
| lastBucketsWritten | Gauge | Number of buckets written in the last commit. |

Write Buffer Metrics

| Metrics Name | Type | Description |
| --- | --- | --- |
| numWriters | Gauge | Number of writers in this parallelism. |
| bufferPreemptCount | Gauge | The total number of times write buffer memory has been preempted. |
| usedWriteBufferSizeByte | Gauge | Currently used write buffer size in bytes. |
| totalWriteBufferSizeByte | Gauge | The total configured write buffer size in bytes. |

Compaction Metrics

| Metrics Name | Type | Description |
| --- | --- | --- |
| maxLevel0FileCount | Gauge | The maximum number of level 0 files currently handled by this writer. This value grows if asynchronous compaction cannot keep up. |
| avgLevel0FileCount | Gauge | The average number of level 0 files currently handled by this writer. This value grows if asynchronous compaction cannot keep up. |
| compactionThreadBusy | Gauge | The maximum busy ratio of compaction threads in this parallelism. Currently there is only one compaction thread per parallelism, so the value ranges from 0 (idle) to 100 (compaction running all the time). |
| avgCompactionTime | Gauge | The average runtime of compaction threads, calculated from recorded compaction durations, in milliseconds. Higher values indicate longer average compaction times and may suggest the need for performance tuning. |
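
The busy and average-time gauges above are simple derived statistics. As a rough sketch only, assuming a single compaction thread and hypothetical bookkeeping fields (this is not Paimon's implementation), they could be computed like this:

```java
// Illustrative only: derived compaction statistics, not Paimon's actual code.
public class CompactionStatsSketch {
    private long totalCompactionMillis;  // sum of recorded compaction durations
    private long compactionCount;        // number of recorded compactions
    private final long windowStart = System.currentTimeMillis();

    void recordCompaction(long durationMillis) {
        totalCompactionMillis += durationMillis;
        compactionCount++;
    }

    // Analogue of avgCompactionTime: average compaction duration in milliseconds.
    double avgCompactionTime() {
        return compactionCount == 0 ? 0 : (double) totalCompactionMillis / compactionCount;
    }

    // Analogue of compactionThreadBusy: 0 (idle) .. 100 (compacting all the time),
    // assuming a single compaction thread.
    double compactionThreadBusy() {
        long elapsed = Math.max(1, System.currentTimeMillis() - windowStart);
        return Math.min(100.0, 100.0 * totalCompactionMillis / elapsed);
    }
}
```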

Paimon bridges its metrics into Flink's metrics system, so they can be reported by Flink, and the lifecycle of the metric groups is managed by Flink.

When accessing Paimon metrics through Flink, join <scope>.<infix>.<metric_name> to get the complete metric identifier; the metric_name can be found in the Metrics List above.

For example, the identifier of the metric lastPartitionsWritten for table word_count in a Flink job named insert_word_count is:

localhost.taskmanager.localhost:60340-775a20.insert_word_count.Global Committer : word_count.0.paimon.table.word_count.commit.lastPartitionsWritten.

In the Flink Web UI, go to the committer operator's metrics; it is shown as:

0.Global_Committer___word_count.paimon.table.word_count.commit.lastPartitionsWritten.

  1. Please refer to System Scope to understand the Flink scope.
  2. Scan metrics are only supported by Flink versions >= 1.18.

| Metrics | Scope | Infix |
| --- | --- | --- |
| Scan Metrics | <host>.jobmanager.<job_name> | <source_operator_name>.coordinator.enumerator.paimon.table.<table_name>.scan |
| Commit Metrics | <host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index> | paimon.table.<table_name>.commit |
| Write Metrics | <host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index> | paimon.table.<table_name>.partition.<partition_string>.bucket.<bucket_index>.writer |
| Write Buffer Metrics | <host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index> | paimon.table.<table_name>.writeBuffer |
| Compaction Metrics | <host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index> | paimon.table.<table_name>.partition.<partition_string>.bucket.<bucket_index>.compaction |
| Flink Source Metrics | <host>.taskmanager.<tm_id>.<job_name>.<source_operator_name>.<subtask_index> | - |
| Flink Sink Metrics | <host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index> | - |
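
As a hypothetical helper (not part of Paimon or Flink), the sketch below joins a scope, infix and metric name from the table above into the full identifier, reproducing the lastPartitionsWritten example:

```java
// Hypothetical helper: composes a full Flink metric identifier from scope, infix and name.
public class MetricIdentifier {

    static String identifier(String scope, String infix, String metricName) {
        // Full identifier = <scope>.<infix>.<metric_name>
        return String.join(".", scope, infix, metricName);
    }

    public static void main(String[] args) {
        // Commit metrics scope for table word_count in the insert_word_count job
        // (values taken from the example above).
        String scope = "localhost.taskmanager.localhost:60340-775a20"
                + ".insert_word_count.Global Committer : word_count.0";
        String infix = "paimon.table.word_count.commit";

        System.out.println(identifier(scope, infix, "lastPartitionsWritten"));
        // -> localhost.taskmanager.localhost:60340-775a20.insert_word_count
        //    .Global Committer : word_count.0.paimon.table.word_count.commit.lastPartitionsWritten
    }
}
```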

When using Flink to read and write, Paimon implements some key standard Flink connector metrics to measure source latency and sink output; see FLIP-33: Standardize Connector Metrics. The implemented Flink source / sink metrics are listed below.

| Metrics Name | Level | Type | Description |
| --- | --- | --- | --- |
| currentEmitEventTimeLag | Flink Source Operator | Gauge | Time difference between sending the record out of the source and file creation. |
| currentFetchEventTimeLag | Flink Source Operator | Gauge | Time difference between reading the data file and file creation. |
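
Both lag metrics are plain time differences relative to the data file's creation time. The sketch below, with hypothetical field names, is only meant to spell out those two definitions:

```java
// Illustrative only: how the two event-time lag gauges are defined.
public class EventTimeLagSketch {

    long fileCreationTime; // creation timestamp of the data file being read (ms)
    long fetchTime;        // when the source read the record from the file (ms)
    long emitTime;         // when the source emitted the record downstream (ms)

    // currentFetchEventTimeLag: reading the data file vs. file creation.
    long currentFetchEventTimeLag() {
        return fetchTime - fileCreationTime;
    }

    // currentEmitEventTimeLag: sending the record out of the source vs. file creation.
    long currentEmitEventTimeLag() {
        return emitTime - fileCreationTime;
    }
}
```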

Please note that if you specify consumer-id in your streaming query, the source metrics are reported at the level of the reader operator, which sits after the Monitor operator.

| Metrics Name | Level | Type | Description |
| --- | --- | --- | --- |
| numBytesOut | Table | Counter | The total number of output bytes. |
| numBytesOutPerSecond | Table | Meter | The output bytes per second. |
| numRecordsOut | Table | Counter | The total number of output records. |
| numRecordsOutPerSecond | Table | Meter | The output records per second. |