Metrics

In this section, we will introduce the MetricsReporter and HoodieMetrics in Hudi. You can view the metrics-related configurations here.

MetricsReporter

MetricsReporter provides APIs for reporting HoodieMetrics to user-specified backends. Currently, the implementations include InMemoryMetricsReporter, JmxMetricsReporter, MetricsGraphiteReporter, and DatadogMetricsReporter, along with the Prometheus, CloudWatch, and user-defined reporters described later in this section. Since InMemoryMetricsReporter is used only for testing, we will introduce the other implementations.

JmxMetricsReporter

JmxMetricsReporter is an implementation of a JMX reporter, which is used to report metrics over JMX.

Configurations

The following is an example of JmxMetricsReporter. More detailed configurations can be referenced here.

  hoodie.metrics.on=true
  hoodie.metrics.reporter.type=JMX
  hoodie.metrics.jmx.host=192.168.0.106
  hoodie.metrics.jmx.port=4001

Demo

As configured above, JmxMetricsReporter will start a JMX server on port 4001. We can start jconsole and connect to 192.168.0.106:4001. Below is an illustration of monitoring Hudi JMX metrics through jconsole.

hudi_jxm_metrics.png
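The same check can be done programmatically. The following is a minimal sketch, assuming the JMX server is reachable through the standard RMI/JNDI service URL that jconsole builds for a host:port pair. The class name JmxMetricsPeek and the helper serviceUrl are made up for this illustration and are not part of Hudi.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper (not part of Hudi): lists the MBeans exposed by a JMX server.
public class JmxMetricsPeek {

    // Builds the conventional RMI/JNDI service URL for a host:port pair (assumed form).
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX address: " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the example configuration above.
        String host = args.length > 0 ? args[0] : "192.168.0.106";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4001;

        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // Passing null for both arguments selects every registered MBean.
            for (ObjectName name : mbeans.queryNames(null, null)) {
                System.out.println(name);
            }
        }
    }
}
```

Run it while the Hudi job is alive; each printed ObjectName corresponds to an entry that jconsole would show in its MBeans tab.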

MetricsGraphiteReporter

MetricsGraphiteReporter is an implementation of a Graphite reporter, which connects to a Graphite server and sends HoodieMetrics to it.

Configurations

The following is an example of MetricsGraphiteReporter. More detailed configurations can be referenced here.

  hoodie.metrics.on=true
  hoodie.metrics.reporter.type=GRAPHITE
  hoodie.metrics.graphite.host=192.168.0.106
  hoodie.metrics.graphite.port=2003
  hoodie.metrics.graphite.metric.prefix=<your metrics prefix>

Demo

As configured above, assuming a Graphite server is running on host 192.168.0.106 and port 2003, a running Hudi job will connect and report metrics data to it. Below is an illustration of monitoring Hudi metrics through Graphite.

hudi_graphite_metrics.png
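Before pointing a job at a real Graphite server, it can help to see what arrives on the wire. The following is a rough sketch, assuming the reporter uses Graphite's plaintext protocol, in which each data point is one line of the form "<metric path> <value> <timestamp>". The class GraphiteLineSniffer and the sample metric path in the usage note are hypothetical and only serve this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical debugging aid (not part of Hudi): prints Graphite plaintext data points.
public class GraphiteLineSniffer {

    // One data point: "<metric path> <value> <epoch seconds>".
    static final class Point {
        final String path;
        final double value;
        final long epochSeconds;

        Point(String path, double value, long epochSeconds) {
            this.path = path;
            this.value = value;
            this.epochSeconds = epochSeconds;
        }
    }

    // Splits a plaintext line into its three whitespace-separated fields.
    static Point parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got: " + line);
        }
        return new Point(parts[0], Double.parseDouble(parts[1]), Long.parseLong(parts[2]));
    }

    public static void main(String[] args) throws IOException {
        // 2003 matches hoodie.metrics.graphite.port in the example above.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 2003;
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                // Handle one client connection at a time; enough for a quick check.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        Point p = parse(line);
                        System.out.println(p.path + " = " + p.value + " @ " + p.epochSeconds);
                    }
                }
            }
        }
    }
}
```

For instance, a line such as "foo.trips.commit.duration 1234 1700000000" would be parsed into the path foo.trips.commit.duration, the value 1234.0, and the timestamp 1700000000.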

DatadogMetricsReporter

DatadogMetricsReporter is an implementation of a Datadog reporter, which publishes metric values to the Datadog monitoring service via the Datadog HTTP API.

Configurations

The following is an example of DatadogMetricsReporter. More detailed configurations can be referenced here.

  hoodie.metrics.on=true
  hoodie.metrics.reporter.type=DATADOG
  hoodie.metrics.datadog.api.site=EU # or US
  hoodie.metrics.datadog.api.key=<your api key>
  hoodie.metrics.datadog.metric.prefix=<your metrics prefix>
  • hoodie.metrics.datadog.api.site will set the Datadog API site, which determines whether the requests will be sent to api.datadoghq.eu (EU) or api.datadoghq.com (US). Set this according to your Datadog account settings.
  • hoodie.metrics.datadog.api.key will set the api key.
  • hoodie.metrics.datadog.metric.prefix will help segregate metrics by setting different prefixes for different jobs. Note that it will use . to delimit the prefix and the metric name. For example, if the prefix is set to foo, then foo. will be prepended to the metric name.
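The two rules described above (site to API host, and prefix plus "." plus metric name) can be sketched as follows. This is an illustration of the documented behavior only, not Hudi's implementation. The class DatadogNaming, its methods, and the sample metric name are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration (not Hudi's code) of the site and prefix rules described above.
public class DatadogNaming {

    // Maps the hoodie.metrics.datadog.api.site value to the API host named in the text.
    static String apiHost(String site) {
        switch (site.trim().toUpperCase(Locale.ROOT)) {
            case "EU":
                return "api.datadoghq.eu";
            case "US":
                return "api.datadoghq.com";
            default:
                throw new IllegalArgumentException("Unknown Datadog site: " + site);
        }
    }

    // Joins the configured prefix and the metric name with a '.' delimiter.
    static String prefixed(String prefix, String metricName) {
        return prefix + "." + metricName;
    }

    public static void main(String[] args) {
        System.out.println(apiHost("EU"));
        System.out.println(prefixed("foo", "trips.commit.totalScanTime"));
    }
}
```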

Demo

In this demo, we ran a HoodieStreamer job with HoodieMetrics turned on and other configurations set properly.

hudi_datadog_metrics.png

As shown above, we were able to collect Hudi’s action-related metrics like

  • <prefix>.<table name>.commit.totalScanTime
  • <prefix>.<table name>.clean.duration
  • <prefix>.<table name>.index.lookup.duration

as well as HoodieStreamer-specific metrics

  • <prefix>.<table name>.deltastreamer.duration
  • <prefix>.<table name>.deltastreamer.hiveSyncDuration

PrometheusMetricsReporter

Prometheus is an open source systems monitoring and alerting toolkit. Prometheus has a PushGateway that Apache Hudi can leverage for metrics reporting. Follow the Prometheus documentation for basic setup instructions.

Similar to other supported reporters, the following attributes are required to enable pushgateway reporters:

  hoodie.metrics.on=true
  hoodie.metrics.reporter.type=PROMETHEUS_PUSHGATEWAY

The following properties are used to configure the address and port number of pushgateway. The default address is localhost, and the default port is 9091.

  hoodie.metrics.pushgateway.host=xxxx
  hoodie.metrics.pushgateway.port=9091

You can configure whether to delete the monitoring information from pushgateway at the end of the task. The default is true.

  hoodie.metrics.pushgateway.delete.on.shutdown=false

You can configure the task name prefix and whether a random suffix is appended to it. The random suffix is enabled by default (true).

  hoodie.metrics.pushgateway.job.name=xxxx
  hoodie.metrics.pushgateway.random.job.name.suffix=false
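To make the defaults explicit, here is a small sketch that resolves the pushgateway properties from a java.util.Properties object, falling back to the defaults stated in this section. It is not Hudi's own configuration code. The class PushgatewaySettings is hypothetical, and since this section does not state a default for the job name, the sketch leaves it null when unset.

```java
import java.util.Properties;

// Hypothetical illustration (not Hudi's code): resolves pushgateway settings with the
// defaults documented in this section.
public class PushgatewaySettings {
    final String host;
    final int port;
    final boolean deleteOnShutdown;
    final String jobName;              // null when unset; no default is assumed here
    final boolean randomJobNameSuffix;

    private PushgatewaySettings(Properties props) {
        host = props.getProperty("hoodie.metrics.pushgateway.host", "localhost");
        port = Integer.parseInt(props.getProperty("hoodie.metrics.pushgateway.port", "9091"));
        deleteOnShutdown = Boolean.parseBoolean(
            props.getProperty("hoodie.metrics.pushgateway.delete.on.shutdown", "true"));
        jobName = props.getProperty("hoodie.metrics.pushgateway.job.name");
        randomJobNameSuffix = Boolean.parseBoolean(
            props.getProperty("hoodie.metrics.pushgateway.random.job.name.suffix", "true"));
    }

    static PushgatewaySettings from(Properties props) {
        return new PushgatewaySettings(props);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hoodie.metrics.pushgateway.random.job.name.suffix", "false");
        PushgatewaySettings s = PushgatewaySettings.from(props);
        System.out.println(s.host + ":" + s.port
            + " deleteOnShutdown=" + s.deleteOnShutdown
            + " randomJobNameSuffix=" + s.randomJobNameSuffix);
    }
}
```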

AWS CloudWatchReporter

Hudi supports publishing metrics to Amazon CloudWatch. It can be configured by setting hoodie.metrics.reporter.type to “CLOUDWATCH”. Static AWS credentials to be used can be configured using hoodie.aws.access.key, hoodie.aws.secret.key, hoodie.aws.session.token properties. In the absence of static AWS credentials being configured, DefaultAWSCredentialsProviderChain will be used to get credentials by checking environment properties. Additional Amazon CloudWatch reporter specific properties that can be tuned are in the HoodieMetricsCloudWatchConfig class.

UserDefinedMetricsReporter

Allows users to define a custom metrics reporter.

Configurations

The following is an example of UserDefinedMetricsReporter. More detailed configurations can be referenced here.

  hoodie.metrics.on=true
  hoodie.metrics.reporter.class=test.TestUserDefinedMetricsReporter

Demo

In this simple demo, TestUserDefinedMetricsReporter will print all gauges every 10 seconds.

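  // Imports go at the top of the enclosing source file.
  import java.io.Closeable;
  import java.util.Properties;
  import java.util.concurrent.Executors;
  import java.util.concurrent.ScheduledExecutorService;
  import java.util.concurrent.TimeUnit;
  import com.codahale.metrics.MetricRegistry;
  import org.apache.hudi.metrics.userdefined.AbstractUserDefinedMetricsReporter;
  // Logger and LogManager are Log4j classes; import them from the Log4j version on your classpath.

  public static class TestUserDefinedMetricsReporter
      extends AbstractUserDefinedMetricsReporter {
    private static final Logger log = LogManager.getLogger(TestUserDefinedMetricsReporter.class);

    // Single daemon thread, so the reporter never keeps the JVM alive.
    private final ScheduledExecutorService exec = Executors.newScheduledThreadPool(1, r -> {
      Thread t = Executors.defaultThreadFactory().newThread(r);
      t.setDaemon(true);
      return t;
    });

    public TestUserDefinedMetricsReporter(Properties props, MetricRegistry registry) {
      super(props, registry);
    }

    @Override
    public void start() {
      // Repeat every 10 seconds; schedule() alone would run the report only once.
      exec.scheduleAtFixedRate(this::report, 10, 10, TimeUnit.SECONDS);
    }

    @Override
    public void report() {
      // Log every gauge currently registered with its latest value.
      this.getRegistry().getGauges().forEach((key, value) ->
          log.info("key: " + key + " value: " + value.getValue()));
    }

    @Override
    public Closeable getReporter() {
      return null;
    }

    @Override
    public void stop() {
      exec.shutdown();
    }
  }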

HoodieMetrics

Once the Hudi writer is configured with the right table and environment for HoodieMetrics, it produces the following metrics, which aid in debugging Hudi tables:

  • Commit Duration - The amount of time it took to successfully commit a batch of records
  • Rollback Duration - Similarly, the amount of time taken to undo partial data left over by a failed commit (rollback happens automatically after a failing write)
  • File Level metrics - The number of new files added, new versions written, and files deleted (cleaned) in each commit
  • Record Level metrics - Total records inserted, updated, etc. per commit
  • Partition Level metrics - Number of partitions upserted (very useful for understanding sudden spikes in commit duration)

These HoodieMetrics can then be plotted on a standard tool like Grafana. Below is a sample commit duration chart.

hudi_commit_duration.png

List of metrics:

The metrics below are available in all timeline operations that involve a commit, such as deltacommit, compaction, clustering, and rollback.

Name                          Description
commitFreshnessInMs           Milliseconds between the commit end time and the maximum event time of the incoming records
commitLatencyInMs             Milliseconds between the commit end time and the minimum event time of the incoming records
commitTime                    Time of commit in epoch milliseconds
duration                      Total time taken for the commit/rollback in milliseconds
numFilesDeleted               Number of files deleted during a clean/rollback
numFilesFinalized             Number of files finalized in a write
totalBytesWritten             Bytes written in a HoodieCommit
totalCompactedRecordsUpdated  Number of records updated in a compaction operation
totalCreateTime               Time taken for file creation during a Hoodie Insert operation
totalFilesInsert              Number of newly written files in a HoodieCommit
totalFilesUpdate              Number of files updated in a HoodieCommit
totalInsertRecordsWritten     Number of records inserted or converted to updates (for small file handling) in a HoodieCommit
totalLogFilesCompacted        Number of log files under a base file in a file group compacted
totalLogFilesSize             Total size in bytes of all log files under a base file in a file group
totalPartitionsWritten        Number of partitions that took writes in a HoodieCommit
totalRecordsWritten           Number of records written in a HoodieCommit. For inserts, it is the total number of records inserted. For updates, it is the total number of records in the file.
totalScanTime                 Time taken for reading and merging log blocks in a log file
totalUpdateRecordsWritten     Number of records that got changed in a HoodieCommit
totalUpsertTime               Time taken for Hoodie Merge

These metrics can be found at org.apache.hudi.metrics.HoodieMetrics and referenced from org.apache.hudi.common.model.HoodieCommitMetadata and org.apache.hudi.common.model.HoodieWriteStat.
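As a worked example of the two event-time metrics in the list, the sketch below computes commitFreshnessInMs and commitLatencyInMs as plain differences against the commit end time, following the descriptions above. It assumes all timestamps are epoch milliseconds. The class CommitEventTimeMetrics and the sample values are hypothetical, and this is not Hudi's implementation.

```java
// Hypothetical illustration (not Hudi's code) of the two event-time metrics listed above.
public class CommitEventTimeMetrics {

    // Gap between the commit end time and the newest event in the batch.
    static long commitFreshnessInMs(long commitEndMs, long maxEventTimeMs) {
        return commitEndMs - maxEventTimeMs;
    }

    // Gap between the commit end time and the oldest event in the batch.
    static long commitLatencyInMs(long commitEndMs, long minEventTimeMs) {
        return commitEndMs - minEventTimeMs;
    }

    public static void main(String[] args) {
        long commitEndMs = 1_700_000_010_000L; // sample value for illustration
        long minEventMs = 1_700_000_004_000L;  // oldest incoming record
        long maxEventMs = 1_700_000_009_000L;  // newest incoming record

        System.out.println("commitFreshnessInMs = " + commitFreshnessInMs(commitEndMs, maxEventMs));
        System.out.println("commitLatencyInMs   = " + commitLatencyInMs(commitEndMs, minEventMs));
    }
}
```

With these sample values the freshness is 1000 ms and the latency is 6000 ms. Latency is never smaller than freshness, because the minimum event time cannot exceed the maximum.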