InfluxDB Emitter
To use this Apache Druid extension, make sure to include the druid-influxdb-emitter extension.
Introduction
This extension emits Druid metrics to InfluxDB over HTTP. Currently, this emitter only emits service metric events to InfluxDB (see Druid metrics for a list of metrics). When a metric event is fired, it is added to a queue of events. After a configurable amount of time, the events on the queue are transformed to InfluxDB’s line protocol and POSTed to the InfluxDB HTTP API. The entire queue is flushed at this point. The queue is also flushed when the emitter is shut down.
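The queue-and-flush cycle described above can be sketched as follows. This is a minimal Python illustration, not the emitter's actual Java implementation: the class name BufferingEmitter and the post and to_line callables are hypothetical stand-ins for the HTTP POST and the line protocol conversion, and the periodic scheduling is left to the caller.

```python
import threading
from collections import deque


class BufferingEmitter:
    """Hypothetical stand-in for the emitter's queue-and-flush cycle."""

    def __init__(self, post, to_line):
        self._post = post        # callable that sends one payload string (e.g. an HTTP POST)
        self._to_line = to_line  # callable that turns one event into a line protocol string
        self._queue = deque()
        self._lock = threading.Lock()

    def emit(self, event):
        # Events are only buffered here; nothing is sent yet.
        with self._lock:
            self._queue.append(event)

    def flush(self):
        # Drain the whole queue, then send all events as one newline-separated payload.
        with self._lock:
            events = list(self._queue)
            self._queue.clear()
        if events:
            self._post("\n".join(self._to_line(e) for e in events))

    def close(self):
        # Shutting down flushes whatever is still queued.
        self.flush()
```

In this sketch, a scheduler would call flush() once per flush period, and close() covers the flush on shutdown.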
Note that authentication and authorization must be enabled on the InfluxDB server.
Configuration
All the configuration parameters for the InfluxDB emitter are under druid.emitter.influxdb.
Property | Description | Required? | Default |
---|---|---|---|
druid.emitter.influxdb.hostname | The hostname of the InfluxDB server. | Yes | N/A |
druid.emitter.influxdb.port | The port of the InfluxDB server. | No | 8086 |
druid.emitter.influxdb.databaseName | The name of the database in InfluxDB. | Yes | N/A |
druid.emitter.influxdb.maxQueueSize | The size of the queue that holds events. | No | Integer.MAX_VALUE(=2^31-1) |
druid.emitter.influxdb.flushPeriod | How often (in milliseconds) the events queue is parsed into Line Protocol and POSTed to InfluxDB. | No | 60000 |
druid.emitter.influxdb.flushDelay | How long (in milliseconds) the scheduled method will wait until it first runs. | No | 60000 |
druid.emitter.influxdb.influxdbUserName | The username for authenticating with the InfluxDB database. | Yes | N/A |
druid.emitter.influxdb.influxdbPassword | The password of the authorized database user. | Yes | N/A |
druid.emitter.influxdb.dimensionWhitelist | A whitelist of metric dimensions to include as tags. | No | ["dataSource","type","numMetrics","numDimensions","threshold","dimension","taskType","taskStatus","tier"] |
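As an illustration, a runtime properties fragment using these settings might look like the sketch below. The hostname, database name, username, and password are placeholder values. The first two properties are the general Druid settings for loading an extension and selecting an emitter, which are not listed in the table above.

```properties
# Load the extension and select the emitter (general Druid settings)
druid.extensions.loadList=["druid-influxdb-emitter"]
druid.emitter=influxdb

# Required properties; all values here are placeholders
druid.emitter.influxdb.hostname=influxdb.example.com
druid.emitter.influxdb.databaseName=druid_metrics
druid.emitter.influxdb.influxdbUserName=druid
druid.emitter.influxdb.influxdbPassword=changeme

# Optional properties, shown with their default values
druid.emitter.influxdb.port=8086
druid.emitter.influxdb.flushPeriod=60000
druid.emitter.influxdb.flushDelay=60000
```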
InfluxDB Line Protocol
The following example shows how this emitter parses a Druid metric event into InfluxDB’s line protocol.
The syntax of the line protocol is:
<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
where timestamp is in nanoseconds since epoch.
A typical service metric event as recorded by Druid’s logging emitter is:
Event [{"feed":"metrics","timestamp":"2017-10-31T09:09:06.857Z","service":"druid/historical","host":"historical001:8083","version":"0.11.0-SNAPSHOT","metric":"query/cache/total/hits","value":34787256}]
This event is parsed into line protocol according to these rules:
- The measurement becomes druid_query since query is the first part of the metric.
- The tags are service=druid/historical, hostname=historical001, metric=druid_cache_total. (The metric tag is the middle part of the Druid metric name, joined with _ and prefixed with druid_. For example, an event with metric=query/time has no middle part and hence no metric tag.)
- The field is druid_hits since this is the last part of the metric.
This gives the following String which can be POSTed to InfluxDB: "druid_query,service=druid/historical,hostname=historical001,metric=druid_cache_total druid_hits=34787256 1509440946857000000"
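The rules above can be sketched in Python as follows. This is an illustration only, not the emitter's actual Java code: the function names to_nanos and to_line_protocol are hypothetical, stripping the port from the host field is inferred from the example, and the sketch assumes timestamps always carry a fractional-seconds part. It does not escape special characters.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_nanos(iso_timestamp):
    # Parse e.g. "2017-10-31T09:09:06.857Z" and use integer arithmetic only,
    # so no precision is lost to floating point.
    dt = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def to_line_protocol(event):
    parts = event["metric"].split("/")
    measurement = "druid_" + parts[0]           # first part of the metric
    tags = [
        "service=" + event["service"],
        "hostname=" + event["host"].split(":")[0],  # assumption: drop the port
    ]
    middle = parts[1:-1]
    if middle:                                  # e.g. query/time has no middle part
        tags.append("metric=druid_" + "_".join(middle))
    field = "druid_" + parts[-1] + "=" + str(event["value"])  # last part of the metric
    return measurement + "," + ",".join(tags) + " " + field + " " + str(to_nanos(event["timestamp"]))
```

Applied to the sample event above, this sketch reproduces the line protocol string shown in the text. For an event with metric=query/time, it emits no metric tag and uses druid_time as the field.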
The InfluxDB emitter has a whitelist of dimensions. If a metric has a dimension that is on the whitelist, that dimension is added as a tag to the line protocol string. The value of the dimension is sanitized so that every occurrence of a dot or whitespace is replaced with an underscore (_).
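A minimal Python sketch of the whitelist and sanitization step is shown below. The function names sanitize and whitelisted_tags are hypothetical, the tag order is an arbitrary choice made for this sketch, and replacing each dot or whitespace character individually is one reading of the rule above rather than a statement about the emitter's actual code.

```python
import re

# Default whitelist, copied from the configuration table.
DEFAULT_WHITELIST = ["dataSource", "type", "numMetrics", "numDimensions",
                     "threshold", "dimension", "taskType", "taskStatus", "tier"]


def sanitize(value):
    # Replace each dot or whitespace character with an underscore.
    return re.sub(r"[.\s]", "_", str(value))


def whitelisted_tags(event, whitelist=DEFAULT_WHITELIST):
    # Keep only whitelisted dimensions, in whitelist order, with sanitized values.
    return [f"{name}={sanitize(event[name])}" for name in whitelist if name in event]
```

For example, an event with dataSource set to wiki.edits would contribute the tag dataSource=wiki_edits, while a dimension that is not on the whitelist would be ignored.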