Apache Kafka Lookups
Lookups are an experimental feature.
To use this Apache Druid extension, make sure to include druid-lookups-cached-global and druid-kafka-extraction-namespace as extensions.
If you need lookup updates to propagate as promptly as possible, you can plug a Kafka topic in as a LookupExtractorFactory. In each message, the key is the old value and the message body is the desired new value (both in UTF-8).
{
"type":"kafka",
"kafkaTopic":"testTopic",
"kafkaProperties":{"zookeeper.connect":"somehost:2181/kafka"}
}
Parameter | Description | Required | Default |
---|---|---|---|
kafkaTopic | The Kafka topic to read the data from | Yes | |
kafkaProperties | Kafka consumer properties. At least "zookeeper.connect" must be specified. Only the ZooKeeper connector is supported | Yes | |
connectTimeout | How long to wait for an initial connection | No | 0 (do not wait) |
isOneToOne | Whether the map is one-to-one (see Lookup DimensionSpecs) | No | false |
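
To illustrate how the parameters in the table fit together, the following Python sketch builds a lookup spec and serializes it to JSON. The helper name `build_kafka_lookup` and the sample values are hypothetical; only the field names come from the table above. The unit of connectTimeout is not stated here, so the value 5000 is only a placeholder.

```python
import json


def build_kafka_lookup(topic, zookeeper_connect, connect_timeout=0, is_one_to_one=False):
    """Return a dict shaped like the Kafka lookup spec described above.

    Hypothetical helper for illustration; it is not part of Druid.
    """
    if not topic:
        raise ValueError("kafkaTopic is required")
    if not zookeeper_connect:
        raise ValueError('kafkaProperties must contain "zookeeper.connect"')
    return {
        "type": "kafka",
        "kafkaTopic": topic,
        "kafkaProperties": {"zookeeper.connect": zookeeper_connect},
        "connectTimeout": connect_timeout,  # 0 means do not wait
        "isOneToOne": is_one_to_one,
    }


# Sample values only; replace them with your own topic and ZooKeeper address.
spec = build_kafka_lookup("testTopic", "somehost:2181/kafka", connect_timeout=5000)
print(json.dumps(spec, indent=2))
```

Leaving the optional arguments at their defaults yields the same values the table lists as defaults.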
The extension kafka-extraction-namespace enables reading from a Kafka feed which has name/key pairs to allow renaming of dimension values. An example use case would be to rename an ID to a human-readable format.
The consumer properties group.id and auto.offset.reset CANNOT be set in kafkaProperties, as they are set by the extension as UUID.randomUUID().toString() and smallest, respectively.
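
The property rules can be checked before a spec is submitted. The sketch below is a minimal pre-flight check in Python, assuming the rules are exactly the ones stated on this page (zookeeper.connect required, group.id and auto.offset.reset reserved). The function name `check_kafka_properties` is hypothetical and this is not how the extension itself validates its input.

```python
# Properties the extension sets itself, per the text above.
RESERVED_PROPERTIES = ("group.id", "auto.offset.reset")


def check_kafka_properties(kafka_properties):
    """Return a list of problems found in a kafkaProperties mapping."""
    problems = []
    if "zookeeper.connect" not in kafka_properties:
        problems.append('missing required property "zookeeper.connect"')
    for name in RESERVED_PROPERTIES:
        if name in kafka_properties:
            problems.append(f'"{name}" is set by the extension and cannot be overridden')
    return problems


good = {"zookeeper.connect": "somehost:2181/kafka"}
bad = {"zookeeper.connect": "somehost:2181/kafka", "group.id": "my-group"}

print(check_kafka_properties(good))  # no problems expected
print(check_kafka_properties(bad))   # one problem expected, for group.id
```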
See lookups for how to configure and use lookups.
Limitations
Currently the Kafka lookup extractor feeds the entire Kafka stream into a local cache. If you are using on-heap caching, this can easily exhaust your Java heap if the Kafka stream contains a lot of unique keys. Off-heap caching should alleviate these concerns, but there is still a limit to the quantity of data that can be stored. There is currently no eviction policy.
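
The effect of having no eviction policy can be seen in a toy model. The Python sketch below folds a stream of (old value, new value) messages into a plain dict. It is not the extension's cache implementation, and it assumes that a later message for the same key overwrites the earlier one. The point is only that the cache size tracks the number of unique keys, not the number of messages.

```python
def apply_renames(messages):
    """Fold (old_value, new_value) messages into a dict.

    Assumption for illustration: later messages overwrite earlier ones,
    and nothing is ever evicted.
    """
    cache = {}
    for old_value, new_value in messages:
        cache[old_value] = new_value
    return cache


# 1000 messages that reuse 10 keys: the cache stays at 10 entries.
repeated = [(f"id-{i % 10}", f"name-{i}") for i in range(1000)]
# 1000 messages with 1000 distinct keys: the cache grows to 1000 entries.
unique = [(f"id-{i}", f"name-{i}") for i in range(1000)]

print(len(apply_renames(repeated)))
print(len(apply_renames(unique)))
```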
Testing the Kafka rename functionality
To test this setup, you can send key/value pairs to a Kafka stream via the following producer console:
./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic
Renames can then be published as OLD_VAL->NEW_VAL followed by a newline (enter or return).
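
For more than a handful of renames, the input lines can be generated instead of typed. The following Python sketch formats a mapping as OLD_VAL->NEW_VAL lines, using the same "->" separator as the producer command above. The function name `format_renames` and the sample IDs are hypothetical. The check for the separator inside a key assumes the console producer splits each line at the separator, so a key containing "->" would be cut short.

```python
SEPARATOR = "->"  # must match key.separator in the producer command


def format_renames(renames):
    """Turn a mapping of old -> new values into newline-terminated producer input."""
    lines = []
    for old_value, new_value in renames.items():
        if SEPARATOR in old_value:
            raise ValueError(f"key {old_value!r} contains the separator {SEPARATOR!r}")
        lines.append(f"{old_value}{SEPARATOR}{new_value}\n")
    return "".join(lines)


# Sample data only: rename two IDs to human-readable names.
payload = format_renames({"12345": "Alice", "67890": "Bob"})
print(payload, end="")
```

The printed text could be piped into the standard input of the console producer shown above.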