Migration guide: front-coded dictionaries
Info: Front coding is an experimental feature introduced in Druid 25.0.0.
Apache Druid encodes string columns into dictionaries for better compression. Front coding is an incremental encoding strategy that lets you store STRING and COMPLEX<json> columns in Druid with minimal performance impact. Front-coded dictionaries reduce storage size and improve performance by optimizing for strings that share a similar prefix. For example, if you are tracking website visits, most URLs start with https://domain.xyz/, and front coding exploits this pattern for better compression when storing such datasets. Druid applies the optimization automatically, so the performance of string columns is generally not affected when they don’t match the front-coded pattern. Consequently, you can enable this feature universally without knowing the underlying data shapes of the columns.
You can use front coding with all types of ingestion.
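To see why prefix-heavy values compress well, the following simplified Python sketch illustrates front coding (incremental encoding) over a sorted string dictionary. It is only a conceptual illustration, not Druid's actual segment format, and the example URLs are hypothetical:

import os

# Illustrative sketch of front coding: group sorted values into buckets, store
# the first value of each bucket in full, and store every later value as the
# length of the prefix it shares with the previous value plus the remaining
# suffix. Druid's on-disk format differs in detail.
def front_code(sorted_values, bucket_size=4):
    buckets = []
    for start in range(0, len(sorted_values), bucket_size):
        bucket = sorted_values[start:start + bucket_size]
        encoded = [(0, bucket[0])]  # first value in the bucket, stored in full
        for prev, value in zip(bucket, bucket[1:]):
            shared = len(os.path.commonprefix([prev, value]))
            encoded.append((shared, value[shared:]))  # shared-prefix length + suffix
        buckets.append(encoded)
    return buckets

urls = sorted([
    "https://domain.xyz/about",
    "https://domain.xyz/blog",
    "https://domain.xyz/blog/post-1",
    "https://domain.xyz/contact",
])
print(front_code(urls))
# [[(0, 'https://domain.xyz/about'), (19, 'blog'), (23, '/post-1'), (19, 'contact')]]

Because each value after the first is reduced to an integer and a short suffix, long shared prefixes such as https://domain.xyz/ cost almost nothing to store.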
Enable front coding
To enable front coding, set indexSpec.stringDictionaryEncoding.type to frontCoded in the tuningConfig object of your ingestion spec.
You can specify the following optional properties:
- bucketSize: Number of values to place in a bucket to perform delta encoding. Setting this property instructs indexing tasks to write segments using compressed dictionaries of the specified bucket size. You can set it to any power of 2 less than or equal to 128. bucketSize defaults to 4.
- formatVersion: Specifies which front coding version to use. Options are 0 and 1 (version 1 requires Druid 26.0.0 or higher). formatVersion defaults to 0.
For example:
"tuningConfig": {
"indexSpec": {
"stringDictionaryEncoding": {
"type":"frontCoded",
"bucketSize": 4,
"formatVersion": 0
}
}
}
For SQL-based ingestion, you can add the indexSpec to your query context. In the web console, select Edit context from the Engine menu and enter the indexSpec. For example:
{
  ...
  "indexSpec": {
    "stringDictionaryEncoding": {
      "type": "frontCoded",
      "bucketSize": 4,
      "formatVersion": 1
    }
  }
}
For API calls to the SQL-based ingestion API, include the indexSpec in the context of the request payload. For example:
{
  "query": ...,
  "context": {
    "maxNumTasks": 3,
    "indexSpec": {
      "stringDictionaryEncoding": {
        "type": "frontCoded",
        "bucketSize": 4,
        "formatVersion": 1
      }
    }
  }
}
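As a sketch of what such an API call can look like, the following Python snippet posts the payload above to the SQL-based ingestion task endpoint. The host and port, the datasource names, and the query itself are hypothetical placeholders; adjust them for your cluster:

import requests

# Hypothetical example: submit a SQL-based ingestion task whose query context
# enables front-coded dictionaries. Replace the URL, datasources, and query
# with values for your own cluster.
payload = {
    "query": 'INSERT INTO "target" SELECT * FROM "source" PARTITIONED BY ALL',
    "context": {
        "maxNumTasks": 3,
        "indexSpec": {
            "stringDictionaryEncoding": {
                "type": "frontCoded",
                "bucketSize": 4,
                "formatVersion": 1
            }
        }
    }
}

response = requests.post(
    "http://localhost:8888/druid/v2/sql/task",  # Router endpoint for SQL-based ingestion tasks
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())  # includes the ID of the ingestion task that was created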
Upgrade from Druid 25.0.0
Druid 26.0.0 introduced a new version of the front-coded dictionary, version 1, which typically offers faster read speeds and smaller storage sizes. When you upgrade to Druid 26.0.0 or higher, Druid continues to default front coding to version 0. This default enables seamless downgrades to Druid 25.0.0.
To use the newer version, set the formatVersion property to 1:
"tuningConfig": {
"indexSpec": {
"stringDictionaryEncoding": {
"type":"frontCoded",
"bucketSize": 4,
"formatVersion": 1
}
}
}
Downgrade to Druid 25.0.0
After upgrading to version 1, you can no longer downgrade to Druid 25.0.0 seamlessly. To downgrade to Druid 25.0.0, re-ingest your data with the stringDictionaryEncoding.formatVersion property set to 0.
Downgrade to a version preceding Druid 25.0.0
Druid versions preceding 25.0.0 can’t read segments with front-coded dictionaries. To downgrade to an older version, you must either delete the segments containing front-coded dictionaries or re-ingest them with stringDictionaryEncoding.type set to utf8.
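For reference, here is a minimal sketch of the indexSpec such a re-ingestion would use, shown as a Python dictionary. Because utf8 is the default string dictionary encoding, omitting stringDictionaryEncoding entirely has the same effect:

# Illustrative only: write plain utf8 string dictionaries so that Druid
# versions older than 25.0.0 can read the resulting segments.
legacy_index_spec = {
    "indexSpec": {
        "stringDictionaryEncoding": {
            "type": "utf8"
        }
    }
}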