Use constant_keyword to speed up filtering
There is a general rule that the cost of a filter is mostly a function of the number of matched documents. Imagine that you have an index containing cycles. There are a large number of bicycles and many searches perform a filter on cycle_type: bicycle. This very common filter is unfortunately also very costly since it matches most documents. There is a simple way to avoid running this filter: move bicycles to their own index and filter bicycles by searching this index instead of adding a filter to the query.
Unfortunately, this can make client-side logic tricky, which is where constant_keyword helps. By mapping cycle_type as a constant_keyword with value bicycle on the index that contains bicycles, clients can keep running the exact same queries as they used to run on the monolithic index and Elasticsearch will do the right thing on the bicycles index by ignoring filters on cycle_type if the value is bicycle and returning no hits otherwise.
Here is what mappings could look like:
PUT bicycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "constant_keyword",
        "value": "bicycle"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

PUT other_cycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
We are splitting our index in two: one that will contain only bicycles, and another one that contains other cycles: unicycles, tricycles, etc. Then at search time, we need to search both indices, but we don’t need to modify queries.
GET bicycles,other_cycles/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "dutch"
        }
      },
      "filter": {
        "term": {
          "cycle_type": "bicycle"
        }
      }
    }
  }
}
On the bicycles index, Elasticsearch will simply ignore the cycle_type filter and rewrite the search request to the one below:
GET bicycles,other_cycles/_search
{
  "query": {
    "match": {
      "description": "dutch"
    }
  }
}
On the other_cycles index, Elasticsearch will quickly figure out that bicycle doesn’t exist in the terms dictionary of the cycle_type field and return a search response with no hits.
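The per-index behavior described above can be summarized with a small toy model. This is only a sketch of the decision, not Elasticsearch's implementation: INDICES, resolve_term_filter, and the returned labels are hypothetical names, and the sets of terms merely stand in for each field's terms dictionary.

```python
# Toy model only, not Elasticsearch internals. All names here are hypothetical.
INDICES = {
    "bicycles": {
        "mappings": {"cycle_type": {"type": "constant_keyword", "value": "bicycle"}},
        "terms": {},
    },
    "other_cycles": {
        "mappings": {"cycle_type": {"type": "keyword"}},
        # Stand-in for the terms dictionary of the cycle_type field.
        "terms": {"cycle_type": {"unicycle", "tricycle"}},
    },
}


def resolve_term_filter(index_name, field, value):
    """Describe how a term filter on `field` resolves for one index."""
    index = INDICES[index_name]
    mapping = index["mappings"][field]
    if mapping["type"] == "constant_keyword":
        # Every document shares one value, so the filter matches all or nothing.
        return "match_all" if mapping["value"] == value else "match_none"
    if value not in index["terms"].get(field, set()):
        # The term is absent from the terms dictionary: no hits, cheaply.
        return "match_none"
    # Otherwise a regular term filter has to run.
    return "term"
```

With the two indices from the example, a filter on cycle_type: bicycle resolves to "match_all" on bicycles and to "match_none" on other_cycles, so neither index pays for a filter that matches most documents.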
This is a powerful way of making queries cheaper by putting common values in a dedicated index. This idea can also be combined across multiple fields: for instance, if you track the color of each cycle and your bicycles index ends up having a majority of black bikes, you could split it into bicycles-black and bicycles-other-colors indices.
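Splitting on several fields means the indexing side has to decide where each document goes. A minimal sketch of that decision follows, assuming the color is stored in a field named color (the text above does not name the field) and using a hypothetical helper called target_index.

```python
def target_index(doc):
    """Pick the index a cycle document should be written to.

    Assumes a hypothetical `color` field; index names come from the example.
    """
    if doc.get("cycle_type") != "bicycle":
        return "other_cycles"
    if doc.get("color") == "black":
        return "bicycles-black"
    return "bicycles-other-colors"
```

In such a layout, both cycle_type and color could presumably be mapped as constant_keyword on bicycles-black, while bicycles-other-colors would keep color as a regular keyword field.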
constant_keyword is not strictly required for this optimization: it is also possible to update the client-side logic to route queries to the relevant indices based on filters. However, constant_keyword makes this transparent and decouples search requests from the index topology in exchange for very little overhead.
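For comparison, here is a rough sketch of what the client-side alternative might look like. The route function and ALL_INDICES are hypothetical, and the sketch only understands the exact query shape used in the example above (a bool query whose filter is a single term clause on cycle_type); a real client would need to handle many more shapes, which is the coupling that constant_keyword avoids.

```python
ALL_INDICES = ["bicycles", "other_cycles"]


def route(query):
    """Return (indices, query) with the cycle_type filter resolved client-side.

    Hypothetical helper: handles only {"bool": {"filter": {"term": {...}}}}.
    """
    bool_query = query.get("bool", {})
    flt = bool_query.get("filter")
    if not (isinstance(flt, dict) and "cycle_type" in flt.get("term", {})):
        # No recognizable cycle_type filter: search everything, unchanged.
        return ALL_INDICES, query

    if flt["term"]["cycle_type"] != "bicycle":
        # Other cycle types live in other_cycles and still need the filter.
        return ["other_cycles"], query

    # Bicycles have their own index, so the filter can be dropped entirely.
    remaining = {k: v for k, v in bool_query.items() if k != "filter"}
    if not remaining:
        return ["bicycles"], {"match_all": {}}
    return ["bicycles"], {"bool": remaining}
```

Every client that queries these indices would have to carry logic like this and keep it in sync with the index layout.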