Use constant_keyword to speed up filtering

There is a general rule that the cost of a filter is mostly a function of the number of matched documents. Imagine that you have an index containing cycles. There are a large number of bicycles and many searches perform a filter on cycle_type: bicycle. This very common filter is unfortunately also very costly since it matches most documents. There is a simple way to avoid running this filter: move bicycles to their own index and filter bicycles by searching this index instead of adding a filter to the query.

Unfortunately this can make client-side logic tricky, which is where constant_keyword helps. By mapping cycle_type as a constant_keyword with value bicycle on the index that contains bicycles, clients can keep running the exact same queries as they used to run on the monolithic index and Elasticsearch will do the right thing on the bicycles index by ignoring filters on cycle_type if the value is bicycle and returning no hits otherwise.

Here is what mappings could look like:

  PUT bicycles
  {
    "mappings": {
      "properties": {
        "cycle_type": {
          "type": "constant_keyword",
          "value": "bicycle"
        },
        "name": {
          "type": "text"
        }
      }
    }
  }

  PUT other_cycles
  {
    "mappings": {
      "properties": {
        "cycle_type": {
          "type": "keyword"
        },
        "name": {
          "type": "text"
        }
      }
    }
  }

We are splitting our index in two: one that will contain only bicycles, and another one that contains other cycles: unicycles, tricycles, etc. Then at search time, we need to search both indices, but we don’t need to modify queries.

  GET bicycles,other_cycles/_search
  {
    "query": {
      "bool": {
        "must": {
          "match": {
            "description": "dutch"
          }
        },
        "filter": {
          "term": {
            "cycle_type": "bicycle"
          }
        }
      }
    }
  }

On the bicycles index, Elasticsearch will simply ignore the cycle_type filter and rewrite the search request to the one below:

  GET bicycles,other_cycles/_search
  {
    "query": {
      "match": {
        "description": "dutch"
      }
    }
  }

On the other_cycles index, Elasticsearch will quickly figure out that bicycle doesn’t exist in the terms dictionary of the cycle_type field and return a search response with no hits.
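The per-index behavior described here can be modeled in a few lines of Python. This is only a sketch of the idea, not Elasticsearch's actual implementation: the `CONSTANT_FIELDS` table and the `rewrite_term_filter` function are hypothetical names, and the code assumes the shorthand term filter form used in this article (`{"term": {"field": "value"}}`).

```python
# Hypothetical table of constant_keyword fields per index,
# mirroring the example mappings in this article.
CONSTANT_FIELDS = {
    "bicycles": {"cycle_type": "bicycle"},
    "other_cycles": {},  # cycle_type is a regular keyword field here
}


def rewrite_term_filter(index, term_filter):
    """Resolve a shorthand term filter against an index's constant fields."""
    field, value = next(iter(term_filter["term"].items()))
    constants = CONSTANT_FIELDS.get(index, {})
    if field not in constants:
        # Ordinary field: the filter still has to be evaluated.
        return term_filter
    if constants[field] == value:
        # Every document in the index matches, so the filter can be dropped.
        return {"match_all": {}}
    # No document in the index can match.
    return {"match_none": {}}


bicycle_filter = {"term": {"cycle_type": "bicycle"}}
print(rewrite_term_filter("bicycles", bicycle_filter))
print(rewrite_term_filter("other_cycles", bicycle_filter))
```

In this model the filter on the bicycles index collapses to a match-all clause, while the other_cycles index keeps the original term filter and resolves it against its own terms.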

This is a powerful way of making queries cheaper by putting common values in a dedicated index. The idea can also be applied across multiple fields: for instance, if you track the color of each cycle and your bicycles index ends up holding mostly black bikes, you could split it into bicycles-black and bicycles-other-colors indices.
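Splitting on several fields means that something has to decide, at index time, which index each document belongs to. A minimal sketch of that decision follows, assuming documents carry `cycle_type` and `color` fields; the `color` field name and the `target_index` helper are illustrative assumptions, not part of any Elasticsearch API.

```python
def target_index(doc):
    """Pick the destination index for a document (hypothetical naming scheme)."""
    if doc.get("cycle_type") != "bicycle":
        # Unicycles, tricycles, etc. stay in the mixed index.
        return "other_cycles"
    if doc.get("color") == "black":
        # The most common combination gets its own dedicated index.
        return "bicycles-black"
    return "bicycles-other-colors"


print(target_index({"cycle_type": "bicycle", "color": "black", "name": "Dutch city bike"}))
print(target_index({"cycle_type": "tricycle", "color": "red", "name": "Cargo trike"}))
```

Under this scheme, bicycles-black could map both `cycle_type` and `color` as constant_keyword fields, while bicycles-other-colors would keep `color` as a regular keyword.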

The constant_keyword type is not strictly required for this optimization: it is also possible to update the client-side logic to route queries to the relevant indices based on their filters. However, constant_keyword does this transparently and decouples search requests from the index topology in exchange for very little overhead.
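To make the trade-off concrete, here is a rough sketch of what the client-side alternative might look like. The `INDEX_FOR_TYPE` table and the `route` function are hypothetical, and the code only handles the query shape used in this article: a bool query whose `filter` is a single term clause on cycle_type.

```python
# Hypothetical client-side knowledge of the index topology.
INDEX_FOR_TYPE = {"bicycle": "bicycles"}
DEFAULT_INDEX = "other_cycles"
ALL_INDICES = "bicycles,other_cycles"


def route(query):
    """Return (index, query) for a bool query with one term filter on cycle_type."""
    bool_query = query.get("bool", {})
    term = bool_query.get("filter", {}).get("term", {})
    if "cycle_type" not in term:
        # Nothing to route on: search every index with the query unchanged.
        return ALL_INDICES, query
    index = INDEX_FOR_TYPE.get(term["cycle_type"], DEFAULT_INDEX)
    if index == DEFAULT_INDEX:
        # The mixed index still needs the filter to separate cycle types.
        return index, query
    # Dedicated index: the filter is redundant, so drop it.
    remaining = {k: v for k, v in bool_query.items() if k != "filter"}
    return index, {"bool": remaining}


query = {
    "bool": {
        "must": {"match": {"description": "dutch"}},
        "filter": {"term": {"cycle_type": "bicycle"}},
    }
}
print(route(query))
```

Every client would have to carry this table and keep it in sync whenever the index layout changes, which is the coupling that constant_keyword avoids.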