Edge n-gram token filter

The edge_ngram token filter is similar to the ngram token filter in that it splits a string into substrings of varying lengths. However, the edge_ngram token filter generates n-grams (substrings) only from the beginning (edge) of a token. It's particularly useful in scenarios like autocomplete or prefix matching, where you want to match the beginning of words or phrases as the user types them.

Parameters

The edge_ngram token filter can be configured with the following parameters.

| Parameter | Required/Optional | Data type | Description |
|-----------|-------------------|-----------|-------------|
| `min_gram` | Optional | Integer | The minimum length of the generated n-grams. Default is `1`. |
| `max_gram` | Optional | Integer | The maximum length of the generated n-grams. Default is `1` for the `edge_ngram` filter and `2` for custom token filters. Avoid setting this parameter too low: if the value is too low, only very short n-grams are generated, and longer search terms will not be found. For example, if `max_gram` is set to `3` and you index the word `banana`, the longest generated token is `ban`, so a search for `banana` returns no matches. You can use the `truncate` token filter as a search analyzer to mitigate this risk, as sketched after this table. |
| `preserve_original` | Optional | Boolean | Includes the original token in the output. Default is `false`. |
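
The truncate mitigation mentioned in the max_gram description might look like the following sketch. The index, filter, and analyzer names (truncate_example, truncate_to_4, my_search_analyzer) are illustrative, and the length value is assumed to match the max_gram used at index time (4, as in the example in the next section):

```json
PUT /truncate_example
{
  "settings": {
    "analysis": {
      "filter": {
        "truncate_to_4": {
          "type": "truncate",
          "length": 4
        }
      },
      "analyzer": {
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "truncate_to_4"]
        }
      }
    }
  }
}
```

Applied at search time, this analyzer truncates each query term to the same maximum length as the indexed edge n-grams, so a search for "turtle" is truncated to "turt" and can match the indexed gram.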

Example

The following example request creates a new index named edge_ngram_example and configures an analyzer with the edge_ngram filter:

```json
PUT /edge_ngram_example
{
  "settings": {
    "analysis": {
      "filter": {
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_edge_ngram"]
        }
      }
    }
  }
}
```

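To use the analyzer for autocomplete-style matching, you typically apply it to a text field at index time. The following sketch is an assumption rather than part of the example above: the products index and title field are hypothetical. The mapping pairs my_analyzer at index time with the standard analyzer at search time so that user queries are not themselves split into n-grams:

```json
PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_edge_ngram"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```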

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

```json
POST /edge_ngram_example/_analyze
{
  "analyzer": "my_analyzer",
  "text": "slow green turtle"
}
```


The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "slo",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "slow",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "gre",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "gree",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "tur",
      "start_offset": 11,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "turt",
      "start_offset": 11,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
```
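
Because min_gram is 3 and max_gram is 4, each word produces n-grams of lengths 3 and 4; for "slow", the 4-character gram is the word itself. A user's partially typed query can then match these indexed grams. The following sketch assumes the hypothetical products index and title field from the mapping example above:

```json
GET /products/_search
{
  "query": {
    "match": {
      "title": "turt"
    }
  }
}
```

With the standard analyzer applied at search time, the query term turt is left intact and matches the 4-gram turt indexed for turtle.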