Length token filter

The length token filter removes tokens from the token stream whose length is shorter than a specified minimum or longer than a specified maximum.

Parameters

The length token filter can be configured with the following parameters.

| Parameter | Required/Optional | Data type | Description |
| --- | --- | --- | --- |
| min | Optional | Integer | The minimum token length. Default is 0. |
| max | Optional | Integer | The maximum token length. Default is Integer.MAX_VALUE (2147483647). |
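
To make the effect of the defaults concrete, here is a minimal Python sketch of the keep-or-drop decision. It is an illustration, not OpenSearch code: the function name keeps_token and the sample tokens are hypothetical, and the sketch assumes that both bounds are inclusive and that length is counted in characters.

```python
# Illustrative stand-in for the filter's keep/drop decision; not OpenSearch code.
INT_MAX = 2_147_483_647  # Integer.MAX_VALUE, the documented default for max


def keeps_token(token: str, min_len: int = 0, max_len: int = INT_MAX) -> bool:
    """Return True if the token length lies within [min_len, max_len]."""
    # Assumption: both bounds are inclusive and length is counted in characters.
    return min_len <= len(token) <= max_len


tokens = ["a", "is", "great", "OpenSearch", "internationalization"]

# With both defaults, no token is removed.
defaults = [t for t in tokens if keeps_token(t)]

# Setting only min drops short tokens; max stays effectively unbounded.
only_min = [t for t in tokens if keeps_token(t, min_len=3)]

# Setting only max drops long tokens; min stays at 0.
only_max = [t for t in tokens if keeps_token(t, max_len=5)]

print(defaults)
print(only_min)
print(only_max)
```

Under these assumptions, omitting both parameters leaves the token stream unchanged, so the filter is only useful when at least one bound is set.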

Example

The following example request creates a new index named my_index and configures an analyzer with a length filter:

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "only_keep_4_to_10_characters": {
              "tokenizer": "whitespace",
              "filter": [ "length_4_to_10" ]
            }
          },
          "filter": {
            "length_4_to_10": {
              "type": "length",
              "min": 4,
              "max": 10
            }
          }
        }
      }
    }

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

    GET /my_index/_analyze
    {
      "analyzer": "only_keep_4_to_10_characters",
      "text": "OpenSearch is a great tool!"
    }

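The response contains the generated tokens. The tokens is and a are removed because they are shorter than 4 characters, and the remaining tokens keep their original positions (0, 3, and 4):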

    {
      "tokens": [
        {
          "token": "OpenSearch",
          "start_offset": 0,
          "end_offset": 10,
          "type": "word",
          "position": 0
        },
        {
          "token": "great",
          "start_offset": 16,
          "end_offset": 21,
          "type": "word",
          "position": 3
        },
        {
          "token": "tool!",
          "start_offset": 22,
          "end_offset": 27,
          "type": "word",
          "position": 4
        }
      ]
    }
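
The response above can be reproduced offline with a rough Python emulation of the analyzer. This is a sketch under stated assumptions, not how OpenSearch implements the filter: the helper names whitespace_tokenize and length_filter are hypothetical, whitespace splitting is approximated with a regular expression, and both length bounds are treated as inclusive.

```python
import re


def whitespace_tokenize(text):
    """Yield (token, start_offset, end_offset, position) for each run of non-whitespace."""
    for position, match in enumerate(re.finditer(r"\S+", text)):
        yield match.group(), match.start(), match.end(), position


def length_filter(tokens, min_len=4, max_len=10):
    """Keep only tokens whose length is within [min_len, max_len] (assumed inclusive)."""
    for token, start, end, position in tokens:
        if min_len <= len(token) <= max_len:
            # Positions are carried over unchanged, so removed tokens leave gaps.
            yield {
                "token": token,
                "start_offset": start,
                "end_offset": end,
                "position": position,
            }


result = list(length_filter(whitespace_tokenize("OpenSearch is a great tool!")))
for entry in result:
    print(entry)
```

Because the whitespace tokenizer does not strip punctuation, tool! is counted as five characters and passes the filter, while is and a are dropped for being shorter than four.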