Limit token filter

The limit token filter restricts the number of tokens that pass through the analysis chain.

Parameters

The limit token filter can be configured with the following parameters.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| `max_token_count` | Optional | Integer | The maximum number of tokens to be generated. Default is `1`. |
| `consume_all_tokens` | Optional | Boolean | (Expert-level setting) Uses all tokens from the tokenizer, even if the result exceeds `max_token_count`. When this parameter is set, the output still only contains the number of tokens specified by `max_token_count`. However, all tokens generated by the tokenizer are processed. Default is `false`. |
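For example, a filter definition that keeps the first three tokens while still consuming the tokenizer's full output might look like the following (a sketch using a hypothetical filter name, `strict_token_limit`):

```json
{
  "strict_token_limit": {
    "type": "limit",
    "max_token_count": 3,
    "consume_all_tokens": true
  }
}
```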

Example

The following example request creates a new index named my_index and configures an analyzer with a limit filter:

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "three_token_limit": {
          "tokenizer": "standard",
          "filter": [ "custom_token_limit" ]
        }
      },
      "filter": {
        "custom_token_limit": {
          "type": "limit",
          "max_token_count": 3
        }
      }
    }
  }
}
```


Generated tokens

Use the following request to examine the tokens generated using the analyzer:

```json
GET /my_index/_analyze
{
  "analyzer": "three_token_limit",
  "text": "OpenSearch is a powerful and flexible search engine."
}
```


The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "OpenSearch",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 11,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 14,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
```
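When calling the `_analyze` API from application code, the token strings can be extracted from the response JSON. The following is a minimal sketch using only the Python standard library; the response body is the abbreviated example above, whereas a real client would obtain it over HTTP (for example, with the `opensearch-py` client):

```python
import json

# The _analyze response from the three_token_limit analyzer,
# abbreviated to the fields used below.
response_body = """
{
  "tokens": [
    {"token": "OpenSearch", "position": 0},
    {"token": "is", "position": 1},
    {"token": "a", "position": 2}
  ]
}
"""

# Extract the token strings in position order.
tokens = [t["token"] for t in json.loads(response_body)["tokens"]]
print(tokens)  # ['OpenSearch', 'is', 'a']
```

Because `max_token_count` is `3`, only the first three tokens of the input sentence appear, regardless of how many tokens the standard tokenizer produced.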