Keep types token filter

The keep_types token filter controls which tokens are kept or discarded during text analysis, based on each token's type. Different tokenizers produce different token types, for example, <HOST>, <NUM>, or <ALPHANUM>.

The keyword, simple_pattern, and simple_pattern_split tokenizers do not support the keep_types token filter because these tokenizers do not support token type attributes.

Parameters

The keep_types token filter can be configured with the following parameters.

| Parameter | Required/Optional | Data type | Description |
| --- | --- | --- | --- |
| types | Required | List of strings | List of token types to be kept or discarded (determined by the mode). |
| mode | Optional | String | Whether to include or exclude the token types specified in types. Default is include. |
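
To illustrate the mode parameter, the following is a minimal sketch of a filter definition that discards tokens instead of keeping them. The filter name drop_numbers_filter is hypothetical and chosen only for this illustration. The fragment is meant to sit under the settings.analysis section of an index definition:

```json
{
  "filter": {
    "drop_numbers_filter": {
      "type": "keep_types",
      "types": ["<NUM>"],
      "mode": "exclude"
    }
  }
}
```

With mode set to exclude, tokens whose type appears in types are removed and all other tokens pass through. With the default include mode, only the listed types are kept.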

Example

The following example request creates a new index named test_index and configures an analyzer with a keep_types filter:

PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keep_types_filter"]
        }
      },
      "filter": {
        "keep_types_filter": {
          "type": "keep_types",
          "types": ["<ALPHANUM>"]
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /test_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Hello 2 world! This is an example."
}

The response contains the generated tokens. The token 2 has the type <NUM>, so the filter removes it, which leaves a gap at position 1:

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "world",
      "start_offset": 8,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "this",
      "start_offset": 15,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "is",
      "start_offset": 20,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "an",
      "start_offset": 23,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "example",
      "start_offset": 26,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
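
To experiment with keep_types without creating an index, one option is to pass the filter definition inline to the _analyze API. The following request is a sketch that assumes your cluster accepts inline token filter definitions in _analyze. It uses the exclude mode to drop <NUM> tokens rather than listing the types to keep:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "keep_types",
      "types": ["<NUM>"],
      "mode": "exclude"
    }
  ],
  "text": "Hello 2 world! This is an example."
}
```

For this input, the standard tokenizer is expected to emit only <ALPHANUM> and <NUM> tokens, so excluding <NUM> should again remove the token 2 and keep the rest.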