Condition token filter

The condition token filter is a special type of filter that applies other token filters conditionally, giving you control over which tokens are modified during text analysis. You can configure multiple filters, which are applied to a token only when that token meets the condition you define. This token filter is useful for language-specific processing and for handling special characters.

Parameters

The condition token filter requires the following two parameters.

Parameter | Required/Optional | Data type | Description
filter | Required | Array | Specifies which token filters are applied to a token when the condition (defined by the script parameter) is met.
script | Required | Object | Configures an inline script that defines the condition that must be met in order for the filters specified in the filter parameter to be applied (only inline scripts are accepted).
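
For illustration, the following sketch shows how the two parameters fit together. The filter name short_token_normalizer and the length-based condition are hypothetical and not part of the example below; this filter would apply both the lowercase and asciifolding filters, but only to tokens shorter than five characters:

"short_token_normalizer": {
  "type": "condition",
  "filter": ["lowercase", "asciifolding"],
  "script": {
    "source": "token.getTerm().length() < 5"
  }
}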

Example

The following example request creates a new index named my_conditional_index and configures an analyzer with a condition filter. The filter applies a lowercase filter to any token containing the character sequence "UM":

PUT /my_conditional_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_conditional_filter": {
          "type": "condition",
          "filter": ["lowercase"],
          "script": {
            "source": "token.getTerm().toString().contains('UM')"
          }
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_conditional_filter"
          ]
        }
      }
    }
  }
}

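After the index is created, the analyzer can be assigned to a field so that the conditional filter runs on that field's text at index time. The following is a minimal sketch, assuming a hypothetical title field:

PUT /my_conditional_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
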
Generated tokens

Use the following request to examine the tokens generated by the analyzer:

GET /my_conditional_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "THE BLACK CAT JUMPS OVER A LAZY DOG"
}

The response contains the generated tokens:

{
  "tokens": [
    {
      "token": "THE",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "BLACK",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "CAT",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 14,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "OVER",
      "start_offset": 20,
      "end_offset": 24,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "A",
      "start_offset": 25,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "LAZY",
      "start_offset": 27,
      "end_offset": 31,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "DOG",
      "start_offset": 32,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 7
    }
  ]
}
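
Note that only jumps appears in lowercase: it is the only token whose term contains the sequence UM, so it is the only token to which the conditional lowercase filter was applied.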