Synonym token filter

The synonym token filter allows you to map multiple terms to a single term or create equivalence groups between words, improving search flexibility.

Parameters

The synonym token filter can be configured with the following parameters.

| Parameter | Required/Optional | Data type | Description |
|---|---|---|---|
| synonyms | Either synonyms or synonyms_path must be specified | String | A list of synonym rules defined directly in the configuration. |
| synonyms_path | Either synonyms or synonyms_path must be specified | String | The path to a file containing synonym rules (either an absolute path or a path relative to the config directory). |
| lenient | Optional | Boolean | Whether to ignore exceptions when loading the rule configurations. Default is false. |
| format | Optional | String | The format OpenSearch uses to define and interpret synonym rules. Valid values are solr and wordnet. Default is solr. |
| expand | Optional | Boolean | Whether to expand equivalent synonym rules. Default is true. |

For example, if synonyms are defined as “quick, fast” and the expand parameter is set to true, then the synonym rules are configured as follows:
- quick => quick
- quick => fast
- fast => quick
- fast => fast

If expand is set to false, the synonym rules are configured as follows:
- quick => quick
- fast => quick
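
The two rule sets above can be reproduced with a short Python sketch. It only illustrates the expansion logic described here and is not OpenSearch's implementation; the function name expand_rule is made up for this example.

```python
def expand_rule(terms, expand=True):
    """Turn one equivalence group, e.g. ["quick", "fast"], into (input, output) pairs.

    Illustrative model only: expand_rule is a hypothetical helper, not an OpenSearch API.
    """
    if expand:
        # Every term maps to every term in the group, itself included.
        return [(src, dst) for src in terms for dst in terms]
    # Without expansion, every term maps to the first term in the group.
    return [(src, terms[0]) for src in terms]


if __name__ == "__main__":
    for flag in (True, False):
        print(f"expand={flag}")
        for src, dst in expand_rule(["quick", "fast"], expand=flag):
            print(f"  {src} => {dst}")
```

Running the script prints the rules in the same order as the two lists above.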

Example: Solr format

The following example request creates a new index named my-synonym-index and configures an analyzer with a synonym filter. The filter is configured with the default solr rule format:

    PUT /my-synonym-index
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_synonym_filter": {
              "type": "synonym",
              "synonyms": [
                "car, automobile",
                "quick, fast, speedy",
                "laptop => computer"
              ]
            }
          },
          "analyzer": {
            "my_synonym_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "my_synonym_filter"
              ]
            }
          }
        }
      }
    }

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

    GET /my-synonym-index/_analyze
    {
      "analyzer": "my_synonym_analyzer",
      "text": "The quick dog jumps into the car with a laptop"
    }

The response contains the generated tokens:

    {
      "tokens": [
        {
          "token": "the",
          "start_offset": 0,
          "end_offset": 3,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "quick",
          "start_offset": 4,
          "end_offset": 9,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "fast",
          "start_offset": 4,
          "end_offset": 9,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "speedy",
          "start_offset": 4,
          "end_offset": 9,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "dog",
          "start_offset": 10,
          "end_offset": 13,
          "type": "<ALPHANUM>",
          "position": 2
        },
        {
          "token": "jumps",
          "start_offset": 14,
          "end_offset": 19,
          "type": "<ALPHANUM>",
          "position": 3
        },
        {
          "token": "into",
          "start_offset": 20,
          "end_offset": 24,
          "type": "<ALPHANUM>",
          "position": 4
        },
        {
          "token": "the",
          "start_offset": 25,
          "end_offset": 28,
          "type": "<ALPHANUM>",
          "position": 5
        },
        {
          "token": "car",
          "start_offset": 29,
          "end_offset": 32,
          "type": "<ALPHANUM>",
          "position": 6
        },
        {
          "token": "automobile",
          "start_offset": 29,
          "end_offset": 32,
          "type": "SYNONYM",
          "position": 6
        },
        {
          "token": "with",
          "start_offset": 33,
          "end_offset": 37,
          "type": "<ALPHANUM>",
          "position": 7
        },
        {
          "token": "a",
          "start_offset": 38,
          "end_offset": 39,
          "type": "<ALPHANUM>",
          "position": 8
        },
        {
          "token": "computer",
          "start_offset": 40,
          "end_offset": 46,
          "type": "SYNONYM",
          "position": 9
        }
      ]
    }
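
The token list in this response can be approximated with a small Python model of how the three Solr-format rules apply to the input text. This is a simplified sketch, not the actual filter: it handles single-word synonyms only, it stands in for the standard tokenizer and lowercase filter with lower() and split(), and the helper names parse_solr_rules and apply_synonyms are hypothetical. The token type labels are chosen to match this particular response.

```python
def parse_solr_rules(rules):
    """Build a {input_term: [output_terms]} map from Solr-format rules.

    Simplified model (hypothetical helper): single-word terms, equivalence
    groups are always expanded.
    """
    mapping = {}
    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: terms on the left are replaced by terms on the right.
            left, right = rule.split("=>", 1)
            inputs = [t.strip() for t in left.split(",")]
            outputs = [t.strip() for t in right.split(",")]
            for term in inputs:
                mapping.setdefault(term, []).extend(outputs)
        else:
            # Equivalence group: each term yields itself followed by the others.
            group = [t.strip() for t in rule.split(",")]
            for term in group:
                others = [t for t in group if t != term]
                mapping.setdefault(term, []).extend([term] + others)
    return mapping


def apply_synonyms(tokens, mapping):
    """Return (token, type, position) triples; synonyms share the original position."""
    result = []
    for position, token in enumerate(tokens):
        for out in mapping.get(token, [token]):
            token_type = "<ALPHANUM>" if out == token else "SYNONYM"
            result.append((out, token_type, position))
    return result


if __name__ == "__main__":
    rules = ["car, automobile", "quick, fast, speedy", "laptop => computer"]
    text = "The quick dog jumps into the car with a laptop"
    tokens = text.lower().split()  # stand-in for the standard tokenizer + lowercase
    for triple in apply_synonyms(tokens, parse_solr_rules(rules)):
        print(triple)
```

Note how the explicit mapping laptop => computer replaces the original token, while the equivalence groups keep the original token and add the other terms at the same position.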

Example: WordNet format

The following example request creates a new index named my-wordnet-index and configures an analyzer with a synonym filter. The filter is configured with the wordnet rule format:

    PUT /my-wordnet-index
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_wordnet_synonym_filter": {
              "type": "synonym",
              "format": "wordnet",
              "synonyms": [
                "s(100000001,1,'fast',v,1,0).",
                "s(100000001,2,'quick',v,1,0).",
                "s(100000001,3,'swift',v,1,0)."
              ]
            }
          },
          "analyzer": {
            "my_wordnet_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "my_wordnet_synonym_filter"
              ]
            }
          }
        }
      }
    }

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

    GET /my-wordnet-index/_analyze
    {
      "analyzer": "my_wordnet_analyzer",
      "text": "I have a fast car"
    }

The response contains the generated tokens:

    {
      "tokens": [
        {
          "token": "i",
          "start_offset": 0,
          "end_offset": 1,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "have",
          "start_offset": 2,
          "end_offset": 6,
          "type": "<ALPHANUM>",
          "position": 1
        },
        {
          "token": "a",
          "start_offset": 7,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 2
        },
        {
          "token": "fast",
          "start_offset": 9,
          "end_offset": 13,
          "type": "<ALPHANUM>",
          "position": 3
        },
        {
          "token": "quick",
          "start_offset": 9,
          "end_offset": 13,
          "type": "SYNONYM",
          "position": 3
        },
        {
          "token": "swift",
          "start_offset": 9,
          "end_offset": 13,
          "type": "SYNONYM",
          "position": 3
        },
        {
          "token": "car",
          "start_offset": 14,
          "end_offset": 17,
          "type": "<ALPHANUM>",
          "position": 4
        }
      ]
    }
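
To make the WordNet rule format easier to read, the following Python sketch parses entries like the ones in the request above and groups words by their first field, the synset ID. It assumes the WordNet Prolog layout s(synset_id, w_num, 'word', ss_type, sense_number, tag_count). and is an illustration only, not the parser OpenSearch uses; the helper name group_wordnet_synonyms is hypothetical.

```python
import re
from collections import defaultdict

# Assumed entry layout: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
ENTRY = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def group_wordnet_synonyms(lines):
    """Group words by synset ID; words sharing an ID are treated as synonyms.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            raise ValueError(f"not a WordNet s() entry: {line!r}")
        synset_id, word = match.group(1), match.group(3)
        # WordNet Prolog files escape an apostrophe inside a word by doubling it.
        groups[synset_id].append(word.replace("''", "'"))
    return dict(groups)


if __name__ == "__main__":
    entries = [
        "s(100000001,1,'fast',v,1,0).",
        "s(100000001,2,'quick',v,1,0).",
        "s(100000001,3,'swift',v,1,0).",
    ]
    print(group_wordnet_synonyms(entries))
```

All three entries share the synset ID 100000001, which is why fast, quick, and swift appear together at position 3 in the response.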