Keyword marker token filter

The keyword_marker token filter prevents specified tokens from being altered by stemmers or other filters that respect the keyword attribute. It does this by marking those tokens as keywords; stemmers that appear later in the filter chain skip keyword-marked tokens, so those words remain in their original form.

Parameters

The keyword_marker token filter can be configured with the following parameters.

| Parameter | Required/Optional | Data type | Description |
| :--- | :--- | :--- | :--- |
| ignore_case | Optional | Boolean | Whether to ignore letter case when matching keywords. Default is false. |
| keywords | Required if neither keywords_path nor keywords_pattern is set | List of strings | The list of tokens to mark as keywords. |
| keywords_path | Required if neither keywords nor keywords_pattern is set | String | The path (relative to the config directory or absolute) to the list of keywords. |
| keywords_pattern | Required if neither keywords nor keywords_path is set | String | A regular expression used for matching tokens to be marked as keywords. |
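
To illustrate the keywords_pattern parameter, the following is a minimal sketch rather than a definitive configuration. The index name my_pattern_index, the analyzer name pattern_analyzer, the filter name pattern_marker_filter, and the pattern open.* are all hypothetical, and the sketch assumes that the pattern uses Java regular expression syntax. It is intended to mark lowercased tokens that begin with open, such as opensearch, so that the stemmer leaves them unchanged:

```json
PUT /my_pattern_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pattern_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "pattern_marker_filter", "stemmer"]
        }
      },
      "filter": {
        "pattern_marker_filter": {
          "type": "keyword_marker",
          "keywords_pattern": "open.*"
        }
      }
    }
  }
}
```

The marker filter is listed before the stemmer so that tokens are marked as keywords before stemming runs.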

Example

The following example request creates a new index named my_index and configures an analyzer with a keyword_marker filter. The filter marks the word example as a keyword:

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "custom_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": ["lowercase", "keyword_marker_filter", "stemmer"]
            }
          },
          "filter": {
            "keyword_marker_filter": {
              "type": "keyword_marker",
              "keywords": ["example"]
            }
          }
        }
      }
    }

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

    GET /my_index/_analyze
    {
      "analyzer": "custom_analyzer",
      "text": "Favorite example"
    }

The response contains the generated tokens. Note that while the word favorite was stemmed, the word example was not stemmed because it was marked as a keyword:

    {
      "tokens": [
        {
          "token": "favorit",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "example",
          "start_offset": 9,
          "end_offset": 16,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    }
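
For comparison, a sketch of the same analysis without the keyword_marker filter can be run by specifying the tokenizer and filters directly in the _analyze request. This sketch assumes the default stemmer settings; under that assumption, the word example would be expected to be stemmed as well (for instance, to exampl):

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stemmer"],
  "text": "Favorite example"
}
```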

You can further examine the impact of the keyword_marker token filter by adding the explain and attributes parameters to the _analyze request:

    GET /my_index/_analyze
    {
      "analyzer": "custom_analyzer",
      "text": "This is an OpenSearch example demonstrating keyword marker.",
      "explain": true,
      "attributes": "keyword"
    }

This will produce additional details in the response similar to the following:

    {
      "name": "porter_stem",
      "tokens": [
        ...
        {
          "token": "example",
          "start_offset": 22,
          "end_offset": 29,
          "type": "<ALPHANUM>",
          "position": 4,
          "keyword": true
        },
        {
          "token": "demonstr",
          "start_offset": 30,
          "end_offset": 43,
          "type": "<ALPHANUM>",
          "position": 5,
          "keyword": false
        },
        ...
      ]
    }