Stop token filter

The stop token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as a or for. Because these words carry little meaning in search queries, they are often excluded to improve search efficiency and relevance.

The default list of English stopwords includes the following words: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, and with.
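As a rough illustration only (not the Lucene/OpenSearch implementation), the default English list can be modeled as a set and applied to an already tokenized, lowercased stream:

```python
# Illustrative sketch: the default English stopword list as a Python set,
# used to drop stopwords from a pre-tokenized, lowercased token stream.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will",
    "with",
}

def remove_stopwords(tokens):
    """Return only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in ENGLISH_STOPWORDS]

print(remove_stopwords(["the", "quick", "dog"]))  # ['quick', 'dog']
```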

Parameters

The stop token filter can be configured with the following parameters.

stopwords (Optional; string or array of strings): Specifies either a custom array of stopwords or a language for which to fetch the predefined Lucene stopword list:

- arabic
- armenian
- basque
- bengali
- brazilian (Brazilian Portuguese)
- bulgarian
- catalan
- cjk (Chinese, Japanese, and Korean)
- czech
- danish
- dutch
- english (default)
- estonian
- finnish
- french
- galician
- german
- greek
- hindi
- hungarian
- indonesian
- irish
- italian
- latvian
- lithuanian
- norwegian
- persian
- portuguese
- romanian
- russian
- sorani
- spanish
- swedish
- thai
- turkish

stopwords_path (Optional; string): Specifies the path, absolute or relative to the config directory, of a file containing custom stopwords.

ignore_case (Optional; Boolean): If true, stopwords are matched regardless of case. Default is false.

remove_trailing (Optional; Boolean): If true, trailing stopwords are removed during analysis. Default is true.
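For example, instead of a language name, the stopwords parameter can take a custom array. The following sketch (the index and filter names are illustrative) defines a custom stopword list and matches it case insensitively:

```json
PUT /my-custom-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["and", "or", "the"],
          "ignore_case": true
        }
      }
    }
  }
}
```

With ignore_case set to true, a token like The is removed even if no lowercase filter runs before the stop filter.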

Example

The following example request creates a new index named my-stopword-index and configures an analyzer with a stop filter that uses the predefined stopword list for the English language:

```json
PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}
```


Generated tokens

Use the following request to examine the tokens generated using the analyzer:

```json
GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}
```


The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
```
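Note the gap in the position values: position 5 is skipped where the stopword the was removed, because the filter preserves position increments. A minimal Python sketch of this behavior, assuming simple whitespace tokenization rather than the standard tokenizer:

```python
# Minimal sketch of lowercasing plus stop filtering with position tracking.
# Assumes whitespace tokenization; the standard tokenizer is more involved.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will",
    "with",
}

def analyze(text):
    """Lowercase each token and drop stopwords, keeping original positions."""
    tokens = []
    for position, word in enumerate(text.split()):
        term = word.lower()
        if term not in ENGLISH_STOPWORDS:
            tokens.append({"token": term, "position": position})
    return tokens

for token in analyze("A quick dog jumps over the turtle"):
    print(token)
```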