Stemmer token filter

The stemmer token filter reduces words to their root or base form (also known as their stem).
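
To make the idea concrete, here is a toy suffix-stripping stemmer in Python. It only illustrates the concept of mapping inflected forms to a shared stem. The suffix list and the doubled-consonant rule are hypothetical simplifications, much cruder than the real stemming algorithms that the language parameter selects (for example, the Porter-family stemmers such as porter2 for English).

```python
# Toy suffix-stripping stemmer, for illustration only.
# The suffix list and rules below are hypothetical simplifications,
# not the algorithm used by any OpenSearch stemmer.
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order
NO_COLLAPSE = set("aeiouylsz")       # never collapse these letters when doubled


def toy_stem(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final consonant, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in NO_COLLAPSE:
                stem = stem[:-1]
            return stem
    return word


if __name__ == "__main__":
    for w in ["running", "runs", "jumped", "falling", "boxes"]:
        print(w, "->", toy_stem(w))
```

With this sketch, both running and runs reduce to run, which matches what the English stemmer produces in the example on this page.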

Parameters

The stemmer token filter can be configured with a language parameter that accepts the following values:

  • Arabic: arabic
  • Armenian: armenian
  • Basque: basque
  • Bengali: bengali
  • Brazilian Portuguese: brazilian
  • Bulgarian: bulgarian
  • Catalan: catalan
  • Czech: czech
  • Danish: danish
  • Dutch: dutch, dutch_kp
  • English: english (default), light_english, lovins, minimal_english, porter2, possessive_english
  • Estonian: estonian
  • Finnish: finnish, light_finnish
  • French: light_french, french, minimal_french
  • Galician: galician, minimal_galician (plural step only)
  • German: light_german, german, german2, minimal_german
  • Greek: greek
  • Hindi: hindi
  • Hungarian: hungarian, light_hungarian
  • Indonesian: indonesian
  • Irish: irish
  • Italian: light_italian, italian
  • Kurdish (Sorani): sorani
  • Latvian: latvian
  • Lithuanian: lithuanian
  • Norwegian (Bokmål): norwegian, light_norwegian, minimal_norwegian
  • Norwegian (Nynorsk): light_nynorsk, minimal_nynorsk
  • Portuguese: light_portuguese, minimal_portuguese, portuguese, portuguese_rslp
  • Romanian: romanian
  • Russian: russian, light_russian
  • Spanish: light_spanish, spanish
  • Swedish: swedish, light_swedish
  • Turkish: turkish
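
If you generate index settings programmatically, you may want to catch a misspelled language value before sending the request. The following Python sketch copies the values from the list above into a set. The helper name stemmer_filter is hypothetical, and the set has to be kept in sync with the list by hand.

```python
# Hypothetical client-side helper: builds a stemmer filter definition and
# rejects language values that are not in the documented list.
ALLOWED_LANGUAGES = set("""
arabic armenian basque bengali brazilian bulgarian catalan czech danish
dutch dutch_kp english light_english lovins minimal_english porter2
possessive_english estonian finnish light_finnish light_french french
minimal_french galician minimal_galician light_german german german2
minimal_german greek hindi hungarian light_hungarian indonesian irish
light_italian italian sorani latvian lithuanian norwegian light_norwegian
minimal_norwegian light_nynorsk minimal_nynorsk light_portuguese
minimal_portuguese portuguese portuguese_rslp romanian russian
light_russian light_spanish spanish swedish light_swedish turkish
""".split())


def stemmer_filter(language: str = "english") -> dict:
    """Return a stemmer token filter definition for index settings."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported stemmer language: {language!r}")
    return {"type": "stemmer", "language": language}


if __name__ == "__main__":
    print(stemmer_filter("light_german"))
```

The returned dict can be placed under settings.analysis.filter in an index creation request body.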

You can also use the name parameter as an alias for the language parameter. If both are set, the name parameter is ignored.
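
The precedence rule can be written as a small Python function. This is a hypothetical client-side sketch, not OpenSearch source code: it assumes a filter definition is a plain dict and falls back to english, the value marked as the default in the list above, when neither parameter is present.

```python
# Hypothetical sketch of the documented precedence between the "language"
# and "name" parameters of a stemmer filter definition.
DEFAULT_LANGUAGE = "english"  # marked as the default in the documented list


def effective_stemmer_language(filter_def: dict) -> str:
    """Return the stemmer language that a filter definition selects."""
    if "language" in filter_def:
        return filter_def["language"]  # language wins; name is ignored
    if "name" in filter_def:
        return filter_def["name"]      # name acts as an alias
    return DEFAULT_LANGUAGE


if __name__ == "__main__":
    both = {"type": "stemmer", "language": "light_german", "name": "german"}
    print(effective_stemmer_language(both))
```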

Example

The following example request creates a new index named my-stemmer-index and configures an analyzer with a stemmer filter:

  PUT /my-stemmer-index
  {
    "settings": {
      "analysis": {
        "filter": {
          "my_english_stemmer": {
            "type": "stemmer",
            "language": "english"
          }
        },
        "analyzer": {
          "my_stemmer_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_english_stemmer"
            ]
          }
        }
      }
    }
  }
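
The analyzer defined above applies its components in order: the standard tokenizer splits the text, lowercase normalizes each token, and my_english_stemmer stems it. The following Python sketch mimics that chain to show where each step happens. It is an approximation under stated assumptions: a simple regular expression stands in for the standard tokenizer, and the stemmer is passed in as a function (here a hypothetical lookup table) rather than reimplemented.

```python
import re
from typing import Callable

# Rough stand-in for the standard tokenizer: runs of word characters
# (letters, digits, underscore). The real tokenizer follows Unicode
# text segmentation rules.
WORD = re.compile(r"\w+")


def analyze(text: str, stem: Callable[[str], str]) -> list[dict]:
    """Tokenize, lowercase, then stem, keeping offsets into the original text."""
    tokens = []
    for position, match in enumerate(WORD.finditer(text)):
        term = match.group().lower()  # plays the role of the lowercase filter
        term = stem(term)             # plays the role of the stemmer filter
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


if __name__ == "__main__":
    # Hypothetical lookup table standing in for the English stemmer.
    table = {"running": "run", "runs": "run"}
    for tok in analyze("Running runs", lambda t: table.get(t, t)):
        print(tok)
```

Because lowercasing runs before stemming, Running is looked up as running. The offsets always point into the original text, while token holds the normalized, stemmed term.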

Generated tokens

Use the following request to examine the tokens that the analyzer generates:

  GET /my-stemmer-index/_analyze
  {
    "analyzer": "my_stemmer_analyzer",
    "text": "running runs"
  }

The response contains the generated tokens:

  {
    "tokens": [
      {
        "token": "run",
        "start_offset": 0,
        "end_offset": 7,
        "type": "<ALPHANUM>",
        "position": 0
      },
      {
        "token": "run",
        "start_offset": 8,
        "end_offset": 12,
        "type": "<ALPHANUM>",
        "position": 1
      }
    ]
  }
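
When you consume the _analyze response in code, the start_offset and end_offset fields let you map each stemmed token back to the word it came from. A minimal Python sketch, assuming the response body shown above has already been retrieved as a JSON string (the HTTP call itself is omitted):

```python
import json

# The response body from the example above.
RESPONSE = """
{"tokens": [
  {"token": "run", "start_offset": 0, "end_offset": 7, "type": "<ALPHANUM>", "position": 0},
  {"token": "run", "start_offset": 8, "end_offset": 12, "type": "<ALPHANUM>", "position": 1}
]}
"""
TEXT = "running runs"  # the text that was sent for analysis


def stems_by_surface_form(text: str, response_body: str) -> dict:
    """Map each original word (sliced via offsets) to its stemmed token."""
    tokens = json.loads(response_body)["tokens"]
    return {text[t["start_offset"]:t["end_offset"]]: t["token"] for t in tokens}


if __name__ == "__main__":
    print(stems_by_surface_form(TEXT, RESPONSE))
```

If the same word appears more than once in the text, the dict keeps a single entry for it, which is harmless here because identical words always produce the same stem.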