KStem token filter

The kstem token filter is a stemming filter used to reduce words to their root forms. The filter is a lightweight algorithmic stemmer designed for the English language that performs the following stemming operations:

  • Reduces plurals to their singular form.
  • Converts different verb tenses to their base form.
  • Removes common derivational endings, such as “-ing” or “-ed”.

The kstem token filter is equivalent to the stemmer filter configured with the light_english language. It provides more conservative stemming than other stemming filters, such as porter_stem.
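
As a minimal sketch of that equivalence, the following request defines a stemmer filter with the light_english language in place of a kstem filter. The index name my_light_english_index, the filter name light_english_stemmer, and the analyzer name my_light_english_analyzer are hypothetical and chosen only for illustration:

```json
PUT /my_light_english_index
{
  "settings": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        }
      },
      "analyzer": {
        "my_light_english_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "light_english_stemmer"
          ]
        }
      }
    }
  }
}
```

An analyzer defined this way should produce the same tokens as one that uses a filter of type kstem, so the choice between the two forms is mostly a matter of configuration style.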

The kstem token filter is based on the Lucene KStemFilter. For more information, see the Lucene documentation.

Example

The following example request creates a new index named my_kstem_index and configures an analyzer with a kstem filter:

    PUT /my_kstem_index
    {
      "settings": {
        "analysis": {
          "filter": {
            "kstem_filter": {
              "type": "kstem"
            }
          },
          "analyzer": {
            "my_kstem_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "kstem_filter"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "content": {
            "type": "text",
            "analyzer": "my_kstem_analyzer"
          }
        }
      }
    }

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

    POST /my_kstem_index/_analyze
    {
      "analyzer": "my_kstem_analyzer",
      "text": "stops stopped"
    }

The response contains the generated tokens:

    {
      "tokens": [
        {
          "token": "stop",
          "start_offset": 0,
          "end_offset": 5,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "stop",
          "start_offset": 6,
          "end_offset": 13,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    }
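
To compare kstem with a more aggressive stemmer such as porter_stem, one option is to call the _analyze API with the built-in filters specified inline, so that no index or custom analyzer is needed. The sample text in this sketch is an arbitrary choice for illustration. The resulting tokens are not shown because they depend on each stemmer's rules:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "kstem"],
  "text": "organizations generously"
}

POST /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "porter_stem"],
  "text": "organizations generously"
}
```

Running both requests and comparing the token lists side by side can show where kstem leaves a word closer to its dictionary form.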