KStem token filter

Provides KStem-based stemming for the English language. The kstem filter combines algorithmic stemming with a built-in dictionary.

The kstem filter tends to stem less aggressively than other English stemmer filters, such as the porter_stem filter.
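One way to see the difference is to run the same text through the porter_stem filter and compare the tokens with the kstem output. The request below is only a sketch: it reuses the sample text from the example on this page, and the exact tokens returned may vary with the Elasticsearch version.

```console
# Illustrative comparison: same tokenizer and text as the kstem example,
# but with the porter_stem filter swapped in.
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "porter_stem" ],
  "text": "the foxes jumping quickly"
}
```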

The kstem filter is equivalent to the stemmer filter’s light_english variant.
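The equivalence can be checked with a request like the following, which defines a stemmer filter inline with language set to light_english. This is a minimal sketch; the sample text is taken from the example on this page, and the tokens should match what the kstem filter returns for the same input.

```console
# Inline stemmer filter definition using the light_english variant.
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "stemmer", "language": "light_english" }
  ],
  "text": "the foxes jumping quickly"
}
```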

This filter uses Lucene’s KStemFilter.

Example

The following analyze API request uses the kstem filter to stem "the foxes jumping quickly" to "the fox jump quick":

  GET /_analyze
  {
    "tokenizer": "standard",
    "filter": [ "kstem" ],
    "text": "the foxes jumping quickly"
  }

The filter produces the following tokens:

  [ the, fox, jump, quick ]

Add to an analyzer

The following create index API request uses the kstem filter to configure a new custom analyzer.

To work properly, the kstem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the kstem filter in the analyzer configuration.

  PUT /my-index-000001
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "kstem"
            ]
          }
        }
      }
    }
  }
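Once the index exists, the analyzer can be tried out with the analyze API scoped to that index. The request below is a sketch; the mixed-case sample text is an assumption, chosen to show the lowercase filter running before kstem. If the filters are ordered as above, the returned tokens should be lowercased and stemmed.

```console
# Illustrative check of my_analyzer; the sample text is hypothetical.
GET /my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The Foxes Jumping Quickly"
}
```

Removing lowercase from the filter list and repeating the request is a quick way to see why the ordering matters, since kstem may leave capitalized tokens unchanged.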