KStem token filter
Provides KStem-based stemming for the English language. The kstem filter combines algorithmic stemming with a built-in dictionary.

The kstem filter tends to stem less aggressively than other English stemmer filters, such as the porter_stem filter.

The kstem filter is equivalent to the stemmer filter’s light_english variant.

This filter uses Lucene’s KStemFilter.
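Because the two are equivalent, the following analyze request should return the same tokens as the kstem examples below. This is a sketch shown only for comparison, using the stemmer filter configured with its light_english language setting:

resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        # Equivalent to the kstem filter: the stemmer filter's
        # light_english variant
        {
            "type": "stemmer",
            "language": "light_english"
        }
    ],
    text="the foxes jumping quickly",
)
print(resp)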
Example
The following analyze API request uses the kstem filter to stem the foxes jumping quickly to the fox jump quick:
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "kstem"
    ],
    text="the foxes jumping quickly",
)
print(resp)
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'kstem'
    ],
    text: 'the foxes jumping quickly'
  }
)
puts response
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["kstem"],
  text: "the foxes jumping quickly",
});
console.log(response);
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quick ]
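To see the difference in aggressiveness, you can run the same text through the porter_stem filter. This is a sketch for comparison only, not part of the original example; the Porter algorithm typically produces quickli rather than quick for quickly:

# Same analyze request, but with the more aggressive porter_stem filter
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "porter_stem"
    ],
    text="the foxes jumping quickly",
)
print(resp)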
Add to an analyzer
The following create index API request uses the kstem filter to configure a new custom analyzer.

To work properly, the kstem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the kstem filter in the analyzer configuration.
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "kstem"
                    ]
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: [
              'lowercase',
              'kstem'
            ]
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["lowercase", "kstem"],
        },
      },
    },
  },
});
console.log(response);
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  }
}
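After creating the index, you can verify the analyzer’s output with the analyze API. This is a sketch, assuming the index and analyzer names from the request above; the sample text is arbitrary:

# Run sample text through the custom analyzer defined above.
# The lowercase filter normalizes "FOXES" before kstem stems it.
resp = client.indices.analyze(
    index="my-index-000001",
    analyzer="my_analyzer",
    text="The FOXES jumping Quickly",
)
print(resp)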