Stemmer token filter
Provides algorithmic stemming for several languages, some with additional variants. For a list of supported languages, see the language parameter.
When not customized, the filter uses the porter stemming algorithm for English.
Example
The following analyze API request uses the stemmer filter’s default porter stemming algorithm to stem "the foxes jumping quickly" to "the fox jump quickli":
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "stemmer" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quickli ]
Add to an analyzer
The following create index API request uses the stemmer filter to configure a new custom analyzer.
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stemmer" ]
        }
      }
    }
  }
}
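To check what the new analyzer produces, the analyze API can be run against the index. The request below is a minimal sketch that assumes the my-index-000001 index was created with the settings above; the sample text is reused from the earlier example.

```console
GET /my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "the foxes jumping quickly"
}
```

The whitespace tokenizer splits this lowercase, punctuation-free text on the same boundaries as the standard tokenizer, so the stems are expected to match the earlier example.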
Configurable parameters
language
(Optional, string) Language-dependent stemming algorithm used to stem tokens. If both this and the name parameter are specified, the language parameter argument is used.
Valid values for language
Valid values are sorted by language. Defaults to english. Recommended algorithms are bolded.
Arabic
Armenian
Basque
Bengali
Brazilian Portuguese
Bulgarian
Catalan
Czech
Danish
Dutch
English
english, light_english, lovins, minimal_english, porter2, possessive_english
Estonian
Finnish
French
light_french, french, minimal_french
Galician
galician, minimal_galician (Plural step only)
German
light_german, german, german2, minimal_german
Greek
Hindi
Hungarian
Indonesian
Irish
Italian
Kurdish (Sorani)
Latvian
Lithuanian
Norwegian (Bokmål)
norwegian, light_norwegian, minimal_norwegian
Norwegian (Nynorsk)
light_nynorsk, minimal_nynorsk
Portuguese
light_portuguese, minimal_portuguese, portuguese, portuguese_rslp
Romanian
Russian
Spanish
Swedish
Turkish
name
An alias for the language parameter. If both this and the language parameter are specified, the language parameter argument is used.
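As an illustration of the language parameter, the sketch below defines a stemmer filter inline in an analyze API request and selects the light_english algorithm from the list above. The choice of light_english and the sample text are assumptions made for the example. No output is shown because the stems depend on the algorithm chosen.

```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "stemmer",
      "language": "light_english"
    }
  ],
  "text": "the foxes jumping quickly"
}
```

Swapping the language value for another entry in the list is a quick way to compare how different algorithms stem the same text before committing one to an index.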
Customize
To customize the stemmer filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom stemmer filter that stems words using the light_german algorithm:
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      }
    }
  }
}
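Once the index exists, the custom filter can be tried on German text through the analyze API. This is a sketch only: it assumes the index was created with the light_german settings above, and the sample sentence is an arbitrary choice for illustration. The resulting tokens are not listed here because they depend on the light_german algorithm.

```console
GET /my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Die Häuser stehen in den Straßen"
}
```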