Remove duplicates token filter
The remove_duplicates
token filter is used to remove duplicate tokens that are generated in the same position during analysis.
Example
The following example request creates an index with a keyword_repeat
token filter. The filter adds a keyword
version of each token in the same position as the token itself and then uses a kstem
to create a stemmed version of the token:
PUT /example-index
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"keyword_repeat",
"kstem"
]
}
}
}
}
}
copy
Use the following request to analyze the string Slower turtle
:
GET /example-index/_analyze
{
"analyzer": "custom_analyzer",
"text": "Slower turtle"
}
copy
The response contains the token turtle
twice in the same position:
{
"tokens": [
{
"token": "slower",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "slow",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "turtle",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "turtle",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
The duplicate token can be removed by adding a remove_duplicates
token filter to the index settings:
PUT /index-remove-duplicate
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"keyword_repeat",
"kstem",
"remove_duplicates"
]
}
}
}
}
}
copy
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /index-remove-duplicate/_analyze
{
"analyzer": "custom_analyzer",
"text": "Slower turtle"
}
copy
The response contains the generated tokens:
{
"tokens": [
{
"token": "slower",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "slow",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "turtle",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}