Unique token filter
Removes duplicate tokens from a stream. For example, you can use the unique filter to change "the lazy lazy dog" to "the lazy dog".
If the only_on_same_position parameter is set to true, the unique filter removes only duplicate tokens in the same position.
When only_on_same_position is true, the unique filter works the same as the remove_duplicates filter.
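The same-position behavior can be illustrated with a minimal Python sketch. This is not the filter's actual implementation: it assumes each token is modeled as a (term, position) pair, and the function name unique_same_position is hypothetical. A token is dropped only when an identical term has already been seen at the same position, so a term repeated at a later position is kept.

```python
def unique_same_position(tokens):
    """Drop a token only if the same term was already seen at the same position.

    `tokens` is an iterable of (term, position) pairs. This pair-based model
    is an assumption made for illustration, not the filter's internal format.
    """
    seen = set()
    result = []
    for term, position in tokens:
        key = (term, position)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# "quick" appears twice at position 0 (for example, after synonym expansion)
# and once more at position 2. Only the repeat at position 0 is removed.
tokens = [("quick", 0), ("fast", 0), ("quick", 0), ("fox", 1), ("quick", 2)]
print(unique_same_position(tokens))
```

With only_on_same_position left at its default of false, the final "quick" at position 2 would also be removed, because the filter then compares terms regardless of position.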
Example
The following analyze API request uses the unique filter to remove duplicate tokens from "the quick fox jumps the lazy fox":
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["unique"],
  "text": "the quick fox jumps the lazy fox"
}
The filter removes the duplicate tokens "the" and "fox", producing the following output:
[ the, quick, fox, jumps, lazy ]
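The default behavior shown above can be approximated with a short Python sketch. It is only an illustration under stated assumptions: str.split() stands in for the whitespace tokenizer, and unique_tokens is a hypothetical helper, not part of any Elasticsearch client. The first occurrence of each term is kept and later occurrences are dropped, which preserves the original token order.

```python
def unique_tokens(tokens):
    """Keep the first occurrence of each term and drop later repeats."""
    seen = set()
    result = []
    for term in tokens:
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


text = "the quick fox jumps the lazy fox"
# str.split() approximates the whitespace tokenizer for this input.
print(unique_tokens(text.split()))  # ['the', 'quick', 'fox', 'jumps', 'lazy']
```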
Add to an analyzer
The following create index API request uses the unique filter to configure a new custom analyzer.
PUT custom_unique_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_truncate": {
          "tokenizer": "standard",
          "filter": ["unique"]
        }
      }
    }
  }
}
Configurable parameters
only_on_same_position
(Optional, Boolean) If true, only remove duplicate tokens in the same position. Defaults to false.
Customize
To customize the unique filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom unique filter with only_on_same_position set to true.
PUT letter_unique_pos_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "letter_unique_pos": {
          "tokenizer": "letter",
          "filter": ["unique_pos"]
        }
      },
      "filter": {
        "unique_pos": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  }
}