Unique token filter
Removes duplicate tokens from a stream. For example, you can use the unique filter to change "the lazy lazy dog" to "the lazy dog".
If the only_on_same_position parameter is set to true, the unique filter removes only duplicate tokens in the same position. When only_on_same_position is true, the unique filter works the same as the remove_duplicates filter.
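Same-position duplicates typically appear when an earlier filter, such as a synonym filter, emits multiple tokens at one position. The following Python sketch models the two modes on (term, position) pairs; it is an illustrative model only, not the actual Lucene implementation.

```python
# Illustrative model of the unique filter's two modes (not the Lucene code).
# Tokens are (term, position) pairs, as an analyzer chain would produce.

def unique(tokens, only_on_same_position=False):
    """Remove duplicate terms from a token stream.

    With only_on_same_position=True, a term is dropped only when the same
    term was already emitted at the same position (the remove_duplicates
    behavior); otherwise any repeated term is dropped.
    """
    seen = set()
    out = []
    for term, pos in tokens:
        key = (term, pos) if only_on_same_position else term
        if key not in seen:
            seen.add(key)
            out.append((term, pos))
    return out

# "fast" and "quick" as same-position synonyms, plus a later repeat of "fast"
tokens = [("fast", 0), ("quick", 0), ("fast", 0), ("fast", 1)]

print(unique(tokens))
# [('fast', 0), ('quick', 0)] -- every repeated term is dropped
print(unique(tokens, only_on_same_position=True))
# [('fast', 0), ('quick', 0), ('fast', 1)] -- the later "fast" survives
```

Note that the default mode discards the second "fast" even though it sits at a different position, which is why only_on_same_position matters when synonyms are in play.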
Example
The following analyze API request uses the unique filter to remove duplicate tokens from "the quick fox jumps the lazy fox":
resp = client.indices.analyze(
    tokenizer="whitespace",
    filter=[
        "unique"
    ],
    text="the quick fox jumps the lazy fox",
)
print(resp)
response = client.indices.analyze(
  body: {
    tokenizer: 'whitespace',
    filter: [
      'unique'
    ],
    text: 'the quick fox jumps the lazy fox'
  }
)
puts response
const response = await client.indices.analyze({
  tokenizer: "whitespace",
  filter: ["unique"],
  text: "the quick fox jumps the lazy fox",
});
console.log(response);
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["unique"],
  "text" : "the quick fox jumps the lazy fox"
}
The filter removes duplicated tokens for "the" and "fox", producing the following output:
[ the, quick, fox, jumps, lazy ]
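Outside Elasticsearch, the default behavior amounts to an order-preserving de-duplication of the tokenized terms. The output above can be reproduced with a minimal Python sketch (illustrative only; Elasticsearch applies this inside the analyzer chain):

```python
# Minimal sketch of the default unique behavior: keep the first
# occurrence of each term, preserving the original token order.

def unique_terms(text):
    seen = set()
    out = []
    for term in text.split():  # str.split stands in for the whitespace tokenizer
        if term not in seen:
            seen.add(term)
            out.append(term)
    return out

print(unique_terms("the quick fox jumps the lazy fox"))
# ['the', 'quick', 'fox', 'jumps', 'lazy']
```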
Add to an analyzer
The following create index API request uses the unique filter to configure a new custom analyzer.
resp = client.indices.create(
    index="custom_unique_example",
    settings={
        "analysis": {
            "analyzer": {
                "standard_truncate": {
                    "tokenizer": "standard",
                    "filter": [
                        "unique"
                    ]
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'custom_unique_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          standard_truncate: {
            tokenizer: 'standard',
            filter: [
              'unique'
            ]
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "custom_unique_example",
  settings: {
    analysis: {
      analyzer: {
        standard_truncate: {
          tokenizer: "standard",
          filter: ["unique"],
        },
      },
    },
  },
});
console.log(response);
PUT custom_unique_example
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "standard_truncate" : {
          "tokenizer" : "standard",
          "filter" : ["unique"]
        }
      }
    }
  }
}
Configurable parameters
only_on_same_position
(Optional, Boolean) If true, only remove duplicate tokens in the same position. Defaults to false.
Customize
To customize the unique filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom unique filter with only_on_same_position set to true.
resp = client.indices.create(
    index="letter_unique_pos_example",
    settings={
        "analysis": {
            "analyzer": {
                "letter_unique_pos": {
                    "tokenizer": "letter",
                    "filter": [
                        "unique_pos"
                    ]
                }
            },
            "filter": {
                "unique_pos": {
                    "type": "unique",
                    "only_on_same_position": True
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'letter_unique_pos_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          letter_unique_pos: {
            tokenizer: 'letter',
            filter: [
              'unique_pos'
            ]
          }
        },
        filter: {
          unique_pos: {
            type: 'unique',
            only_on_same_position: true
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "letter_unique_pos_example",
  settings: {
    analysis: {
      analyzer: {
        letter_unique_pos: {
          tokenizer: "letter",
          filter: ["unique_pos"],
        },
      },
      filter: {
        unique_pos: {
          type: "unique",
          only_on_same_position: true,
        },
      },
    },
  },
});
console.log(response);
PUT letter_unique_pos_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "letter_unique_pos": {
          "tokenizer": "letter",
          "filter": [ "unique_pos" ]
        }
      },
      "filter": {
        "unique_pos": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  }
}