Length token filter
The `length` token filter removes tokens from the token stream that are shorter than a specified minimum length or longer than a specified maximum length.
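For a quick test without creating an index, the filter can be defined inline in an `_analyze` request. The following is a minimal sketch (the `whitespace` tokenizer, the 3 to 5 character range, and the sample text are chosen for illustration):

GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "length",
      "min": 3,
      "max": 5
    }
  ],
  "text": "to be or not to be"
}

In this sketch, only the token `not` is returned, because every other token is shorter than three characters.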
Parameters
The `length` token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`min` | Optional | Integer | The minimum token length. Default is `0`.
`max` | Optional | Integer | The maximum token length. Default is `Integer.MAX_VALUE` (`2147483647`).
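Both parameters are optional and can be set independently. As a sketch (the tokenizer, values, and sample text are chosen for illustration), the following request sets only `min`, leaving `max` at its default so that no upper bound is applied:

GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "length",
      "min": 2
    }
  ],
  "text": "a quick length filter test"
}

Only the single-character token `a` is removed from the output.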
Example
The following example request creates a new index named `my_index` and configures an analyzer with a `length` filter:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "only_keep_4_to_10_characters": {
          "tokenizer": "whitespace",
          "filter": [ "length_4_to_10" ]
        }
      },
      "filter": {
        "length_4_to_10": {
          "type": "length",
          "min": 4,
          "max": 10
        }
      }
    }
  }
}
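To apply the analyzer at index time, assign it to a text field in the index mapping. The following sketch adds such a mapping to `my_index` (the field name `description` is hypothetical):

PUT /my_index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "analyzer": "only_keep_4_to_10_characters"
    }
  }
}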
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /my_index/_analyze
{
  "analyzer": "only_keep_4_to_10_characters",
  "text": "OpenSearch is a great tool!"
}
The response contains only the tokens that are between 4 and 10 characters long; the shorter tokens `is` and `a` are removed:
{
  "tokens": [
    {
      "token": "OpenSearch",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    },
    {
      "token": "great",
      "start_offset": 16,
      "end_offset": 21,
      "type": "word",
      "position": 3
    },
    {
      "token": "tool!",
      "start_offset": 22,
      "end_offset": 27,
      "type": "word",
      "position": 4
    }
  ]
}