Decimal digit token filter
The decimal_digit
token filter is used to normalize decimal digit characters (0–9) into their ASCII equivalents in various scripts. This is useful when you want to ensure that all digits are treated uniformly in text analysis, regardless of the script in which they are written.
Example
The following example request creates a new index named my_index
and configures an analyzer with a decimal_digit
filter:
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"my_decimal_digit_filter": {
"type": "decimal_digit"
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["my_decimal_digit_filter"]
}
}
}
}
}
copy
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST /my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "123 ١٢٣ १२३"
}
copy
text
breakdown:
- “123” (ASCII digits)
- “١٢٣” (Arabic-Indic digits)
- “१२३” (Devanagari digits)
The response contains the generated tokens:
{
"tokens": [
{
"token": "123",
"start_offset": 0,
"end_offset": 3,
"type": "<NUM>",
"position": 0
},
{
"token": "123",
"start_offset": 4,
"end_offset": 7,
"type": "<NUM>",
"position": 1
},
{
"token": "123",
"start_offset": 8,
"end_offset": 11,
"type": "<NUM>",
"position": 2
}
]
}
当前内容版权归 OpenSearch 或其关联方所有,如需对内容或内容相关联开源项目进行关注与资助,请访问 OpenSearch .