Keep types token filter
Keeps or removes tokens of a specific type. For example, you can use this filter to change 3 quick foxes to quick foxes by keeping only <ALPHANUM> (alphanumeric) tokens.
Token types
Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.
For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.
Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.
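To check which types a tokenizer assigns before configuring the filter, you can run the analyze API without any filter. The request below is a minimal sketch that reuses the sample text from this page; each token in the response includes a type field.

```console
GET _analyze
{
  "tokenizer": "standard",
  "text": "1 quick fox 2 lazy dogs"
}
```

With the standard tokenizer, 1 and 2 should be reported as <NUM> and the remaining words as <ALPHANUM>.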
This filter uses Lucene’s TypeTokenFilter.
Include example
The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ 1, 2 ]
Exclude example
The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note the mode parameter is set to exclude.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ quick, fox, lazy, dogs ]
Configurable parameters
types
    (Required, array of strings) List of token types to keep or remove.
mode
    (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:
    include
        (Default) Keep only the specified token types.
    exclude
        Remove the specified token types.
Customize and add to an analyzer
To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.
PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
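To verify the custom analyzer, you can run the analyze API against the index. This is a sketch that assumes the keep_types_example index above was created successfully; the sample text is the one from the introduction.

```console
GET keep_types_example/_analyze
{
  "analyzer": "my_analyzer",
  "text": "3 quick foxes"
}
```

Because 3 is a <NUM> token, the analyzer should return only quick and foxes.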