Elision token filter
Removes specified elisions from the beginning of tokens. For example, you can use this filter to change l'avion to avion.
When not customized, the filter removes the following French elisions by default:
l', m', t', qu', n', s', j', d', c', jusqu', quoiqu', lorsqu', puisqu'
Customized versions of this filter are included in several of Elasticsearch’s built-in language analyzers.
This filter uses Lucene’s ElisionFilter.
Example
The following analyze API request uses the elision filter to remove j' from j’examine près du wharf:
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["elision"],
  "text" : "j’examine près du wharf"
}
The filter produces the following tokens:
[ examine, près, du, wharf ]
Add to an analyzer
The following create index API request uses the elision filter to configure a new custom analyzer.
PUT /elision_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_elision": {
          "tokenizer": "whitespace",
          "filter": [ "elision" ]
        }
      }
    }
  }
}
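To check the new analyzer, you can run the analyze API against the index. The following request is a minimal sketch that assumes the elision_example index above has been created. It reuses the sample text from the earlier example, written with a plain apostrophe.

```console
GET /elision_example/_analyze
{
  "analyzer": "whitespace_elision",
  "text": "j'examine près du wharf"
}
```

Because the sample text contains no punctuation, the whitespace tokenizer should split it the same way the standard tokenizer does, so the response is expected to contain the tokens examine, près, du, and wharf.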
Configurable parameters
articles
(Required*, array of string) List of elisions to remove.
To be removed, the elision must be at the beginning of a token and be immediately followed by an apostrophe. Both the elision and apostrophe are removed.
For custom elision filters, either this parameter or articles_path must be specified.
articles_path
(Required*, string) Path to a file that contains a list of elisions to remove.
This path must be absolute or relative to the config
location, and the file must be UTF-8 encoded. Each elision in the file must be separated by a line break.
To be removed, the elision must be at the beginning of a token and be immediately followed by an apostrophe. Both the elision and apostrophe are removed.
For custom elision filters, either this parameter or articles must be specified.
articles_case
(Optional, Boolean) If true, elision matching is case insensitive. If false, elision matching is case sensitive. Defaults to false.
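As an illustration of the articles_path parameter, the following request is a minimal sketch of a custom filter that loads its elisions from a file. The index name, the filter name elision_from_file, and the path analysis/elision_articles.txt are all hypothetical. The sketch assumes the file already exists relative to the config location, is UTF-8 encoded, and lists one elision per line in the same form used by the articles array (for example l, m, and t, without the apostrophe).

```console
PUT /elision_articles_path_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "elision_from_file" ]
        }
      },
      "filter": {
        "elision_from_file": {
          "type": "elision",
          "articles_path": "analysis/elision_articles.txt"
        }
      }
    }
  }
}
```

Keeping the list in a file rather than inline can be convenient when the same elisions are shared across several indices, at the cost of having to manage that file alongside the rest of the configuration.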
Customize
To customize the elision filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom case-insensitive elision filter that removes the l', m', t', qu', n', s', and j' elisions:
PUT /elision_case_insensitive_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "elision_case_insensitive" ]
        }
      },
      "filter": {
        "elision_case_insensitive": {
          "type": "elision",
          "articles": [ "l", "m", "t", "qu", "n", "s", "j" ],
          "articles_case": true
        }
      }
    }
  }
}
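To see the effect of articles_case, you can analyze text that contains capitalized elisions. The following request is a minimal sketch that assumes the elision_case_insensitive_example index above has been created. The sample text is illustrative. Because the request names no analyzer or field, it relies on the index's default analyzer, which the request above defines.

```console
GET /elision_case_insensitive_example/_analyze
{
  "text": "L'avion J'examine"
}
```

With articles_case set to true, the response is expected to contain the tokens avion and examine. With the default value of false, the capitalized L' and J' would not match the lowercase entries in articles and would be left in place.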