normalizer
normalizer
The normalizer
property of keyword fields is similar to analyzer except that it guarantees that the analysis chain produces a single token.
The normalizer
is applied prior to indexing the keyword, as well as at search-time when the keyword
field is searched via a query parser such as the match query or via a term-level query such as the term query.
A simple normalizer called lowercase
ships with elasticsearch and can be used. Custom normalizers can be defined as part of analysis settings as follows.
PUT index
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
PUT index/_doc/1
{
"foo": "BÀR"
}
PUT index/_doc/2
{
"foo": "bar"
}
PUT index/_doc/3
{
"foo": "baz"
}
POST index/_refresh
GET index/_search
{
"query": {
"term": {
"foo": "BAR"
}
}
}
GET index/_search
{
"query": {
"match": {
"foo": "BAR"
}
}
}
The above queries match documents 1 and 2 since BÀR
is converted to bar
at both index and query time.
{
"took": $body.took,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped" : 0,
"failed": 0
},
"hits": {
"total" : {
"value": 2,
"relation": "eq"
},
"max_score": 0.4700036,
"hits": [
{
"_index": "index",
"_type": "_doc",
"_id": "1",
"_score": 0.4700036,
"_source": {
"foo": "BÀR"
}
},
{
"_index": "index",
"_type": "_doc",
"_id": "2",
"_score": 0.4700036,
"_source": {
"foo": "bar"
}
}
]
}
}
Also, the fact that keywords are converted prior to indexing also means that aggregations return normalized values:
GET index/_search
{
"size": 0,
"aggs": {
"foo_terms": {
"terms": {
"field": "foo"
}
}
}
}
returns
{
"took": 43,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped" : 0,
"failed": 0
},
"hits": {
"total" : {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"foo_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "bar",
"doc_count": 2
},
{
"key": "baz",
"doc_count": 1
}
]
}
}
}