Normalizers

Normalizers are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and accept only a subset of the available char filters and token filters: only filters that work on a per-character basis are allowed. For instance, a lowercasing filter is allowed, but a stemming filter is not, because it needs to look at the keyword as a whole.

The filters that can currently be used in a normalizer definition are: arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, pattern_replace, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, trim, uppercase.
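
The filter list above lends itself to a quick client-side sanity check before an index definition is sent. The following is only a minimal sketch and not part of Elasticsearch: the helper name unsupported_filters is hypothetical, and it compares filter names only, so custom-named filters defined under the analysis settings would first have to be resolved to their type.

```python
# Filter names copied from the list above.
NORMALIZER_FILTERS = frozenset({
    "arabic_normalization", "asciifolding", "bengali_normalization",
    "cjk_width", "decimal_digit", "elision", "german_normalization",
    "hindi_normalization", "indic_normalization", "lowercase",
    "pattern_replace", "persian_normalization", "scandinavian_folding",
    "serbian_normalization", "sorani_normalization", "trim", "uppercase",
})


def unsupported_filters(normalizer: dict) -> list:
    """Return the entries of normalizer["filter"] that are not in the list above.

    Hypothetical helper for illustration; it checks names only.
    """
    return [name for name in normalizer.get("filter", [])
            if name not in NORMALIZER_FILTERS]


# A stemming filter looks at the whole keyword, so it is not in the list.
print(unsupported_filters({"type": "custom", "filter": ["lowercase", "stemmer"]}))
# prints ['stemmer']
```

An empty result means every filter named in the definition appears in the list; it does not replace whatever validation the server performs.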

Elasticsearch ships with a built-in lowercase normalizer. Other forms of normalization require a custom configuration.
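
As a rough illustration of the built-in case, the sketch below builds a mapping that refers to the lowercase normalizer by name, without defining anything under the analysis settings. The field name code and the index name my-index are assumptions made for the example.

```python
import json

# Hypothetical field name "code"; "lowercase" refers to the built-in
# normalizer, so no "analysis" section is defined here.
mappings = {
    "properties": {
        "code": {"type": "keyword", "normalizer": "lowercase"}
    }
}

# With an already-configured Python client, this could be passed as
#   client.indices.create(index="my-index", mappings=mappings)
print(json.dumps(mappings, indent=2))
```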

Custom normalizers

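Custom normalizers take a list of character filters and a list of token filters. The following example defines a custom normalizer, my_normalizer, that maps « and » to straight double quotes, lowercases the value, and folds it to ASCII, then applies it to the keyword field foo.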

Python:

    # Assumes client is an already-configured Elasticsearch client instance.
    resp = client.indices.create(
        index="index",
        settings={
            "analysis": {
                # Char filter that maps « and » to straight double quotes.
                "char_filter": {
                    "quote": {
                        "type": "mapping",
                        "mappings": [
                            "« => \"",
                            "» => \""
                        ]
                    }
                },
                "normalizer": {
                    "my_normalizer": {
                        "type": "custom",
                        "char_filter": [
                            "quote"
                        ],
                        "filter": [
                            "lowercase",
                            "asciifolding"
                        ]
                    }
                }
            }
        },
        mappings={
            "properties": {
                # Keyword field whose values go through my_normalizer.
                "foo": {
                    "type": "keyword",
                    "normalizer": "my_normalizer"
                }
            }
        },
    )
    print(resp)

Ruby:

    # Assumes client is an already-configured Elasticsearch client instance.
    response = client.indices.create(
      index: 'index',
      body: {
        settings: {
          analysis: {
            # Char filter that maps « and » to straight double quotes.
            char_filter: {
              quote: {
                type: 'mapping',
                mappings: [
                  '« => "',
                  '» => "'
                ]
              }
            },
            normalizer: {
              my_normalizer: {
                type: 'custom',
                char_filter: [
                  'quote'
                ],
                filter: [
                  'lowercase',
                  'asciifolding'
                ]
              }
            }
          }
        },
        mappings: {
          properties: {
            # Keyword field whose values go through my_normalizer.
            foo: {
              type: 'keyword',
              normalizer: 'my_normalizer'
            }
          }
        }
      }
    )
    puts response

JavaScript:

    // Assumes client is an already-configured Elasticsearch client instance
    // and that this runs where await is allowed.
    const response = await client.indices.create({
      index: "index",
      settings: {
        analysis: {
          // Char filter that maps « and » to straight double quotes.
          char_filter: {
            quote: {
              type: "mapping",
              mappings: ['« => "', '» => "'],
            },
          },
          normalizer: {
            my_normalizer: {
              type: "custom",
              char_filter: ["quote"],
              filter: ["lowercase", "asciifolding"],
            },
          },
        },
      },
      mappings: {
        properties: {
          // Keyword field whose values go through my_normalizer.
          foo: {
            type: "keyword",
            normalizer: "my_normalizer",
          },
        },
      },
    });
    console.log(response);

Console:

    PUT index
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "quote": {
              "type": "mapping",
              "mappings": [
                "« => \"",
                "» => \""
              ]
            }
          },
          "normalizer": {
            "my_normalizer": {
              "type": "custom",
              "char_filter": ["quote"],
              "filter": ["lowercase", "asciifolding"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "foo": {
            "type": "keyword",
            "normalizer": "my_normalizer"
          }
        }
      }
    }
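
To get a feel for what my_normalizer does to a value, the pipeline can be approximated in plain Python. This is only a sketch under stated assumptions: the function name approx_my_normalizer is hypothetical, and the ASCII folding step is imitated with Unicode decomposition, which handles common accented Latin letters but is not equivalent to the asciifolding filter.

```python
import unicodedata

# Same character mappings as the "quote" char filter in the example above.
QUOTE_MAP = str.maketrans({"«": '"', "»": '"'})


def approx_my_normalizer(value: str) -> str:
    """Roughly mimic my_normalizer: quote mapping, lowercase, ASCII folding."""
    value = value.translate(QUOTE_MAP)  # char_filter: quote
    value = value.lower()               # filter: lowercase
    # filter: asciifolding (approximation only): decompose accented letters
    # and drop the combining marks. The real filter covers more characters.
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(approx_my_normalizer("«Café»"))  # prints "cafe" (with straight quotes)
```

Each step works character by character and the input is never split, so the value stays a single token, in line with the restriction described at the top of this page.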