Mapping character filter

The mapping character filter accepts a map of keys and values. Whenever it encounters a string of characters that is the same as a key, it replaces them with the value associated with that key.

Matching is greedy; the longest pattern matching at a given point wins. Replacements are allowed to be the empty string.
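
The greedy, longest-match rule can be illustrated with a small Python sketch. This is not Lucene's implementation; the function name apply_mappings and the sample rules are assumptions made purely for illustration.

```python
def apply_mappings(text: str, mappings: dict[str, str]) -> str:
    """Replace keys with values, preferring the longest key at each position."""
    # Try longer keys first so that the longest match at a position wins.
    keys = sorted((k for k in mappings if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])  # may be "", which deletes the key
                i += len(key)
                break
        else:
            # No key matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rules: "ab" is longer than "a", so it wins where both match,
# and "-" maps to the empty string, so it is removed.
rules = {"a": "1", "ab": "2", "-": ""}
print(apply_mappings("ab-a-abc", rules))  # prints 212c
```

In this sketch, "ab" is replaced as a unit wherever it occurs, and the shorter key "a" only applies where "ab" does not match.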

The mapping filter uses Lucene’s MappingCharFilter.

Example

The following analyze API request uses the mapping filter to convert Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Western Arabic (Latin) equivalents (0123456789), changing the text My license plate is ٢٥٠١٥ to My license plate is 25015.

  GET /_analyze
  {
    "tokenizer": "keyword",
    "char_filter": [
      {
        "type": "mapping",
        "mappings": [
          "٠ => 0",
          "١ => 1",
          "٢ => 2",
          "٣ => 3",
          "٤ => 4",
          "٥ => 5",
          "٦ => 6",
          "٧ => 7",
          "٨ => 8",
          "٩ => 9"
        ]
      }
    ],
    "text": "My license plate is ٢٥٠١٥"
  }

The filter produces the following text:

  [ My license plate is 25015 ]
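
Typing ten near-identical rules by hand is error-prone, especially with right-to-left characters. As a minimal sketch, the rules could be generated from the Unicode code points instead. The helper name digit_mappings is hypothetical, and the script only builds and prints the request body; it does not contact a cluster.

```python
import json

# Arabic-Indic digits occupy U+0660..U+0669 in Unicode, in numeric order.
ARABIC_INDIC_ZERO = 0x0660


def digit_mappings() -> list[str]:
    """Build "key => value" rules mapping each Arabic-Indic digit to 0-9."""
    return [f"{chr(ARABIC_INDIC_ZERO + d)} => {d}" for d in range(10)]


# Assemble the same body as the analyze request above.
body = {
    "tokenizer": "keyword",
    "char_filter": [{"type": "mapping", "mappings": digit_mappings()}],
    # Escapes spell the digits 2, 5, 0, 1, 5 in Arabic-Indic form.
    "text": "My license plate is \u0662\u0665\u0660\u0661\u0665",
}
print(json.dumps(body, ensure_ascii=False, indent=2))
```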

Configurable parameters

mappings

(Required*, array of strings) Array of mappings, with each element having the form key => value.

Either this or the mappings_path parameter must be specified.

mappings_path

(Required*, string) Path to a file containing key => value mappings.

This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each mapping in the file must be separated by a line break.

Either this or the mappings parameter must be specified.
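
As a minimal sketch of the file format described for mappings_path, the following Python writes one key => value rule per line in UTF-8. The file name my_mappings.txt and the helper write_mappings_file are hypothetical, and a temporary directory stands in for the config location.

```python
import tempfile
from pathlib import Path


def write_mappings_file(path: Path, rules: dict[str, str]) -> None:
    """Write one "key => value" rule per line, encoded as UTF-8."""
    lines = [f"{key} => {value}" for key, value in rules.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# A temporary directory is used here only so the sketch runs anywhere;
# a real file would live at an absolute path or under the config location.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_mappings.txt"  # hypothetical file name
    write_mappings_file(target, {":)": "_happy_", ":(": "_sad_"})
    content = target.read_text(encoding="utf-8")

print(content)
```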

Customize and add to an analyzer

To customize the mapping filter, duplicate it to create the basis for a new custom character filter. You can modify the filter using its configurable parameters.

The following create index API request configures a new custom analyzer using a custom mapping filter, my_mappings_char_filter.

The my_mappings_char_filter filter replaces the :) and :( emoticons with a text equivalent.

  PUT /my-index-000001
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "char_filter": [
              "my_mappings_char_filter"
            ]
          }
        },
        "char_filter": {
          "my_mappings_char_filter": {
            "type": "mapping",
            "mappings": [
              ":) => _happy_",
              ":( => _sad_"
            ]
          }
        }
      }
    }
  }

The following analyze API request uses the custom my_mappings_char_filter to replace :( with _sad_ in the text I'm delighted about it :(.

  GET /my-index-000001/_analyze
  {
    "tokenizer": "keyword",
    "char_filter": [ "my_mappings_char_filter" ],
    "text": "I'm delighted about it :("
  }

The filter produces the following text:

  [ I'm delighted about it _sad_ ]