Mapping character filter

The mapping character filter accepts a map of keys and values. Whenever it encounters a string of characters that matches a key, it replaces that string with the value associated with that key.

Matching is greedy; the longest pattern matching at a given point wins. Replacements are allowed to be the empty string.
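To illustrate these matching rules, here is a standalone Python sketch of the semantics (this is not the Lucene implementation; the function name and the sample mappings are invented for the example):

```python
def apply_mappings(text, mappings):
    """Greedy character mapping: at each position the longest matching
    key wins; characters that match no key pass through unchanged."""
    # Try keys longest-first so the longest match at a position wins.
    keys = sorted(mappings, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])  # value may be the empty string
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# "ab" is longer than "a", so it wins at position 0:
print(apply_mappings("abc", {"a": "1", "ab": "2"}))  # 2c
# An empty-string replacement deletes the matched characters:
print(apply_mappings("a-b", {"-": ""}))  # ab
```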

The mapping filter uses Lucene’s MappingCharFilter.

Example

The following analyze API request uses the mapping filter to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin equivalents (0123456789), changing the text My license plate is ٢٥٠١٥ to My license plate is 25015.

```python
resp = client.indices.analyze(
    tokenizer="keyword",
    char_filter=[
        {
            "type": "mapping",
            "mappings": [
                "٠ => 0",
                "١ => 1",
                "٢ => 2",
                "٣ => 3",
                "٤ => 4",
                "٥ => 5",
                "٦ => 6",
                "٧ => 7",
                "٨ => 8",
                "٩ => 9"
            ]
        }
    ],
    text="My license plate is ٢٥٠١٥",
)
print(resp)
```
```ruby
response = client.indices.analyze(
  body: {
    tokenizer: 'keyword',
    char_filter: [
      {
        type: 'mapping',
        mappings: [
          '٠ => 0',
          '١ => 1',
          '٢ => 2',
          '٣ => 3',
          '٤ => 4',
          '٥ => 5',
          '٦ => 6',
          '٧ => 7',
          '٨ => 8',
          '٩ => 9'
        ]
      }
    ],
    text: 'My license plate is ٢٥٠١٥'
  }
)
puts response
```
```js
const response = await client.indices.analyze({
  tokenizer: "keyword",
  char_filter: [
    {
      type: "mapping",
      mappings: [
        "٠ => 0",
        "١ => 1",
        "٢ => 2",
        "٣ => 3",
        "٤ => 4",
        "٥ => 5",
        "٦ => 6",
        "٧ => 7",
        "٨ => 8",
        "٩ => 9",
      ],
    },
  ],
  text: "My license plate is ٢٥٠١٥",
});
console.log(response);
```
```console
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0",
        "١ => 1",
        "٢ => 2",
        "٣ => 3",
        "٤ => 4",
        "٥ => 5",
        "٦ => 6",
        "٧ => 7",
        "٨ => 8",
        "٩ => 9"
      ]
    }
  ],
  "text": "My license plate is ٢٥٠١٥"
}
```

The filter produces the following text:

```text
[ My license plate is 25015 ]
```

Configurable parameters

mappings

(Required*, array of strings) Array of mappings, with each element having the form key => value.

Either this or the mappings_path parameter must be specified.

mappings_path

(Required*, string) Path to a file containing key => value mappings.

This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each mapping in the file must be separated by a line break.

Either this or the mappings parameter must be specified.
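As a sketch, a mappings file contains one key => value pair per line; reusing the emoticon mappings from the next section, its contents would look like this (the file name analysis/mappings.txt below is only an illustrative choice):

```text
:) => _happy_
:( => _sad_
```

You would then set "mappings_path": "analysis/mappings.txt" in the character filter definition instead of an inline "mappings" array.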

Customize and add to an analyzer

To customize the mappings filter, duplicate it to create the basis for a new custom character filter. You can modify the filter using its configurable parameters.

The following create index API request configures a new custom analyzer using a custom mappings filter, my_mappings_char_filter.

The my_mappings_char_filter filter replaces the :) and :( emoticons with a text equivalent.

```python
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "my_mappings_char_filter"
                    ]
                }
            },
            "char_filter": {
                "my_mappings_char_filter": {
                    "type": "mapping",
                    "mappings": [
                        ":) => _happy_",
                        ":( => _sad_"
                    ]
                }
            }
        }
    },
)
print(resp)
```
```ruby
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            char_filter: [
              'my_mappings_char_filter'
            ]
          }
        },
        char_filter: {
          my_mappings_char_filter: {
            type: 'mapping',
            mappings: [
              ':) => _happy_',
              ':( => _sad_'
            ]
          }
        }
      }
    }
  }
)
puts response
```
```js
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          char_filter: ["my_mappings_char_filter"],
        },
      },
      char_filter: {
        my_mappings_char_filter: {
          type: "mapping",
          mappings: [":) => _happy_", ":( => _sad_"],
        },
      },
    },
  },
});
console.log(response);
```
```console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mappings_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      }
    }
  }
}
```

The following analyze API request uses the custom my_mappings_char_filter to replace :( with _sad_ in the text I'm delighted about it :(.

```python
resp = client.indices.analyze(
    index="my-index-000001",
    tokenizer="keyword",
    char_filter=[
        "my_mappings_char_filter"
    ],
    text="I'm delighted about it :(",
)
print(resp)
```
```js
const response = await client.indices.analyze({
  index: "my-index-000001",
  tokenizer: "keyword",
  char_filter: ["my_mappings_char_filter"],
  text: "I'm delighted about it :(",
});
console.log(response);
```
```console
GET /my-index-000001/_analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "my_mappings_char_filter" ],
  "text": "I'm delighted about it :("
}
```

The filter produces the following text:

```text
[ I'm delighted about it _sad_ ]
```