Phone number analyzers

The analysis-phonenumber plugin provides analyzers and tokenizers for parsing phone numbers. A dedicated analyzer is required because parsing phone numbers is harder than it first appears. For common misconceptions regarding phone number parsing, see Falsehoods programmers believe about phone numbers.

OpenSearch supports two phone number analyzers: phone, which the example on this page uses at index time, and phone-search, which it uses at search time.

Internally, the plugin uses the libphonenumber library and follows its parsing rules.

The phone number analyzers are not meant to find phone numbers in larger texts. Instead, you should use them on fields that only contain phone numbers.

Installing the plugin

Before you can use the phone number analyzers, you must install the analysis-phonenumber plugin by running the following command:

  ./bin/opensearch-plugin install analysis-phonenumber

Specifying a default region

You can optionally specify a default region for parsing phone numbers by setting the phone-region parameter in the analyzer definition. Valid phone regions are ISO 3166 country codes, such as CH for Switzerland. For more information, see List of ISO 3166 country codes.

When a phone number begins with the international calling prefix +, the default region is irrelevant. However, for phone numbers that use a region-specific dialing prefix for international calls (for example, 001 instead of +1 to dial North America from most European countries), you must provide the region. Specifying the region also lets you properly index local phone numbers that have no international prefix.

Example

The following request creates an index containing one field that ingests phone numbers for Switzerland (region code CH):

  PUT /example-phone
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "phone-ch": {
            "type": "phone",
            "phone-region": "CH"
          },
          "phone-search-ch": {
            "type": "phone-search",
            "phone-region": "CH"
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "phone_number": {
          "type": "text",
          "analyzer": "phone-ch",
          "search_analyzer": "phone-search-ch"
        }
      }
    }
  }


The phone analyzer

The phone analyzer generates n-grams based on the given phone number. A (fictional) Swiss phone number containing an international calling prefix can be parsed with or without the Swiss-specific phone region. Thus, the following two requests will produce the same result:

  GET /example-phone/_analyze
  {
    "analyzer": "phone-ch",
    "text": "+41 60 555 12 34"
  }


  GET /example-phone/_analyze
  {
    "analyzer": "phone",
    "text": "+41 60 555 12 34"
  }


The response contains the generated n-grams:

  ["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...]

However, if you specify the phone number without the international calling prefix + (either by using 0041 or omitting the international calling prefix altogether), then only the analyzer configured with the correct phone region can parse the number:

  GET /example-phone/_analyze
  {
    "analyzer": "phone-ch",
    "text": "060 555 12 34"
  }
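
The 0041 form of the same number can be checked in the same way. The following request is a sketch that reuses the phone-ch analyzer defined earlier on this page. Assuming the plugin resolves the 00 dialing prefix through the CH region as described, it should produce the same number-based tokens as the + form:

```json
GET /example-phone/_analyze
{
  "analyzer": "phone-ch",
  "text": "0041 60 555 12 34"
}
```

Sending the same text to the plain phone analyzer, which has no region configured, is a quick way to compare how the two analyzers behave.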


The phone-search analyzer

In contrast, the phone-search analyzer does not create n-grams and emits only a small set of basic tokens. For example, send the following request and specify the phone-search analyzer:

  GET /example-phone/_analyze
  {
    "analyzer": "phone-search",
    "text": "+41 60 555 12 34"
  }


The response contains the following tokens:

  ["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"]
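
Some of the tokens emitted by phone-search, such as 41605551234, also appear among the n-grams emitted by phone, which is what lets a number indexed in one format be found when it is queried in another. The following sketch illustrates this with the example-phone index defined earlier. The document ID 1 and the refresh parameter are illustrative choices, and the expectation that the query matches is inferred from the overlapping tokens rather than taken from the plugin documentation:

```json
PUT /example-phone/_doc/1?refresh=true
{
  "phone_number": "+41 60 555 12 34"
}

GET /example-phone/_search
{
  "query": {
    "match": {
      "phone_number": "060 555 12 34"
    }
  }
}
```

The match query analyzes the query text with the field's search_analyzer (phone-search-ch), so the local format is interpreted using the CH region before it is compared with the indexed tokens.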