Match query

Use the match query for full-text search on a specific document field. If you run a match query on a text field, the match query analyzes the provided search string and returns documents that match any of the string’s terms. If you run a match query on an exact-value field, it returns documents that match the exact value. The preferred way to search exact-value fields is to use a filter because, unlike a query, a filter is cached.

The following example shows a basic match query for the word wind in the title:

GET _search
{
  "query": {
    "match": {
      "title": "wind"
    }
  }
}

To pass additional parameters, you can use the expanded syntax:

GET _search
{
  "query": {
    "match": {
      "title": {
        "query": "wind",
        "analyzer": "stop"
      }
    }
  }
}
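
When request bodies are built programmatically, the two syntaxes differ only in whether the field maps to a string or to an object. The following Python sketch illustrates this with a hypothetical helper, match_query, which is not part of any OpenSearch client library:

```python
import json


def match_query(field, text, **params):
    """Build a match query body (hypothetical helper for illustration).

    Without extra parameters it returns the short form {field: text};
    with parameters it returns the expanded form {field: {"query": text, ...}}.
    """
    if params:
        clause = {field: {"query": text, **params}}
    else:
        clause = {field: text}
    return {"query": {"match": clause}}


# Short form, as in the first request above.
print(json.dumps(match_query("title", "wind")))
# Expanded form with an extra parameter, as in the second request.
print(json.dumps(match_query("title", "wind", analyzer="stop"), indent=2))
```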

Examples

The following examples use an index named testindex that contains these documents:

PUT testindex/_doc/1
{
  "title": "Let the wind rise"
}

PUT testindex/_doc/2
{
  "title": "Gone with the wind"
}

PUT testindex/_doc/3
{
  "title": "Rise is gone"
}

Operator

If a match query is run on a text field, the text is analyzed with the analyzer specified in the analyzer parameter. Then the resulting tokens are combined into a Boolean query using the operator specified in the operator parameter. The default operator is OR, so the query wind rise is changed into wind OR rise. In this example, the query returns documents 1–3 because each document contains at least one of the terms. To specify the and operator, use the following query:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wind rise",
        "operator": "and"
      }
    }
  }
}

The query is constructed as wind AND rise and returns document 1 as the matching document:

Response

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.2667098,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 1.2667098,
        "_source": {
          "title": "Let the wind rise"
        }
      }
    ]
  }
}
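
The following Python sketch models how the analyzed terms are combined under each operator, using the three example documents. It is a simplified illustration rather than OpenSearch's implementation: it assumes a lowercase word tokenizer as a stand-in for the standard analyzer and ignores scoring.

```python
import re

# The three example documents indexed in testindex.
DOCS = {
    1: "Let the wind rise",
    2: "Gone with the wind",
    3: "Rise is gone",
}


def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def match(query, operator="or"):
    """Return IDs of documents containing any (OR) or all (AND) of the query terms."""
    terms = tokens(query)
    hits = []
    for doc_id, title in DOCS.items():
        doc_terms = set(tokens(title))
        found = [t in doc_terms for t in terms]
        if operator.lower() == "and":
            matched = all(found)
        else:
            matched = any(found)
        if matched:
            hits.append(doc_id)
    return hits


print(match("wind rise"))         # [1, 2, 3]  (wind OR rise)
print(match("wind rise", "and"))  # [1]        (wind AND rise)
```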

Minimum should match

You can control the minimum number of terms that a document must match to be returned in the results by specifying the minimum_should_match parameter:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wind rise",
        "operator": "or",
        "minimum_should_match": 2
      }
    }
  }
}

Now documents are required to match both terms, so only document 1 is returned (this is equivalent to the and operator):

Response

{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.2667098,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 1.2667098,
        "_source": {
          "title": "Let the wind rise"
        }
      }
    ]
  }
}
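
The effect of an integer minimum_should_match can be sketched in Python by counting how many query terms each document contains. This sketch assumes a lowercase word tokenizer in place of the standard analyzer and handles only integer values, not the percentage or combination formats described in the parameter table.

```python
import re

DOCS = {1: "Let the wind rise", 2: "Gone with the wind", 3: "Rise is gone"}


def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def match_min_should(query, minimum_should_match):
    """Return IDs of documents that contain at least `minimum_should_match` query terms."""
    terms = tokens(query)
    hits = []
    for doc_id, title in DOCS.items():
        doc_terms = set(tokens(title))
        matched = sum(1 for t in terms if t in doc_terms)
        if matched >= minimum_should_match:
            hits.append(doc_id)
    return hits


print(match_min_should("wind rise", 1))  # [1, 2, 3]
print(match_min_should("wind rise", 2))  # [1], same as the and operator
```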

Analyzer

Because the preceding examples don’t specify an analyzer, the field’s analyzer is used, which for this index is the default standard analyzer. The standard analyzer does not perform stemming, so if you search for the wind rises with the and operator, you receive no results because the token rises does not match the token rise. To change the search analyzer, specify it in the analyzer parameter. For example, the following query uses the english analyzer:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "the wind rises",
        "operator": "and",
        "analyzer": "english"
      }
    }
  }
}

The english analyzer removes the stopword the and stems rises to rise, producing the tokens wind and rise. Both tokens match document 1, which is the only document returned:

Response

{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.2667098,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 1.2667098,
        "_source": {
          "title": "Let the wind rise"
        }
      }
    ]
  }
}
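
The difference between the two analyzers can be approximated in Python. The toy_english_analyze function below is a hypothetical, heavily simplified stand-in for the english analyzer: it removes a few assumed stopwords and strips a trailing s, whereas the real analyzer uses a full stopword list and a proper stemming algorithm.

```python
import re

# Small, illustrative subset of English stopwords (assumption).
STOPWORDS = {"a", "an", "and", "is", "or", "the", "with"}


def standard_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens, no stemming."""
    return re.findall(r"\w+", text.lower())


def toy_english_analyze(text):
    """Hypothetical, simplified english analyzer: drop stopwords, strip a trailing 's'."""
    result = []
    for tok in standard_analyze(text):
        if tok in STOPWORDS:
            continue
        if len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]  # crude suffix rule, e.g. "rises" -> "rise"
        result.append(tok)
    return result


doc1 = set(standard_analyze("Let the wind rise"))  # document 1, indexed with standard
query = "the wind rises"
# With the and operator, every query token must appear in the document.
print(all(t in doc1 for t in standard_analyze(query)))     # False ("rises" is missing)
print(all(t in doc1 for t in toy_english_analyze(query)))  # True
```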

Empty query

In some cases, an analyzer might remove all tokens from a query. For example, the english analyzer removes stopwords, so it removes every token from the query and OR or. To check the analyzer behavior, you can use the Analyze API:

GET testindex/_analyze
{
  "analyzer" : "english",
  "text" : "and OR or"
}

As expected, the analyzer produces no tokens:

{
  "tokens": []
}

You can specify the behavior for an empty query in the zero_terms_query parameter. Setting zero_terms_query to all returns all documents in the index, and setting it to none (the default) returns no documents:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "and OR or",
        "analyzer" : "english",
        "zero_terms_query": "all"
      }
    }
  }
}
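
The following Python sketch shows the decision that zero_terms_query controls. It assumes a small, illustrative stopword list and whitespace tokenization, and treats non-empty queries as a simple OR match:

```python
DOCS = {1: "Let the wind rise", 2: "Gone with the wind", 3: "Rise is gone"}

# A small subset of English stopwords, for illustration only.
STOPWORDS = {"a", "an", "and", "is", "or", "the", "with"}


def analyze(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def match(query, zero_terms_query="none"):
    """OR-style match that falls back on zero_terms_query when no terms survive analysis."""
    terms = set(analyze(query))
    if not terms:
        # "all" behaves like a match-all query; "none" matches nothing.
        return sorted(DOCS) if zero_terms_query == "all" else []
    return [doc_id for doc_id, title in DOCS.items() if terms & set(analyze(title))]


print(match("and OR or"))                          # []
print(match("and OR or", zero_terms_query="all"))  # [1, 2, 3]
```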

Fuzziness

To account for typos, you can specify fuzziness for your query as either of the following:

- An integer that specifies the maximum allowed Damerau–Levenshtein distance, that is, the maximum number of single-character edits.
- AUTO, which chooses the maximum distance based on the length of the term:
  - Strings of 0–2 characters must match exactly.
  - Strings of 3–5 characters allow 1 edit.
  - Strings longer than 5 characters allow 2 edits.
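
The following Python sketch turns a fuzziness setting into a maximum number of edits for a given term, following the AUTO ranges listed above. The function name max_edits is hypothetical:

```python
def max_edits(term, fuzziness="AUTO"):
    """Maximum edit distance allowed for `term` under a fuzziness setting.

    An integer is used as-is; "AUTO" picks a limit from the term's length.
    """
    if isinstance(fuzziness, str) and fuzziness.upper() == "AUTO":
        n = len(term)
        if n <= 2:
            return 0  # 0-2 characters: must match exactly
        if n <= 5:
            return 1  # 3-5 characters: 1 edit
        return 2      # more than 5 characters: 2 edits
    return int(fuzziness)


print(max_edits("ba"))          # 0
print(max_edits("wnid"))        # 1
print(max_edits("batting"))     # 2
print(max_edits("batting", 1))  # 1
```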

Setting fuzziness to AUTO works best in most cases:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO"
      }
    }
  }
}

The token wnid matches wind, so the query returns documents 1 and 2:

Response

{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.47501624,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 0.47501624,
        "_source": {
          "title": "Let the wind rise"
        }
      },
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.47501624,
        "_source": {
          "title": "Gone with the wind"
        }
      }
    ]
  }
}

Prefix length

Misspellings rarely occur at the beginning of words. Thus, you can use the prefix_length parameter to specify the number of leading characters that must match exactly and are excluded from fuzzy matching. For example, you can change the preceding query to include a prefix_length:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO",
        "prefix_length": 2
      }
    }
  }
}

The preceding query returns no results. If you change the prefix_length to 1, documents 1 and 2 are returned because the first letter of the token wnid is not misspelled.
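
One way to picture prefix_length is as a filter on candidate terms before any edit distance is computed. The following Python sketch assumes the indexed terms are the lowercase words of the three example documents; the edit-distance check that would follow is omitted:

```python
import re

DOCS = {1: "Let the wind rise", 2: "Gone with the wind", 3: "Rise is gone"}

# Distinct lowercase terms from the example documents (assumed index contents).
INDEX_TERMS = sorted({t for title in DOCS.values() for t in re.findall(r"\w+", title.lower())})


def prefix_candidates(query_term, prefix_length):
    """Return index terms whose first `prefix_length` characters equal the query term's."""
    prefix = query_term[:prefix_length]
    return [t for t in INDEX_TERMS if t.startswith(prefix)]


print(prefix_candidates("wnid", 2))  # [] because no term starts with "wn"
print(prefix_candidates("wnid", 1))  # ['wind', 'with']
```

With prefix_length set to 1, the candidates are wind and with; only wind is within one edit of wnid, so documents 1 and 2 match.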

Transpositions

In the preceding example, the word wnid contained a transposition (in was changed to ni). By default, transpositions are allowed in fuzzy matching, but you can disallow them by setting fuzzy_transpositions to false:

GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO",
        "fuzzy_transpositions": false
      }
    }
  }
}

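Now the query returns no results. Without transpositions, wnid is 2 edits away from wind, but AUTO allows only 1 edit for a 4-character term.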
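
The following Python sketch computes the edit distance with and without adjacent transpositions (the optimal string alignment variant of Damerau–Levenshtein distance). It is a plain dynamic-programming illustration, not OpenSearch's internal implementation:

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b using insertions, deletions, and substitutions.

    With transpositions=True, swapping two adjacent characters also counts as
    one edit (optimal string alignment variant of Damerau-Levenshtein).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


MAX_EDITS = 1  # what AUTO allows for a 4-character term such as "wnid"
print(edit_distance("wnid", "wind"))                                    # 1
print(edit_distance("wnid", "wind", transpositions=False))              # 2
print(edit_distance("wnid", "wind", transpositions=False) <= MAX_EDITS)  # False
```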

Synonyms

If you use a synonym_graph filter and auto_generate_synonyms_phrase_query is set to true (default), OpenSearch parses the query into terms and then combines the terms to generate a phrase query for multi-term synonyms. For example, if you specify ba,batting average as synonyms and search for ba, OpenSearch searches for ba OR "batting average".

To match multi-term synonyms with conjunctions, set auto_generate_synonyms_phrase_query to false:

GET /testindex/_search
{
  "query": {
    "match": {
      "text": {
        "query": "good ba",
        "auto_generate_synonyms_phrase_query": false
      }
    }
  }
}

With this setting, the token ba is expanded to ba OR (batting AND average).
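
The following Python sketch renders, as a readable string, the query generated for a single token under each setting. The expand function and the dictionary format for synonyms are hypothetical and only mirror the ba,batting average example; no request is sent to OpenSearch:

```python
def expand(token, synonyms, auto_generate_synonyms_phrase_query=True):
    """Render the query for one token as a readable string (hypothetical helper)."""
    alternatives = [token]
    for syn in synonyms.get(token, []):
        words = syn.split()
        if len(words) == 1:
            alternatives.append(syn)  # single-word synonym: plain term
        elif auto_generate_synonyms_phrase_query:
            alternatives.append(f'"{syn}"')  # multi-word synonym as a phrase
        else:
            alternatives.append("(" + " AND ".join(words) + ")")  # conjunction of terms
    return " OR ".join(alternatives)


SYNONYMS = {"ba": ["batting average"]}
print(expand("ba", SYNONYMS))         # ba OR "batting average"
print(expand("ba", SYNONYMS, False))  # ba OR (batting AND average)
```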

Parameters

The query accepts the name of the field (<field>) as a top-level parameter:

GET _search
{
  "query": {
    "match": {
      "<field>": {
        "query": "text to search for",
        ... 
      }
    }
  }
}

The <field> accepts the following parameters. All parameters except query are optional.

Parameter	Data type	Description
`query`	String	The query string to use for search. Required.
`auto_generate_synonyms_phrase_query`	Boolean	Specifies whether to create a match phrase query automatically for multi-term synonyms. For example, if you specify `ba,batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"` (if this option is `true`) or `ba OR (batting AND average)` (if this option is `false`). Default is `true`.
`analyzer`	String	The analyzer used to tokenize the query string text. Default is the index-time analyzer mapped for the field. If no analyzer is mapped for the field, the default analyzer for the index is used.
`boost`	Floating-point	Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
`enable_position_increments`	Boolean	When `true`, resulting queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted “gap” between terms. Default is `true`.
`fuzziness`	String	The number of character edits (insertions, deletions, substitutions, or transpositions) that it takes to change one word to another when determining whether a term matches a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. `AUTO` chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_rewrite`	String	Determines how OpenSearch rewrites the query. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. However, if the `fuzziness` parameter is not `0`, the query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs${max_expansions}` by default.
`fuzzy_transpositions`	Boolean	Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap “n” and “i”) and 2 if it is false (delete “n”, insert “n”). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient`	Boolean	Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`.
`max_expansions`	Positive integer	The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`minimum_should_match`	Positive or negative integer, positive or negative percentage, combination	If the query string contains multiple search terms and you use the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, `wind often rising` does not match `The Wind Rises.` If `minimum_should_match` is `1`, it matches. For details, see Minimum should match.
`operator`	String	If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are `OR` (the string `to be` is interpreted as `to OR be`) and `AND` (the string `to be` is interpreted as `to AND be`). Default is `OR`.
`prefix_length`	Non-negative integer	The number of leading characters that are not considered in fuzziness. Default is `0`.
`zero_terms_query`	String	In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`.
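
The minimum_should_match formats can be resolved into a required number of matching terms. The following Python sketch handles integer and percentage values only, assuming the usual rules: positive percentages are rounded down, negative values give the number of terms that may be missing, and the result is kept between 1 and the total number of terms. Combination values are not handled, and the function name required_matches is hypothetical:

```python
def required_matches(spec, total_terms):
    """Minimum number of query terms that must match (integers and percentages only)."""
    if isinstance(spec, str) and spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            required = total_terms * pct // 100  # rounded down
        else:
            # This share of terms may be missing (rounded down).
            required = total_terms - (total_terms * -pct // 100)
    else:
        n = int(spec)
        # A negative integer is the number of terms that may be missing.
        required = n if n >= 0 else total_terms + n
    # Never require fewer than 1 or more than the number of terms.
    return max(1, min(required, total_terms))


print(required_matches(2, 3))       # 2
print(required_matches(-1, 3))      # 2
print(required_matches("75%", 4))   # 3
print(required_matches("-25%", 4))  # 3
```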