Fuzzy query

A fuzzy query searches for documents containing terms that are similar to the search term within the maximum allowed Damerau–Levenshtein distance. The Damerau–Levenshtein distance is the minimum number of single-character edits needed to change one term into another. These edits include:

  • Replacements: cat to bat
  • Insertions: cat to cats
  • Deletions: cat to at
  • Transpositions: cat to act
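
To illustrate the distance itself, the following Python sketch computes the restricted form of Damerau–Levenshtein distance, often called optimal string alignment. In this form, each character takes part in at most one transposition. This is a simplifying assumption. The sketch is not meant to mirror OpenSearch internals, because OpenSearch's fuzzy matching is built on Lucene and does not compute a distance table per term. The function name damerau_levenshtein is hypothetical. Each of the four example pairs above comes out at a distance of 1.

```python
def damerau_levenshtein(a: str, b: str, transpositions: bool = True) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts the minimum number of single-character deletions, insertions,
    replacements, and (optionally) swaps of two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


for target in ["bat", "cats", "at", "act"]:
    print("cat ->", target, damerau_levenshtein("cat", target))  # each distance is 1

# Without transpositions (classic Levenshtein), cat -> act needs two replacements.
print(damerau_levenshtein("cat", "act", transpositions=False))  # 2
```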

A fuzzy query creates a list of all possible expansions of the search term that fall within the maximum edit distance. You can limit the number of such expansions with the max_expansions parameter. The query then searches for documents that match any of the expansions. If you set the transpositions parameter to false, the search uses the classic Levenshtein distance, in which swapping two adjacent characters counts as two edits rather than one.
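
The expansion step can be pictured with a simplified Python model. It works in three stages:

  • Generate every string reachable from the search term in at most max_edits single-character edits.
  • Keep the strings that exist in a term dictionary.
  • Cap the result at max_expansions.

The helpers edits1 and expand are hypothetical, as are the in-memory word list and the ordering used for the cap (fewest edits first, then alphabetical). OpenSearch, through Lucene, matches an automaton against the index's term dictionary instead of enumerating strings like this.

```python
def edits1(term: str, alphabet: list[str], transpositions: bool = True) -> set[str]:
    """Strings exactly one edit away from term (hypothetical helper)."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    variants = {left + right[1:] for left, right in splits if right}  # deletions
    variants |= {left + c + right[1:]
                 for left, right in splits if right for c in alphabet}  # replacements
    variants |= {left + c + right for left, right in splits for c in alphabet}  # insertions
    if transpositions:
        variants |= {left + right[1] + right[0] + right[2:]
                     for left, right in splits if len(right) > 1}  # adjacent swaps
    variants.discard(term)  # replacing a character with itself is not an edit
    return variants


def expand(term: str, term_dictionary: list[str], max_edits: int,
           max_expansions: int, transpositions: bool = True) -> list[str]:
    """Dictionary terms reachable from term in at most max_edits edits, capped."""
    # A matching term can only use characters that occur in the dictionary,
    # so those characters are a sufficient alphabet for generating variants.
    alphabet = sorted({c for t in term_dictionary for c in t})
    distance = {term: 0}  # variant -> fewest edits needed to reach it
    frontier = {term}
    for edits in range(1, max_edits + 1):
        frontier = {v for t in frontier
                    for v in edits1(t, alphabet, transpositions)
                    if v not in distance}
        for v in frontier:
            distance[v] = edits
    matches = [t for t in term_dictionary if t in distance]
    # Assumed ordering for the cap: fewer edits first, then alphabetical.
    matches.sort(key=lambda t: (distance[t], t))
    return matches[:max_expansions]


words = ["act", "bat", "cat", "cats", "dog"]
print(expand("cat", words, max_edits=1, max_expansions=50))
print(expand("cat", words, max_edits=1, max_expansions=50, transpositions=False))
print(expand("cat", words, max_edits=1, max_expansions=2))
```

With transpositions=False, act is no longer reachable in one edit, which mirrors the switch to the classic Levenshtein distance.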

The following example query searches for the speaker HALET (a misspelling of HAMLET). Because the maximum edit distance is not specified, the default AUTO edit distance is used:

  GET shakespeare/_search
  {
    "query": {
      "fuzzy": {
        "speaker": {
          "value": "HALET"
        }
      }
    }
  }

The response contains all documents in which HAMLET is the speaker.
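
AUTO is generally documented as equivalent to AUTO:3,6:

  • Terms of 0 to 2 characters must match exactly.
  • Terms of 3 to 5 characters allow one edit.
  • Longer terms allow two edits.

The following Python sketch encodes that rule. The function name auto_fuzziness and its low/high arguments are hypothetical.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edit distance that AUTO:low,high allows for term (hypothetical helper)."""
    length = len(term)
    if length < low:
        return 0  # short terms must match exactly
    if length < high:
        return 1  # medium-length terms allow one edit
    return 2      # longer terms allow two edits


for term in ["at", "HALET", "HAMLET"]:
    print(term, auto_fuzziness(term))
```

HALET has five characters, so AUTO allows one edit. That is enough to reach HAMLET by inserting M.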

The following example query searches for HALET and sets the advanced parameters explicitly:

  GET shakespeare/_search
  {
    "query": {
      "fuzzy": {
        "speaker": {
          "value": "HALET",
          "fuzziness": "2",
          "max_expansions": 40,
          "prefix_length": 0,
          "transpositions": true,
          "rewrite": "constant_score"
        }
      }
    }
  }

Parameters

The query accepts the name of the field (<field>) as a top-level parameter:

  GET _search
  {
    "query": {
      "fuzzy": {
        "<field>": {
          "value": "sample",
          ...
        }
      }
    }
  }

The <field> accepts the following parameters. All parameters except value are optional.

  • value (String): The term to search for in the field specified in <field>.
  • boost (Floating-point): A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
  • fuzziness (AUTO, 0, or a positive integer): The maximum number of character edits allowed when determining whether a term matches the value. Edits are insertions, deletions, and replacements, plus transpositions of two adjacent characters when transpositions is true. For example, the distance between wined and wind is 1. The default, AUTO, chooses a value based on the length of the search term and is a good choice for most use cases.
  • max_expansions (Positive integer): The maximum number of terms to which the query can expand. A fuzzy query “expands to” the matching terms that are within the distance specified in fuzziness, and OpenSearch then tries to match those terms. Default is 50.
  • prefix_length (Non-negative integer): The number of leading characters that must match exactly and are therefore excluded from fuzzy matching. Default is 0.
  • rewrite (String): Determines how OpenSearch rewrites and scores multi-term queries. Valid values are constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, and top_terms_blended_freqs_N. Default is constant_score.
  • transpositions (Boolean): Specifies whether a transposition of two adjacent characters (ab to ba) counts as a single edit. Default is true.
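
When you build the request body in application code, a small helper can assemble the structure of the generic request shown earlier and reject obviously invalid values. The following Python sketch is hypothetical: fuzzy_query and OPTIONAL_PARAMS are not part of any OpenSearch client, and the checks cover only the constraints listed in the parameter descriptions.

```python
import json

# Hypothetical list of optional parameter names, taken from the descriptions above.
OPTIONAL_PARAMS = {"boost", "fuzziness", "max_expansions", "prefix_length",
                   "rewrite", "transpositions"}


def fuzzy_query(field: str, value: str, **params) -> dict:
    """Build a fuzzy query body for field; only value is required."""
    unknown = set(params) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    if params.get("max_expansions", 1) < 1:
        raise ValueError("max_expansions must be a positive integer")
    if params.get("prefix_length", 0) < 0:
        raise ValueError("prefix_length must be a non-negative integer")
    return {"query": {"fuzzy": {field: {"value": value, **params}}}}


body = fuzzy_query("speaker", "HALET", fuzziness="2", max_expansions=40,
                   prefix_length=0, transpositions=True, rewrite="constant_score")
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the body of the advanced HALET example, and you could send it as the body of GET shakespeare/_search.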

Specifying a large value in max_expansions can lead to poor performance, especially if prefix_length is set to 0, because of the large number of variations of the word that OpenSearch tries to match.
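
A simplified model of a sorted term dictionary shows why prefix_length matters. With a fixed prefix, only terms that share that prefix need an edit-distance check, and a binary search can find them directly. The following Python sketch is an illustration under that assumption only. The function prefix_range and the speaker list are hypothetical, and the sketch does not reflect how OpenSearch actually stores terms.

```python
from bisect import bisect_left


def prefix_range(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return the terms that start with prefix, using binary search on a sorted list."""
    start = bisect_left(sorted_terms, prefix)  # index of the first term >= prefix
    end = start
    # Terms sharing a prefix are contiguous in sorted order.
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


# Hypothetical term dictionary for the speaker field.
speakers = sorted(["HAMLET", "HORATIO", "OPHELIA", "LAERTES",
                   "POLONIUS", "GERTRUDE", "CLAUDIUS", "GHOST"])

search_term = "HALET"
for prefix_length in (0, 1, 2):
    candidates = prefix_range(speakers, search_term[:prefix_length])
    print(prefix_length, len(candidates), candidates)
```

The trade-off is recall. With prefix_length set to 2, a misspelling in the first two characters, such as AHMLET, can no longer match HAMLET.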

If search.allow_expensive_queries is set to false, then fuzzy queries are not executed.