Full text queries - Intervals - 《Elasticsearch v7.9 Reference》

Intervals query

Intervals query

Returns documents based on the order and proximity of matching terms.

The intervals query uses matching rules, constructed from a small set of definitions. These rules are then applied to terms from a specified field.

The definitions produce sequences of minimal intervals that span terms in a body of text. These intervals can be further combined and filtered by parent sources.

Example request

The following intervals search returns documents containing my favorite food immediately followed by hot water or cold porridge in the my_text field.

This search would match a my_text value of my favorite food is cold porridge but not when it's cold my favorite food is porridge.

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Top-level parameters for `intervals`

<field>

(Required, rule object) Field you wish to search.

The value of this parameter is a rule object used to match documents based on matching terms, order, and proximity.

Valid rules include:

`match` rule parameters

The match rule matches analyzed text.

query

(Required, string) Text you wish to find in the provided <field>.

max_gaps

(Optional, integer) Maximum number of positions between the matching terms. Terms further apart than this are not considered matches. Defaults to -1.

If unspecified or set to -1, there is no width restriction on the match. If set to 0, the terms must appear next to each other.

ordered

(Optional, boolean) If true, matching terms must appear in their specified order. Defaults to false.

analyzer

(Optional, string) analyzer used to analyze terms in the query. Defaults to the top-level <field>‘s analyzer.

filter

(Optional, interval filter rule object) An optional interval filter.

use_field

(Optional, string) If specified, then match intervals from this field rather than the top-level <field>. Terms are analyzed using the search analyzer from this field. This allows you to search across multiple fields as if they were all the same field; for example, you could index the same text into stemmed and unstemmed fields, and search for stemmed tokens near unstemmed ones.

`prefix` rule parameters

The prefix rule matches terms that start with a specified set of characters. This prefix can expand to match at most 128 terms. If the prefix matches more than 128 terms, Elasticsearch returns an error. You can use the index-prefixes option in the field mapping to avoid this limit.

prefix

(Required, string) Beginning characters of terms you wish to find in the top-level <field>.

analyzer

(Optional, string) analyzer used to normalize the prefix. Defaults to the top-level <field>‘s analyzer.

use_field

(Optional, string) If specified, then match intervals from this field rather than the top-level <field>.

The prefix is normalized using the search analyzer from this field, unless a separate analyzer is specified.

`wildcard` rule parameters

The wildcard rule matches terms using a wildcard pattern. This pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, Elasticsearch returns an error.

pattern

(Required, string) Wildcard pattern used to find matching terms.

This parameter supports two wildcard operators:

?, which matches any single character
*, which can match zero or more characters, including an empty one

Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.

analyzer

(Optional, string) analyzer used to normalize the pattern. Defaults to the top-level <field>‘s analyzer.

use_field

(Optional, string) If specified, match intervals from this field rather than the top-level <field>.

The pattern is normalized using the search analyzer from this field, unless analyzer is specified separately.

`fuzzy` rule parameters

The fuzzy rule matches terms that are similar to the provided term, within an edit distance defined by Fuzziness. If the fuzzy expansion matches more than 128 terms, Elasticsearch returns an error.

term

(Required, string) The term to match

prefix_length

(Optional, string) Number of beginning characters left unchanged when creating expansions. Defaults to 0.

transpositions

(Optional, boolean) Indicates whether edits include transpositions of two adjacent characters (ab → ba). Defaults to true.

fuzziness

(Optional, string) Maximum edit distance allowed for matching. See Fuzziness for valid values and more information. Defaults to auto.

analyzer

(Optional, string) analyzer used to normalize the term. Defaults to the top-level <field> ‘s analyzer.

use_field

(Optional, string) If specified, match intervals from this field rather than the top-level <field>.

The term is normalized using the search analyzer from this field, unless analyzer is specified separately.

`all_of` rule parameters

The all_of rule returns matches that span a combination of other rules.

intervals

(Required, array of rule objects) An array of rules to combine. All rules must produce a match in a document for the overall source to match.

max_gaps

(Optional, integer) Maximum number of positions between the matching terms. Intervals produced by the rules further apart than this are not considered matches. Defaults to -1.

If unspecified or set to -1, there is no width restriction on the match. If set to 0, the terms must appear next to each other.

ordered

(Optional, boolean) If true, intervals produced by the rules should appear in the order in which they are specified. Defaults to false.

filter

(Optional, interval filter rule object) Rule used to filter returned intervals.

`any_of` rule parameters

The any_of rule returns intervals produced by any of its sub-rules.

intervals

(Required, array of rule objects) An array of rules to match.

filter

(Optional, interval filter rule object) Rule used to filter returned intervals.

`filter` rule parameters

The filter rule returns intervals based on a query. See Filter example for an example.

after

(Optional, query object) Query used to return intervals that follow an interval from the filter rule.

before

(Optional, query object) Query used to return intervals that occur before an interval from the filter rule.

contained_by

(Optional, query object) Query used to return intervals contained by an interval from the filter rule.

containing

(Optional, query object) Query used to return intervals that contain an interval from the filter rule.

not_contained_by

(Optional, query object) Query used to return intervals that are not contained by an interval from the filter rule.

not_containing

(Optional, query object) Query used to return intervals that do not contain an interval from the filter rule.

not_overlapping

(Optional, query object) Query used to return intervals that do not overlap with an interval from the filter rule.

overlapping

(Optional, query object) Query used to return intervals that overlap with an interval from the filter rule.

script

(Optional, script object) Script used to return matching documents. This script must return a boolean value, true or false. See Script filters for an example.

Notes

Filter example

The following search includes a filter rule. It returns documents that have the words hot and porridge within 10 positions of each other, without the word salty in between:

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "max_gaps" : 10,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "salty"
              }
            }
          }
        }
      }
    }
  }
}

Script filters

You can use a script to filter intervals based on their start position, end position, and internal gap count. The following filter script uses the interval variable with the start, end, and gaps methods:

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}

Minimization

The intervals query always minimizes intervals, to ensure that queries can run in linear time. This can sometimes cause surprising results, particularly when using max_gaps restrictions or filters. For example, take the following query, searching for salty contained within the phrase hot porridge:

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "salty",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "hot porridge"
              }
            }
          }
        }
      }
    }
  }
}

This query does not match a document containing the phrase hot porridge is salty porridge, because the intervals returned by the match query for hot porridge only cover the initial two terms in this document, and these do not overlap the intervals covering salty.

Another restriction to be aware of is the case of any_of rules that contain sub-rules which overlap. In particular, if one of the rules is a strict prefix of the other, then the longer rule can never match, which can cause surprises when used in combination with max_gaps. Consider the following query, searching for the immediately followed by big or big bad, immediately followed by wolf:

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "intervals" : [
            { "match" : { "query" : "the" } },
            { "any_of" : {
                "intervals" : [
                    { "match" : { "query" : "big" } },
                    { "match" : { "query" : "big bad" } }
                ] } },
            { "match" : { "query" : "wolf" } }
          ],
          "max_gaps" : 0,
          "ordered" : true
        }
      }
    }
  }
}

Counter-intuitively, this query does not match the document the big bad wolf, because the any_of rule in the middle only produces intervals for big - intervals for big bad being longer than those for big, while starting at the same position, and so being minimized away. In these cases, it’s better to rewrite the query so that all of the options are explicitly laid out at the top level:

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "any_of" : {
          "intervals" : [
            { "match" : {
                "query" : "the big bad wolf",
                "ordered" : true,
                "max_gaps" : 0 } },
            { "match" : {
                "query" : "the big wolf",
                "ordered" : true,
                "max_gaps" : 0 } }
           ]
        }
      }
    }
  }
}

Intervals

Intervals query

Example request

Top-level parameters for intervals

match rule parameters

prefix rule parameters

wildcard rule parameters

fuzzy rule parameters

all_of rule parameters

any_of rule parameters

filter rule parameters

Notes

Filter example

Script filters

Minimization

Top-level parameters for `intervals`

`match` rule parameters

`prefix` rule parameters

`wildcard` rule parameters

`fuzzy` rule parameters

`all_of` rule parameters

`any_of` rule parameters

`filter` rule parameters