Intervals query
The intervals query matches documents based on the proximity and order of matching terms. It applies a set of matching rules to terms contained in the specified field. The query generates sequences of minimal intervals that span terms in the text. You can combine the intervals and filter them by parent sources.
Consider an index containing the following documents:
PUT /testindex/_doc/1
{
  "title": "key-value pairs are efficiently stored in a hash table"
}
PUT /testindex/_doc/2
{
  "title": "store key-value pairs in a hash map"
}
For example, the following query searches for documents containing the phrase key-value pairs (with no gap separating the terms) followed by either hash table or hash map:
GET /testindex/_search
{
  "query": {
    "intervals": {
      "title": {
        "all_of": {
          "ordered": true,
          "intervals": [
            {
              "match": {
                "query": "key-value pairs",
                "max_gaps": 0,
                "ordered": true
              }
            },
            {
              "any_of": {
                "intervals": [
                  {
                    "match": {
                      "query": "hash table"
                    }
                  },
                  {
                    "match": {
                      "query": "hash map"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
The query returns both documents:
Response
{
  "took": 1011,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.25,
    "hits": [
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.25,
        "_source": {
          "title": "store key-value pairs in a hash map"
        }
      },
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 0.14285713,
        "_source": {
          "title": "key-value pairs are efficiently stored in a hash table"
        }
      }
    ]
  }
}
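To see why both documents match, the following Python sketch approximates token positions and combines intervals in the way the all_of and any_of rules are described. It is an illustration only, not the engine's implementation. The tokenizer (lowercase, split on non-alphanumeric characters) is an assumption that roughly approximates default text analysis, all helper names are hypothetical, and for brevity the sketch treats hash table and hash map as exact phrases even though the query leaves their max_gaps and ordered parameters at the defaults.

```python
import re

def tokenize(text):
    # Assumption: lowercase and split on non-alphanumeric characters,
    # which only roughly approximates default text analysis.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_intervals(tokens, phrase):
    # Intervals (start, end) where the phrase terms appear in order with no gaps.
    terms = tokenize(phrase)
    n = len(terms)
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == terms]

def any_of(*interval_lists):
    # Disjunction: an interval produced by any sub-rule qualifies.
    return sorted(set().union(*interval_lists))

def all_of_ordered(first, second):
    # Conjunction with ordered=true: an interval from `first` must end
    # before an interval from `second` starts; the result spans both.
    return [(a[0], b[1]) for a in first for b in second if a[1] < b[0]]

def matches(title):
    tokens = tokenize(title)
    pairs = phrase_intervals(tokens, "key-value pairs")
    hashes = any_of(phrase_intervals(tokens, "hash table"),
                    phrase_intervals(tokens, "hash map"))
    return all_of_ordered(pairs, hashes)

doc1 = "key-value pairs are efficiently stored in a hash table"
doc2 = "store key-value pairs in a hash map"
print(matches(doc1))  # [(0, 9)]
print(matches(doc2))  # [(1, 7)]
```

Each document yields one combined interval, so each is returned by the query.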
Parameters
The query accepts the name of the field (<field>) as a top-level parameter:
GET _search
{
  "query": {
    "intervals": {
      "<field>": {
        ...
      }
    }
  }
}
The <field> accepts the following rule objects, which are used to match documents based on terms, order, and proximity.
Rule | Description |
---|---|
match | Matches analyzed text. |
prefix | Matches terms that start with a specified set of characters. |
wildcard | Matches terms using a wildcard pattern. |
fuzzy | Matches terms that are similar to the provided term within a specified edit distance. |
all_of | Combines multiple rules using a conjunction (AND). |
any_of | Combines multiple rules using a disjunction (OR). |
The match rule
The match rule matches analyzed text. The following table lists all parameters the match rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
query | Required | String | Text for which to search. |
analyzer | Optional | String | The analyzer used to analyze the query text. Default is the analyzer specified for the <field>. |
filter | Optional | Interval filter rule object | A rule used to filter returned intervals. |
max_gaps | Optional | Integer | The maximum allowed number of positions between the matching terms. Terms further apart than max_gaps are not considered matches. If max_gaps is not specified or is set to -1, terms are considered matches regardless of their position. If max_gaps is set to 0, matching terms must appear next to each other. Default is -1. |
ordered | Optional | Boolean | Specifies whether matching terms must appear in their specified order. Default is false. |
use_field | Optional | String | Specifies to search this field instead of the top-level <field>. By specifying use_field, you can search across multiple fields as if they were all the same field. For example, if you index the same text into stemmed and unstemmed fields, you can search for stemmed tokens that are near unstemmed ones. |
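The interaction of max_gaps and ordered can be sketched in Python as follows. This is a simplified model, not the engine's algorithm: it assumes single-term sub-intervals and counts gaps as the positions inside the matched span that are not occupied by a query term. The function name has_match is hypothetical.

```python
from itertools import product

def has_match(tokens, terms, max_gaps=-1, ordered=False):
    # Positions at which each query term occurs in the document.
    positions = [[i for i, t in enumerate(tokens) if t == term] for term in terms]
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # each query term needs its own position
        if ordered and list(combo) != sorted(combo):
            continue  # terms must appear in query order
        # Gaps: positions inside the span that are not taken by a query term.
        gaps = (max(combo) - min(combo) + 1) - len(combo)
        if max_gaps == -1 or gaps <= max_gaps:
            return True
    return False

tokens = ["store", "key", "value", "pairs", "in", "a", "hash", "map"]
print(has_match(tokens, ["pairs", "hash"], max_gaps=2))    # True: "in" and "a" lie between
print(has_match(tokens, ["pairs", "hash"], max_gaps=0))    # False: terms are not adjacent
print(has_match(tokens, ["hash", "pairs"], ordered=True))  # False: wrong order
print(has_match(tokens, ["hash", "pairs"]))                # True: order is ignored
```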
The prefix rule
The prefix rule matches terms that start with a specified set of characters (prefix). The prefix can expand to match at most 128 terms. If the prefix matches more than 128 terms, an error is returned. The following table lists all parameters the prefix rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
prefix | Required | String | The prefix used to match terms. |
analyzer | Optional | String | The analyzer used to normalize the prefix. Default is the analyzer specified for the <field>. |
use_field | Optional | String | Specifies to search this field instead of the top-level <field>. The prefix is normalized using the search analyzer specified for this field, unless you specify an analyzer. |
The wildcard rule
The wildcard rule matches terms using a wildcard pattern. The wildcard pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the wildcard rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
pattern | Required | String | The wildcard pattern used to match terms. Specify ? to match any single character or * to match zero or more characters. |
analyzer | Optional | String | The analyzer used to normalize the pattern. Default is the analyzer specified for the <field>. |
use_field | Optional | String | Specifies to search this field instead of the top-level <field>. The pattern is normalized using the search analyzer specified for this field, unless you specify an analyzer. |
Specifying patterns that start with * or ? can hinder search performance because it increases the number of iterations required to match terms.
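The expansion limit for the prefix and wildcard rules can be pictured with a small Python sketch. It uses the standard library's fnmatchcase as a stand-in matcher, which is an assumption: fnmatchcase also gives special meaning to square brackets, which the wildcard rule described here does not mention. The function expand and the sample term set are hypothetical and do not reflect how the engine walks its term dictionary.

```python
from fnmatch import fnmatchcase

MAX_EXPANSIONS = 128  # the limit stated for the prefix and wildcard rules

def expand(pattern, indexed_terms, limit=MAX_EXPANSIONS):
    # Collect the indexed terms the pattern expands to and fail past the limit.
    matched = [t for t in sorted(indexed_terms) if fnmatchcase(t, pattern)]
    if len(matched) > limit:
        raise ValueError(f"pattern {pattern!r} expands to more than {limit} terms")
    return matched

terms = {"has", "hash", "hashed", "hashing", "map", "table"}
print(expand("has?", terms))  # ['hash']
print(expand("ha*", terms))   # ['has', 'hash', 'hashed', 'hashing']
print(expand("*ing", terms))  # ['hashing']
```

In this model, a prefix such as ha behaves like the wildcard pattern ha*. A pattern with a leading * cannot narrow the candidates by their first characters, which is one way to picture the performance warning above.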
The fuzzy rule
The fuzzy rule matches terms that are similar to the provided term within a specified edit distance. The fuzzy expansion can match at most 128 terms. If the term expands to more than 128 terms, an error is returned. The following table lists all parameters the fuzzy rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
term | Required | String | The term to match. |
analyzer | Optional | String | The analyzer used to normalize the term. Default is the analyzer specified for the <field>. |
fuzziness | Optional | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between wined and wind is 1. Valid values are non-negative integers or AUTO. The default, AUTO, chooses a value based on the length of each term and is a good choice for most use cases. |
transpositions | Optional | Boolean | Setting transpositions to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the fuzziness option. For example, the distance between wind and wnid is 1 if transpositions is true (swap “n” and “i”) and 2 if it is false (delete “n”, insert “n”). If transpositions is false, rewind and wnid have the same distance (2) from wind, despite the more human-centric opinion that wnid is an obvious typo. The default is a good choice for most use cases. |
prefix_length | Optional | Integer | The number of beginning characters left unchanged for fuzzy matching. Default is 0. |
use_field | Optional | String | Specifies to search this field instead of the top-level <field>. The term is normalized using the search analyzer specified for this field, unless you specify an analyzer. |
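The distances quoted in the fuzziness and transpositions descriptions can be checked with a short Python sketch. It computes an edit distance with an optional adjacent-swap operation (the optimal string alignment variant), which is an assumption chosen for illustration and not necessarily the algorithm the engine uses internally. The function name edit_distance is hypothetical.

```python
def edit_distance(a, b, transpositions=True):
    # d[i][j] is the distance between the first i chars of a and the first j of b.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # substitute (or keep)
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap adjacent chars
    return d[len(a)][len(b)]

print(edit_distance("wined", "wind"))                       # 1
print(edit_distance("wind", "wnid"))                        # 1
print(edit_distance("wind", "wnid", transpositions=False))  # 2
print(edit_distance("wind", "rewind", transpositions=False))  # 2
```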
The all_of rule
The all_of rule combines multiple rules using a conjunction (AND). The following table lists all parameters the all_of rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
intervals | Required | Array of rule objects | An array of rules to combine. A document must match all rules in order to be returned in the results. |
filter | Optional | Interval filter rule object | A rule used to filter returned intervals. |
max_gaps | Optional | Integer | The maximum allowed number of positions between the matching terms. Terms further apart than max_gaps are not considered matches. If max_gaps is not specified or is set to -1, terms are considered matches regardless of their position. If max_gaps is set to 0, matching terms must appear next to each other. Default is -1. |
ordered | Optional | Boolean | If true, intervals generated by the rules should appear in the specified order. Default is false. |
The any_of rule
The any_of rule combines multiple rules using a disjunction (OR). The following table lists all parameters the any_of rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
intervals | Required | Array of rule objects | An array of rules to combine. A document must match at least one rule in order to be returned in the results. |
filter | Optional | Interval filter rule object | A rule used to filter returned intervals. |
The filter rule
The filter rule is used to restrict the results. The following table lists all parameters the filter rule supports.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
after | Optional | Query object | A query used to return intervals that follow an interval specified in the filter rule. |
before | Optional | Query object | A query used to return intervals that are before an interval specified in the filter rule. |
contained_by | Optional | Query object | A query used to return intervals contained by an interval specified in the filter rule. |
containing | Optional | Query object | A query used to return intervals that contain an interval specified in the filter rule. |
not_contained_by | Optional | Query object | A query used to return intervals that are not contained by an interval specified in the filter rule. |
not_containing | Optional | Query object | A query used to return intervals that do not contain an interval specified in the filter rule. |
not_overlapping | Optional | Query object | A query used to return intervals that do not overlap with an interval specified in the filter rule. |
overlapping | Optional | Query object | A query used to return intervals that overlap with an interval specified in the filter rule. |
script | Optional | Script object | A script used to match documents. This script must return true or false. |
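The positional filters in the table can be read as predicates over pairs of (start, end) intervals. The Python sketch below models them under that assumption. The function names mirror the parameter names for readability, but apply_filter and its negate flag are hypothetical helpers, and boundary handling in the engine may differ in detail.

```python
def contained_by(a, b):
    # a lies entirely inside b.
    return b[0] <= a[0] and a[1] <= b[1]

def containing(a, b):
    # a entirely covers b.
    return contained_by(b, a)

def overlapping(a, b):
    # a and b share at least one position.
    return a[0] <= b[1] and b[0] <= a[1]

def before(a, b):
    # a ends before b starts.
    return a[1] < b[0]

def after(a, b):
    # a starts after b ends.
    return a[0] > b[1]

def apply_filter(intervals, relation, filter_intervals, negate=False):
    # Keep an interval if it stands in `relation` to some filter interval;
    # with negate=True keep it only if it does so to none (the not_* variants).
    kept = []
    for a in intervals:
        hit = any(relation(a, b) for b in filter_intervals)
        if hit != negate:
            kept.append(a)
    return kept

# An interval spanning positions 2 through 8 with a filter term at position 4:
print(apply_filter([(2, 8)], containing, [(4, 4)]))               # [(2, 8)]
print(apply_filter([(2, 8)], containing, [(4, 4)], negate=True))  # []
# No filter term in the document, so not_containing keeps the interval:
print(apply_filter([(3, 6)], containing, [], negate=True))        # [(3, 6)]
```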
Example: Filters
The following query searches for documents containing the words pairs and hash that are within five positions of each other and don’t contain the word efficiently between them:
POST /testindex/_search
{
  "query": {
    "intervals": {
      "title": {
        "match": {
          "query": "pairs hash",
          "max_gaps": 5,
          "filter": {
            "not_containing": {
              "match": {
                "query": "efficiently"
              }
            }
          }
        }
      }
    }
  }
}
The response contains only document 2:
Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.25,
    "hits": [
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.25,
        "_source": {
          "title": "store key-value pairs in a hash map"
        }
      }
    ]
  }
}
Example: Script filters
Alternatively, you can write your own script filter to include with the intervals query using the following variables:
interval.start: The position (term number) where the interval starts.
interval.end: The position (term number) where the interval ends.
interval.gaps: The number of words between the terms.
For example, the following query searches for the words map and hash that are next to each other within the specified interval. Terms are numbered starting with 0, so in the text store key-value pairs in a hash map, store is at position 0, key is at position 1, and so on. The interval must start after a (position 5) and end before the end of the string, that is, at a position less than 8:
POST /testindex/_search
{
  "query": {
    "intervals": {
      "title": {
        "match": {
          "query": "map hash",
          "filter": {
            "script": {
              "source": "interval.start > 5 && interval.end < 8 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
The response contains document 2:
Response
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5,
    "hits": [
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.5,
        "_source": {
          "title": "store key-value pairs in a hash map"
        }
      }
    ]
  }
}
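The position arithmetic behind this script filter can be reproduced with a short Python sketch. The tokenizer (lowercase, split on non-alphanumeric characters) is an assumption that only approximates default text analysis, and script_filter is a hypothetical Python rendering of the script source in the query, not something the engine runs.

```python
import re

def tokenize(text):
    # Assumption: lowercase and split on non-alphanumeric characters,
    # which only roughly approximates default text analysis.
    return re.findall(r"[a-z0-9]+", text.lower())

def script_filter(start, end, gaps):
    # Python rendering of the script source used in the query.
    return start > 5 and end < 8 and gaps == 0

tokens = tokenize("store key-value pairs in a hash map")
positions = {term: i for i, term in enumerate(tokens)}

# Interval covering the two query terms, in whichever order they occur.
start = min(positions["hash"], positions["map"])
end = max(positions["hash"], positions["map"])
gaps = (end - start + 1) - 2  # positions inside the interval not taken by a query term

print(list(enumerate(tokens)))
print(start, end, gaps, script_filter(start, end, gaps))  # 6 7 0 True
```

Document 1 contains no map term, so the match rule produces no interval for it and the script is never evaluated there.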
Interval minimization
To ensure that queries run in linear time, the intervals query minimizes the intervals. For example, consider a document containing the text a b c d c. You can use the following query to search for d that is contained by a and c:
POST /testindex/_search
{
  "query": {
    "intervals": {
      "my_text": {
        "match": {
          "query": "d",
          "filter": {
            "contained_by": {
              "match": {
                "query": "a c"
              }
            }
          }
        }
      }
    }
  }
}
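The query returns no results because the a c rule is minimized to the interval covering only the first three terms (a b c), and there is no d inside that interval. The wider interval a b c d c also contains a and c, but it is discarded because it is not minimal.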
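The effect of minimization on this example can be reproduced with a brute-force Python sketch. It enumerates every span that contains all query terms and keeps only those with no smaller matching span inside them. This is for illustration only: the enumeration is far from linear time and is not how the engine computes intervals, and the function name minimal_intervals is hypothetical.

```python
def minimal_intervals(tokens, terms):
    # Every span (start, end) whose tokens include all of the query terms.
    n = len(tokens)
    spans = [(s, e) for s in range(n) for e in range(s, n)
             if all(t in tokens[s:e + 1] for t in terms)]
    # A span is minimal if no other matching span lies inside it.
    return [a for a in spans
            if not any(b != a and a[0] <= b[0] and b[1] <= a[1] for b in spans)]

tokens = "a b c d c".split()
outer = minimal_intervals(tokens, ["a", "c"])  # the contained_by rule
inner = minimal_intervals(tokens, ["d"])       # the main match rule

# Keep the inner intervals that lie inside some outer interval.
kept = [i for i in inner if any(o[0] <= i[0] and i[1] <= o[1] for o in outer)]

print(outer)  # [(0, 2)]
print(inner)  # [(3, 3)]
print(kept)   # []
```

The span (0, 4) also contains a and c, but it is dropped because (0, 2) lies inside it, so d at position 3 has no enclosing interval.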