Simple query string query
Use the simple_query_string type to specify multiple arguments delineated by regular expressions directly in the query string. Simple query string has a less strict syntax than query string because it discards any invalid portions of the string and does not return errors for invalid syntax.
This query uses a simple syntax to parse the query string based on special operators and split the string into terms. After parsing, the query analyzes each term independently and then returns matching documents.
The following query performs a fuzzy search on the title field:
GET _search
{
"query": {
"simple_query_string": {
"query": "\"rises wind the\"~4 | *ising~2",
"fields": ["title"]
}
}
}
Simple query string syntax
A query string consists of terms and operators. A term is a single word (for example, in the query wind rises, the terms are wind and rises). If several terms are surrounded by quotation marks, they are treated as one phrase in which words are matched in the order they appear (for example, "wind rises"). Operators such as +, |, and - specify the Boolean logic used to interpret text in the query string.
Operators
Simple query string syntax supports the following operators.
Operator | Description |
---|---|
+ | Acts as the AND operator. |
\| | Acts as the OR operator. |
* | When used at the end of a term, signifies a prefix query. |
" | Wraps several terms into a phrase (for example, "wind rises"). |
( and ) | Wrap a clause for precedence (for example, wind + (rises \| rising)). |
~n | When used after a term (for example, wnid~3), sets fuzziness. When used after a phrase, sets slop. |
- | Negates the term. |
All of the preceding operators are reserved characters. To refer to them as raw characters and not operators, escape any of them with a backslash. When sending a JSON request, use \\ to escape reserved characters (because the backslash character is itself reserved, you must escape the backslash with another backslash).
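The escaping rule above can be sketched in Python. This is a minimal illustration, not part of OpenSearch: escape_simple_query_string is a hypothetical helper, and it conservatively escapes every reserved character wherever it appears. It also shows that a JSON serializer doubles the backslash automatically, so the query string itself needs only one.

```python
import json

# Reserved characters taken from the operator table above, plus the backslash itself.
RESERVED = set('+|*"()~-\\')

def escape_simple_query_string(text: str) -> str:
    """Prefix every reserved character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)

escaped = escape_simple_query_string("wind+rises")
body = {"query": {"simple_query_string": {"query": escaped, "fields": ["title"]}}}

print(escaped)           # wind\+rises (one backslash in the query string)
print(json.dumps(body))  # the JSON text contains wind\\+rises (backslash doubled)
```

If you write the JSON request by hand rather than serializing it, you type the doubled backslash yourself.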
Default operator
The default operator is OR (unless you set the default_operator to AND). The default operator dictates the overall query behavior. For example, consider an index containing the following documents:
PUT /customers/_doc/1
{
"first_name":"Amber",
"last_name":"Duke",
"address":"880 Holmes Lane"
}
PUT /customers/_doc/2
{
"first_name":"Hattie",
"last_name":"Bond",
"address":"671 Bristol Street"
}
PUT /customers/_doc/3
{
"first_name":"Nanette",
"last_name":"Bates",
"address":"789 Madison St"
}
PUT /customers/_doc/4
{
"first_name":"Dale",
"last_name":"Amber",
"address":"467 Hutchinson Court"
}
The following query attempts to find documents in which the address contains the word street or st and does not contain the word madison:
GET /customers/_search
{
"query": {
"simple_query_string": {
"fields": [ "address" ],
"query": "street st -madison"
}
}
}
However, the results include not only the expected document, but all four documents:
Response
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 2.2039728,
"hits": [
{
"_index": "customers",
"_id": "2",
"_score": 2.2039728,
"_source": {
"first_name": "Hattie",
"last_name": "Bond",
"address": "671 Bristol Street"
}
},
{
"_index": "customers",
"_id": "3",
"_score": 1.2039728,
"_source": {
"first_name": "Nanette",
"last_name": "Bates",
"address": "789 Madison St"
}
},
{
"_index": "customers",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "Amber",
"last_name": "Duke",
"address": "880 Holmes Lane"
}
},
{
"_index": "customers",
"_id": "4",
"_score": 1,
"_source": {
"first_name": "Dale",
"last_name": "Amber",
"address": "467 Hutchinson Court"
}
}
]
}
}
Because the default operator is OR, this query includes documents that contain the words street or st (documents 2 and 3) and documents that do not contain the word madison (documents 1 and 4).
To express the query intent correctly, precede -madison with +:
GET /customers/_search
{
"query": {
"simple_query_string": {
"fields": [ "address" ],
"query": "street st +-madison"
}
}
}
Alternatively, specify AND as the default operator and use disjunction for the words street and st:
GET /customers/_search
{
"query": {
"simple_query_string": {
"fields": [ "address" ],
"query": "st|street -madison",
"default_operator": "AND"
}
}
}
The preceding query returns document 2:
Response
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 2.2039728,
"hits": [
{
"_index": "customers",
"_id": "2",
"_score": 2.2039728,
"_source": {
"first_name": "Hattie",
"last_name": "Bond",
"address": "671 Bristol Street"
}
}
]
}
}
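The difference between the two default operators can be reproduced with a toy model in Python. This is only a sketch of the Boolean logic described in this section, not of the actual parser or scoring: it assumes that lowercasing and splitting on white space is a close enough stand-in for the analyzer, and the function names are hypothetical.

```python
# Addresses from the four sample documents above, keyed by document ID.
ADDRESSES = {
    "1": "880 Holmes Lane",
    "2": "671 Bristol Street",
    "3": "789 Madison St",
    "4": "467 Hutchinson Court",
}

def tokens(text):
    """Lowercase and split on white space, a rough stand-in for the analyzer."""
    return set(text.lower().split())

def matches_default_or(addr):
    # "street st -madison" with OR: street OR st OR (NOT madison)
    t = tokens(addr)
    return "street" in t or "st" in t or "madison" not in t

def matches_default_and(addr):
    # "st|street -madison" with AND: (st OR street) AND (NOT madison)
    t = tokens(addr)
    return ("st" in t or "street" in t) and "madison" not in t

or_hits = sorted(i for i, a in ADDRESSES.items() if matches_default_or(a))
and_hits = sorted(i for i, a in ADDRESSES.items() if matches_default_and(a))
print(or_hits)   # ['1', '2', '3', '4']
print(and_hits)  # ['2']
```

With OR, the negated clause alone is enough for a document to match, which is why documents 1 and 4 appear in the first response.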
Limit operators
To limit the supported operators for the simple query string parser, include the operators that you want to support, separated by |, in the flags parameter. For example, the following query enables only the OR, AND, and FUZZY operators:
GET /customers/_search
{
"query": {
"simple_query_string": {
"fields": [ "address" ],
"query": "bristol | madison +stre~2",
"flags": "OR|AND|FUZZY"
}
}
}
The following table lists all available operator flags.
Flag | Description |
---|---|
ALL (default) | Enables all operators. |
AND | Enables the + (AND) operator. |
ESCAPE | Enables \ as the escape character. |
FUZZY | Enables the ~n operator after a word, where n is an integer denoting the allowed edit distance for matching. |
NEAR | Enables the ~n operator after a phrase, where n is the maximum number of positions allowed between matching tokens. Same as SLOP. |
NONE | Disables all operators. |
NOT | Enables the - (NOT) operator. |
OR | Enables the \| (OR) operator. |
PHRASE | Enables the " (quotation mark) operator for phrase search. |
PRECEDENCE | Enables the ( and ) (parentheses) operators for operator precedence. |
PREFIX | Enables the * (prefix) operator. |
SLOP | Enables the ~n operator after a phrase, where n is the maximum number of positions allowed between matching tokens. Same as NEAR. |
WHITESPACE | Enables white space characters as characters on which the text is split. |
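When request bodies are built in client code, the flags value can be assembled from a list of names instead of being typed by hand. The following Python sketch assumes nothing beyond the flag names in the preceding table. build_flags is a hypothetical helper, not an OpenSearch API, and it rejects any name that is not in the table.

```python
import json

# Flag names copied from the table above.
VALID_FLAGS = {
    "ALL", "AND", "ESCAPE", "FUZZY", "NEAR", "NONE", "NOT", "OR",
    "PHRASE", "PRECEDENCE", "PREFIX", "SLOP", "WHITESPACE",
}

def build_flags(*names):
    """Join flag names with '|' after checking them against the table (hypothetical helper)."""
    unknown = [n for n in names if n not in VALID_FLAGS]
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    return "|".join(names)

body = {
    "query": {
        "simple_query_string": {
            "fields": ["address"],
            "query": "bristol | madison +stre~2",
            "flags": build_flags("OR", "AND", "FUZZY"),
        }
    }
}
print(json.dumps(body, indent=2))
```

The printed body is the same request as the preceding example and can be sent to GET /customers/_search.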
Wildcard expressions
You can specify wildcard expressions using the * special character, which replaces zero or more characters. For example, the following query searches in all fields that end with name:
GET /customers/_search
{
"query": {
"simple_query_string" : {
"query": "Amber Bond",
"fields": [ "*name" ]
}
}
}
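To see which fields a pattern such as *name selects, you can emulate the expansion locally. The sketch below uses Python's fnmatch module as a stand-in for OpenSearch's field name matching, which is an assumption made for illustration only. The field list comes from the sample customers documents, and expand_field_pattern is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Field names from the sample customers documents above.
FIELD_NAMES = ["first_name", "last_name", "address"]

def expand_field_pattern(pattern, field_names):
    """Return the field names that a *-style pattern would select."""
    return [name for name in field_names if fnmatchcase(name, pattern)]

print(expand_field_pattern("*name", FIELD_NAMES))  # ['first_name', 'last_name']
```

The address field is not searched, so the query Amber Bond can match only on first_name or last_name.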
Boosting
Use the caret (^) boost operator to boost the relevance score of a field by a multiplier. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is 1.
For example, the following query searches the first_name and last_name fields and boosts matches from the first_name field by a factor of 2:
GET /customers/_search
{
"query": {
"simple_query_string" : {
"query": "Amber",
"fields": [ "first_name^2", "last_name" ]
}
}
}
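If the fields list is generated in code, the boost suffix can be appended programmatically. This is a minimal Python sketch. boosted is a hypothetical helper, and rejecting negative values is an assumption based on the [0, 1) range mentioned above.

```python
import json

def boosted(field, boost=1.0):
    """Format a field with a ^boost suffix; omit the suffix for the default of 1 (hypothetical helper)."""
    if boost < 0:
        raise ValueError("boost must not be negative")
    return field if boost == 1 else f"{field}^{boost:g}"

fields = [boosted("first_name", 2), boosted("last_name")]
body = {"query": {"simple_query_string": {"query": "Amber", "fields": fields}}}
print(fields)            # ['first_name^2', 'last_name']
print(json.dumps(body))
```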
Multi-position tokens
For multi-position tokens, simple query string creates a match phrase query. Thus, if you specify ml, machine learning as synonyms and search for ml, OpenSearch searches for ml OR "machine learning".
Alternatively, you can match multi-position tokens using conjunctions. If you set auto_generate_synonyms_phrase_query to false, OpenSearch searches for ml OR (machine AND learning).
For example, the following query searches for the text ml models and specifies not to auto-generate a match phrase query for each synonym:
GET /testindex/_search
{
"query": {
"simple_query_string": {
"fields": ["title"],
"query": "ml models",
"auto_generate_synonyms_phrase_query": false
}
}
}
For this query, OpenSearch creates the following Boolean query: (ml OR (machine AND learning)) models.
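The practical difference between the phrase form and the conjunction form is word order and adjacency. The following Python toy model illustrates it for the synonym part of the query only. It assumes simple white space tokenization, the sample title is made up for the illustration, and the function names are hypothetical.

```python
def has_phrase(tokens, phrase):
    """True if the phrase tokens occur consecutively and in order."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def matches_ml(text, phrase_query=True):
    """Toy model of searching for ml with the synonym rule ml, machine learning."""
    tokens = text.lower().split()
    if "ml" in tokens:
        return True
    if phrase_query:
        # ml OR "machine learning"
        return has_phrase(tokens, ["machine", "learning"])
    # ml OR (machine AND learning)
    return "machine" in tokens and "learning" in tokens

title = "learning to build a machine"
print(matches_ml(title, phrase_query=True))   # False
print(matches_ml(title, phrase_query=False))  # True
```

The conjunction form matches more documents because the two words only need to be present, not adjacent.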
Parameters
The following table lists the top-level parameters that the simple_query_string query supports. All parameters except query are optional.
Parameter | Data type | Description |
---|---|---|
query | String | The text to search for, which may contain expressions in the simple query string syntax. Required. |
analyze_wildcard | Boolean | Specifies whether OpenSearch should attempt to analyze wildcard terms. Default is false. |
analyzer | String | The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the default_field. If no analyzer is specified for the default_field, the analyzer is the default analyzer for the index. |
auto_generate_synonyms_phrase_query | Boolean | Specifies whether to create match_phrase queries automatically for multi-term synonyms. Default is true. |
default_operator | String | If the query string contains multiple search terms, whether all terms need to match (AND) or only one term needs to match (OR) for a document to be considered a match. Valid values are OR (the string to be is interpreted as to OR be) and AND (the string to be is interpreted as to AND be). Default is OR. |
fields | String array | The list of fields to search (for example, "fields": ["title^4", "description"]). Supports wildcards. If unspecified, defaults to the index.query.default_field setting, which defaults to ["*"]. The maximum number of fields that can be searched at the same time is defined by indices.query.bool.max_clause_count, which is 1,024 by default. |
flags | String | A \|-delimited string of flags to enable (for example, AND\|OR\|NOT). Default is ALL. You can explicitly set the value for default_field. For example, to return all titles, set it to "default_field": "title". |
fuzzy_max_expansions | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries "expand to" a number of matching terms that are within the distance specified in fuzziness. Then OpenSearch tries to match those terms. Default is 50. |
fuzzy_transpositions | Boolean | Setting fuzzy_transpositions to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the fuzziness option. For example, the distance between wind and wnid is 1 if fuzzy_transpositions is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If fuzzy_transpositions is false, rewind and wnid have the same distance (2) from wind, despite the more human-centric opinion that wnid is an obvious typo. The default is a good choice for most use cases. |
fuzzy_prefix_length | Integer | The number of beginning characters left unchanged for fuzzy matching. Default is 0. |
lenient | Boolean | Setting lenient to true ignores data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type float. Default is false. |
minimum_should_match | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the OR operator, the number of terms that need to match for the document to be considered a match. For example, if minimum_should_match is 2, wind often rising does not match The Wind Rises. If minimum_should_match is 1, it matches. For details, see Minimum should match. |
quote_field_suffix | String | This option supports searching for exact matches (surrounded with quotation marks) using a different analysis method than non-exact matches use. For example, if quote_field_suffix is .exact and you search for \"lightly\" in the title field, OpenSearch searches for the word lightly in the title.exact field. This second field might use a different type (for example, keyword rather than text) or a different analyzer. |
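The distances quoted in the fuzzy_transpositions row can be checked with a short dynamic programming routine. The following Python sketch implements the textbook edit distance, with an optional rule that counts a swap of two adjacent characters as a single edit. It is an illustration of the arithmetic only and is not the algorithm OpenSearch uses internally.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance by dynamic programming; optionally counts an adjacent swap as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # substitute (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap adjacent characters
    return d[-1][-1]

print(edit_distance("wind", "wnid", transpositions=True))     # 1
print(edit_distance("wind", "wnid", transpositions=False))    # 2
print(edit_distance("wind", "rewind", transpositions=False))  # 2
```

With transpositions disabled, wnid and rewind are equally far from wind, which matches the example in the table.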