Simple query string query

Use the simple_query_string query type to specify multiple search terms, combined with special operators, directly in the query string. Simple query string has a less strict syntax than the query_string query: it discards any invalid portions of the string and does not return errors for invalid syntax.

This query uses a simple syntax to parse the query string based on special operators and split the string into terms. After parsing, the query analyzes each term independently and then returns matching documents.

The following query combines a phrase search with slop and a fuzzy term search on the title field:

  GET _search
  {
    "query": {
      "simple_query_string": {
        "query": "\"rises wind the\"~4 | *ising~2",
        "fields": ["title"]
      }
    }
  }

Simple query string syntax

A query string consists of terms and operators. A term is a single word (for example, in the query wind rises, the terms are wind and rises). If several terms are surrounded by quotation marks, they are treated as one phrase in which the words are matched in the order they appear (for example, "wind rises"). Operators such as +, |, and - specify the Boolean logic used to interpret text in the query string.
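
As a rough illustration of the difference between terms and phrases, the following Python sketch uses the standard library's shlex module. It splits on whitespace but keeps a double-quoted run together. This is an approximation for intuition only and is not the OpenSearch parser. The helper name split_terms_and_phrases is hypothetical.

```python
import shlex


def split_terms_and_phrases(query: str) -> list[str]:
    # Approximation only (hypothetical helper, not OpenSearch code):
    # shlex keeps a double-quoted run together as one token, much like a
    # phrase, and splits everything else on whitespace into single terms.
    # Operators such as +, |, -, and ~ are NOT interpreted here.
    return shlex.split(query)


print(split_terms_and_phrases("wind rises"))        # ['wind', 'rises']
print(split_terms_and_phrases('"wind rises" sun'))  # ['wind rises', 'sun']
```

One behavioral difference matters here. shlex.split raises ValueError on an unbalanced quotation mark, whereas simple_query_string discards invalid portions of the string instead of returning an error.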

Operators

Simple query string syntax supports the following operators.

Operator  Description
+         Acts as the AND operator.
|         Acts as the OR operator.
*         When used at the end of a term, signifies a prefix query.
"         Wraps several terms into a phrase (for example, "wind rises").
( )       Wrap a clause for precedence (for example, wind + (rises | rising)).
~n        When used after a term (for example, wnid~3), sets fuzziness. When used after a phrase, sets slop.
-         Negates the term.

All of the preceding operators are reserved characters. To use any of them as a literal character rather than an operator, escape it with a backslash. When sending a JSON request, write \\ for each escaping backslash. The backslash is also the escape character in JSON strings, so it must itself be escaped with another backslash.
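
The following Python sketch shows the two escaping layers described above. First, a backslash is placed in front of each reserved character of the query text. Then JSON encoding doubles every backslash. The sketch assumes that the reserved set is the operator characters from the preceding table plus the backslash itself. It also assumes that the ESCAPE flag is enabled, which it is under the default ALL. The helper escape_simple_query is hypothetical.

```python
import json

# Assumption: reserved characters are the operators listed in the table above,
# plus the backslash, which is itself the escape character.
RESERVED = set('+|*"()~-\\')


def escape_simple_query(text: str) -> str:
    """Hypothetical helper: prefix each reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


escaped = escape_simple_query("c++ (draft)")
print(escaped)  # c\+\+ \(draft\)

body = {"query": {"simple_query_string": {"query": escaped, "fields": ["title"]}}}
# json.dumps doubles each backslash, producing the \\ sequences needed in the request.
print(json.dumps(body))
```

Whether an escaped character can actually match also depends on the field's analyzer. The standard analyzer, for example, discards most punctuation, including + and parentheses.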

Default operator

The default operator is OR (unless you set the default_operator to AND). The default operator dictates the overall query behavior. For example, consider an index containing the following documents:

  PUT /customers/_doc/1
  {
    "first_name": "Amber",
    "last_name": "Duke",
    "address": "880 Holmes Lane"
  }

  PUT /customers/_doc/2
  {
    "first_name": "Hattie",
    "last_name": "Bond",
    "address": "671 Bristol Street"
  }

  PUT /customers/_doc/3
  {
    "first_name": "Nanette",
    "last_name": "Bates",
    "address": "789 Madison St"
  }

  PUT /customers/_doc/4
  {
    "first_name": "Dale",
    "last_name": "Amber",
    "address": "467 Hutchinson Court"
  }

The following query attempts to find documents in which the address contains the words street or st and does not contain the word madison:

  GET /customers/_search
  {
    "query": {
      "simple_query_string": {
        "fields": [ "address" ],
        "query": "street st -madison"
      }
    }
  }

However, the results include not only the expected document, but all four documents:

Response

  {
    "took": 3,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 4,
        "relation": "eq"
      },
      "max_score": 2.2039728,
      "hits": [
        {
          "_index": "customers",
          "_id": "2",
          "_score": 2.2039728,
          "_source": {
            "first_name": "Hattie",
            "last_name": "Bond",
            "address": "671 Bristol Street"
          }
        },
        {
          "_index": "customers",
          "_id": "3",
          "_score": 1.2039728,
          "_source": {
            "first_name": "Nanette",
            "last_name": "Bates",
            "address": "789 Madison St"
          }
        },
        {
          "_index": "customers",
          "_id": "1",
          "_score": 1,
          "_source": {
            "first_name": "Amber",
            "last_name": "Duke",
            "address": "880 Holmes Lane"
          }
        },
        {
          "_index": "customers",
          "_id": "4",
          "_score": 1,
          "_source": {
            "first_name": "Dale",
            "last_name": "Amber",
            "address": "467 Hutchinson Court"
          }
        }
      ]
    }
  }

Because the default operator is OR, this query includes documents that contain the words street or st (documents 2 and 3) and documents that do not contain the word madison (documents 1 and 4).

To express the query intent correctly, precede -madison with +:

  GET /customers/_search
  {
    "query": {
      "simple_query_string": {
        "fields": [ "address" ],
        "query": "street st +-madison"
      }
    }
  }

Alternatively, specify AND as the default operator and use disjunction for the words street and st:

  GET /customers/_search
  {
    "query": {
      "simple_query_string": {
        "fields": [ "address" ],
        "query": "st|street -madison",
        "default_operator": "AND"
      }
    }
  }

The preceding query returns document 2:

Response

  {
    "took": 2,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 2.2039728,
      "hits": [
        {
          "_index": "customers",
          "_id": "2",
          "_score": 2.2039728,
          "_source": {
            "first_name": "Hattie",
            "last_name": "Bond",
            "address": "671 Bristol Street"
          }
        }
      ]
    }
  }
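
The following Python sketch makes the Boolean logic of this section concrete by evaluating both interpretations against the four sample addresses. It is a hypothetical model, not OpenSearch code. It assumes the analyzer simply lowercases and splits on whitespace, which is close enough to the standard analyzer for these addresses, and it ignores scoring.

```python
# Hypothetical model of clause matching for the four sample documents.
ADDRESSES = {
    1: "880 Holmes Lane",
    2: "671 Bristol Street",
    3: "789 Madison St",
    4: "467 Hutchinson Court",
}


def tokens(text: str) -> set[str]:
    # Assumed analysis: lowercase, then split on whitespace.
    return set(text.lower().split())


def or_semantics(address: str) -> bool:
    # "street st -madison" with default_operator OR:
    # street OR st OR (NOT madison)
    t = tokens(address)
    return "street" in t or "st" in t or "madison" not in t


def and_semantics(address: str) -> bool:
    # "st|street -madison" with default_operator AND, which is also the intent
    # of "street st +-madison": (st OR street) AND (NOT madison)
    t = tokens(address)
    return ("st" in t or "street" in t) and "madison" not in t


print(sorted(i for i, a in ADDRESSES.items() if or_semantics(a)))   # [1, 2, 3, 4]
print(sorted(i for i, a in ADDRESSES.items() if and_semantics(a)))  # [2]
```

The first list reproduces the four hits of the OR query. The second reproduces the single hit, document 2, returned by the AND query.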

Limit operators

To limit the supported operators for the simple query string parser, include the operators that you want to support, separated by |, in the flags parameter. For example, the following query enables only OR, AND, and FUZZY operators:

  GET /customers/_search
  {
    "query": {
      "simple_query_string": {
        "fields": [ "address" ],
        "query": "bristol | madison +stre~2",
        "flags": "OR|AND|FUZZY"
      }
    }
  }

The following table lists all available operator flags.

Flag            Description
ALL (default)   Enables all operators.
AND             Enables the + (AND) operator.
ESCAPE          Enables \ as an escape character.
FUZZY           Enables the ~n operator after a word, where n is an integer denoting the allowed edit distance for matching.
NEAR            Enables the ~n operator after a phrase, where n is the maximum number of positions allowed between matching tokens. Same as SLOP.
NONE            Disables all operators.
NOT             Enables the - (NOT) operator.
OR              Enables the | (OR) operator.
PHRASE          Enables the " (quotation marks) operator for phrase search.
PRECEDENCE      Enables the ( and ) (parentheses) operators for operator precedence.
PREFIX          Enables the * (prefix) operator.
SLOP            Enables the ~n operator after a phrase, where n is the maximum number of positions allowed between matching tokens. Same as NEAR.
WHITESPACE      Enables whitespace characters as characters on which the text is split.
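
When flags are assembled programmatically, a small client-side check can catch misspelled names before the request is sent. The following Python sketch is hypothetical: build_flags is not part of any OpenSearch client. Its list of valid names is copied from the preceding table, and it uppercases names to match the spelling used there.

```python
# Flag names copied from the table above. build_flags is a hypothetical helper.
VALID_FLAGS = {
    "ALL", "AND", "ESCAPE", "FUZZY", "NEAR", "NONE", "NOT",
    "OR", "PHRASE", "PRECEDENCE", "PREFIX", "SLOP", "WHITESPACE",
}


def build_flags(*flags: str) -> str:
    """Join flag names with | after checking them against the documented list."""
    normalized = [f.strip().upper() for f in flags]
    unknown = [f for f in normalized if f not in VALID_FLAGS]
    if unknown:
        raise ValueError(f"unknown simple_query_string flags: {unknown}")
    return "|".join(normalized)


query_body = {
    "query": {
        "simple_query_string": {
            "fields": ["address"],
            "query": "bristol | madison +stre~2",
            "flags": build_flags("or", "and", "fuzzy"),
        }
    }
}
print(query_body["query"]["simple_query_string"]["flags"])  # OR|AND|FUZZY
```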

Wildcard expressions

You can specify wildcard expressions in field names using the * special character, which matches zero or more characters. For example, the following query searches all fields whose names end with name:

  GET /customers/_search
  {
    "query": {
      "simple_query_string" : {
        "query": "Amber Bond",
        "fields": [ "*name" ]
      }
    }
  }
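
You can preview which fields a pattern such as *name selects using Python's fnmatch.fnmatchcase, which also treats * as zero or more characters. This is an approximation: OpenSearch resolves the pattern against the index mapping, not a Python list, and fnmatch additionally interprets ? and [...] specially. The helper expand_field_pattern is hypothetical.

```python
from fnmatch import fnmatchcase

# Field names of the customers documents indexed earlier on this page.
CUSTOMER_FIELDS = ["first_name", "last_name", "address"]


def expand_field_pattern(pattern: str, fields: list[str]) -> list[str]:
    # Hypothetical helper: keep the field names that the wildcard pattern matches.
    return [f for f in fields if fnmatchcase(f, pattern)]


print(expand_field_pattern("*name", CUSTOMER_FIELDS))  # ['first_name', 'last_name']
```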

Boosting

Use the caret (^) boost operator to boost the relevance score of a field by a multiplier. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is 1.

For example, the following query searches the first_name and last_name fields and boosts matches from the first_name field by a factor of 2:

  GET /customers/_search
  {
    "query": {
      "simple_query_string" : {
        "query": "Amber",
        "fields": [ "first_name^2", "last_name" ]
      }
    }
  }
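
If field boosts are kept in a mapping of field name to multiplier, the following hypothetical Python helper renders them in the field^boost syntax used in the fields array. Rejecting negative values is this sketch's own choice, based on the [0, 1) and greater-than-1 ranges described above. Omitting ^1 relies on 1 being the default boost.

```python
def boosted_fields(boosts: dict[str, float]) -> list[str]:
    """Hypothetical helper: render {"field": boost} pairs as field^boost strings."""
    rendered = []
    for field, boost in boosts.items():
        if boost < 0:
            raise ValueError(f"boost for {field} must be non-negative")
        # A boost of 1 is the default, so the ^1 suffix is omitted.
        rendered.append(field if boost == 1 else f"{field}^{boost:g}")
    return rendered


print(boosted_fields({"first_name": 2, "last_name": 1}))  # ['first_name^2', 'last_name']
```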

Multi-position tokens

For multi-position tokens, simple query string creates a match phrase query. Thus, if you specify ml, machine learning as synonyms and search for ml, OpenSearch searches for ml OR "machine learning".

Alternatively, you can match multi-position tokens using conjunctions. If you set auto_generate_synonyms_phrase_query to false, OpenSearch searches for ml OR (machine AND learning).

For example, the following query searches for the text ml models and specifies not to auto-generate a match phrase query for each synonym:

  GET /testindex/_search
  {
    "query": {
      "simple_query_string": {
        "fields": ["title"],
        "query": "ml models",
        "auto_generate_synonyms_phrase_query": false
      }
    }
  }

For this query, OpenSearch creates the following Boolean query: (ml OR (machine AND learning)) models.
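
The practical difference between the two expansions of the synonym ml, machine learning can be seen in a small Python model. This is a hypothetical sketch, not OpenSearch code. It assumes lowercase whitespace tokenization and models only the synonym clause for ml, not the rest of the query or scoring.

```python
def tokens(text: str) -> list[str]:
    # Assumed analysis: lowercase, then split on whitespace.
    return text.lower().split()


def contains_phrase(doc: list[str], phrase: list[str]) -> bool:
    # True if the phrase tokens appear consecutively and in order.
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


def synonym_clause_matches(title: str, phrase_query: bool) -> bool:
    t = tokens(title)
    if phrase_query:
        # auto_generate_synonyms_phrase_query: true -> ml OR "machine learning"
        return "ml" in t or contains_phrase(t, ["machine", "learning"])
    # auto_generate_synonyms_phrase_query: false -> ml OR (machine AND learning)
    return "ml" in t or ("machine" in t and "learning" in t)


title = "Learning to repair a sewing machine"
print(synonym_clause_matches(title, phrase_query=True))   # False
print(synonym_clause_matches(title, phrase_query=False))  # True
```

The conjunction form matches any title containing both words in any position. The phrase form requires them to be adjacent and in order.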

Parameters

The following list describes the top-level parameters that the simple_query_string query supports. All parameters except query are optional.

query (String): The text, which may contain expressions in the simple query string syntax, to use for search. Required.
analyze_wildcard (Boolean): Specifies whether OpenSearch should attempt to analyze wildcard terms. Default is false.
analyzer (String): The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the default_field. If no analyzer is specified for the default_field, the analyzer is the default analyzer for the index.
auto_generate_synonyms_phrase_query (Boolean): Specifies whether to create match_phrase queries automatically for multi-term synonyms. Default is true.
default_operator (String): If the query string contains multiple search terms, specifies whether all terms need to match (AND) or only one term needs to match (OR) for a document to be considered a match. Valid values are:
- OR: The string to be is interpreted as to OR be.
- AND: The string to be is interpreted as to AND be.
Default is OR.
fields (String array): The list of fields to search (for example, "fields": ["title^4", "description"]). Supports wildcards. If unspecified, defaults to the index.query.default_field setting, which defaults to ["*"]. The maximum number of fields that can be searched at the same time is defined by indices.query.bool.max_clause_count, which is 1,024 by default.
flags (String): A |-delimited string of flags to enable (for example, AND|OR|NOT). Default is ALL. You can explicitly set the value for default_field. For example, to return all titles, set it to "default_field": "title".
fuzzy_max_expansions (Positive integer): The maximum number of terms to which the query can expand. Fuzzy queries "expand to" a number of matching terms that are within the distance specified in fuzziness. OpenSearch then tries to match those terms. Default is 50.
fuzzy_transpositions (Boolean): Setting fuzzy_transpositions to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the fuzziness option. For example, the distance between wind and wnid is 1 if fuzzy_transpositions is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If fuzzy_transpositions is false, rewind and wnid have the same distance (2) from wind, even though wnid is clearly the more likely typo. The default is a good choice for most use cases.
fuzzy_prefix_length (Integer): The number of beginning characters left unchanged for fuzzy matching. Default is 0.
lenient (Boolean): Setting lenient to true ignores data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type float. Default is false.
minimum_should_match (Positive or negative integer, positive or negative percentage, or a combination): If the query string contains multiple search terms and you use the OR operator, the number of terms that need to match for the document to be considered a match. For example, if minimum_should_match is 2, wind often rising does not match The Wind Rises. If minimum_should_match is 1, it matches. For details, see Minimum should match.
quote_field_suffix (String): Supports searching for exact matches (surrounded with quotation marks) using a different analysis method than the one used for non-exact matches. For example, if quote_field_suffix is .exact and you search for "lightly" (in quotation marks) in the title field, OpenSearch searches for the word lightly in the title.exact field. This second field might use a different type (for example, keyword rather than text) or a different analyzer.
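
The distances quoted for fuzzy_transpositions can be checked with a textbook dynamic-programming edit distance. The following Python sketch implements Levenshtein distance and, optionally, the optimal string alignment variant of Damerau-Levenshtein, in which an adjacent swap costs 1. It illustrates the arithmetic only and is not OpenSearch's automaton-based fuzzy matching. The function name edit_distance is hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance. With transpositions=True, an adjacent swap also
    costs 1 (optimal string alignment variant of Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("wnid", "wind"))                          # 1
print(edit_distance("wnid", "wind", transpositions=False))    # 2
print(edit_distance("rewind", "wind", transpositions=False))  # 2
```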