Regexp query

Use the regexp query to search for terms that match a regular expression.

The following query searches for any term that starts with any uppercase or lowercase letter followed by amlet:

  1. GET shakespeare/_search
  2. {
  3. "query": {
  4. "regexp": {
  5. "play_name": "[a-zA-Z]amlet"
  6. }
  7. }
  8. }

copy

Note the following important considerations:

  • Regular expressions are applied to the terms (that is, tokens) in the field—not to the entire field.
  • By default, the maximum length of a regular expression is 1,000 characters. To change the maximum length, update the index.max_regex_length setting.
  • Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see the Lucene documentation.
  • To improve regexp query performance, avoid wildcard patterns without a prefix or suffix, such as .* or .*?+.
  • regexp queries can be expensive operations and require the search.allow_expensive_queries setting to be set to true. Before making frequent regexp queries, test their impact on cluster performance and examine alternative queries that may achieve similar results.

Parameters

The query accepts the name of the field (<field>) as a top-level parameter:

  1. GET _search
  2. {
  3. "query": {
  4. "regexp": {
  5. "<field>": {
  6. "value": "[Ss]ample",
  7. ...
  8. }
  9. }
  10. }
  11. }

copy

The <field> accepts the following parameters. All parameters except value are optional.

ParameterData typeDescription
valueStringThe regular expression used for matching terms in the field specified in <field>.
boostFloating-pointA floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
case_insensitiveBooleanIf true, allows case-insensitive matching of the regular expression value with the indexed field values. Default is false (case sensitivity is determined by the field’s mapping).
flagsStringEnables optional operators for Lucene’s regular expression engine.
max_determinized_statesIntegerLucene converts a regular expression to an automaton with a number of determinized states. This parameter specifies the maximum number of automaton states the query requires. Use this parameter to prevent high resource consumption. To run complex regular expressions, you may need to increase the value of this parameter. Default is 10,000.
rewriteStringDetermines how OpenSearch rewrites and scores multi-term queries. Valid values are constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, and top_terms_blended_freqs_N. Default is constant_score.

If search.allow_expensive_queries is set to false, regexp queries are not run.