6.10. Regular Expression Functions

6.10. Regular Expression Functions

All of the regular expression functions use the Java pattern syntax,with a few notable exceptions:

When using multi-line mode (enabled via the (?m) flag),only \n is recognized as a line terminator. Additionally,the (?d) flag is not supported and must not be used.
Case-insensitive matching (enabled via the (?i) flag) is alwaysperformed in a Unicode-aware manner. However, context-sensitive andlocal-sensitive matching is not supported. Additionally, the(?u) flag is not supported and must not be used.
Surrogate pairs are not supported. For example, \uD800\uDC00 isnot treated as U+10000 and must be specified as \x{10000}.
Boundaries (\b) are incorrectly handled for a non-spacing markwithout a base character.
\Q and \E are not supported in character classes(such as [A-Z123]) and are instead treated as literals.
Unicode character classes (\p{prop}) are supported withthe following differences:
- All underscores in names must be removed. For example, useOldItalic instead of Old_Italic.
- Scripts must be specified directly, without theIs, script= or sc= prefixes.Example: \p{Hiragana}
- Blocks must be specified with the In prefix.The block= and blk= prefixes are not supported.Example: \p{Mongolian}
- Categories must be specified directly, without the Is,general_category= or gc= prefixes.Example: \p{L}
- Binary properties must be specified directly, without the Is.Example: \p{NoncharacterCodePoint}
  regexpextract_all(_string, pattern) → array<varchar>
  
  Returns the substring(s) matched by the regular expression patternin string:

SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]

regexpextract_all(_string, pattern, group) → array<varchar>

Finds all occurrences of the regular expression pattern in stringand returns the capturing group number group:

SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']

regexpextract(_string, pattern) → varchar

Returns the first substring matched by the regular expression patternin string:

SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1

regexpextract(_string, pattern, group) → varchar

Finds the first occurrence of the regular expression pattern instring and returns the capturing group number group:

SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'

regexplike(_string, pattern) → boolean

Evaluates the regular expression pattern and determines if it iscontained within string.

This function is similar to the LIKE operator, expect that thepattern only needs to be contained within string, rather thanneeding to match all of string. In other words, this performs acontains operation rather than a match operation. You can matchthe entire string by anchoring the pattern using ^ and $:

SELECT regexp_like('1a 2b 14m', '\d+b'); -- true

regexpreplace(_string, pattern) → varchar

Removes every instance of the substring matched by the regular expressionpattern from string:

SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'

regexpreplace(_string, pattern, replacement) → varchar

Replaces every instance of the substring matched by the regular expressionpattern in string with replacement. Capturing groups can bereferenced in replacement using $g for a numbered group or${name} for a named group. A dollar sign ($) may be included in thereplacement by escaping it with a backslash (\$):

SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'

regexpreplace(_string, pattern, function) → varchar

Replaces every instance of the substring matched by the regular expressionpattern in string using function. The lambda expressionfunction is invoked for each match with the capturing groups passed as anarray. Capturing group numbers start at one; there is no group for the entire match(if you need this, surround the entire expression with parenthesis).

SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); --'New York'

regexpsplit(_string, pattern) → array<varchar>

Splits string using the regular expression pattern and returns anarray. Trailing empty strings are preserved:

SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]

原文: https://prestodb.io/docs/current/functions/regexp.html