6.9. String Functions and Operators
String Operators
The || operator performs concatenation.
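As a small illustration, the queries below sketch the operator next to concat(), which this section describes as providing the same functionality. The literal values are made up for the example.

```sql
-- Concatenate three literals with the || operator.
SELECT 'Presto' || ' ' || 'SQL';        -- Presto SQL

-- The same result written with concat().
SELECT concat('Presto', ' ', 'SQL');    -- Presto SQL
```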
String Functions
Note
These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8. Invalid UTF-8 data can be corrected with from_utf8().
Additionally, the functions operate on Unicode code points and not user-visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.
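To make the distinction concrete, the sketch below builds the same user-perceived character (é) in two ways using chr() and length() from this section. The code point numbers 233 (U+00E9), 101 (U+0065) and 769 (U+0301, a combining acute accent) are chosen purely for illustration.

```sql
-- Precomposed form: a single code point (U+00E9).
SELECT length(chr(233));              -- 1

-- Decomposed form: 'e' followed by a combining acute accent (U+0301).
-- It renders as one character but is counted as two code points.
SELECT length(chr(101) || chr(769));  -- 2
```

The same applies to functions such as reverse() and substr(), which can separate a base letter from its combining mark.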
The lower() and upper() functions do not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, they return incorrect results for Lithuanian, Turkish and Azeri.
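A brief sketch of the Turkish case, assuming the locale-independent mapping described above. Turkish distinguishes dotted İ/i from dotless I/ı, which only a locale-sensitive mapping can honor.

```sql
-- The locale-independent mapping pairs I with i in both directions.
SELECT lower('I');   -- i  (Turkish text would expect dotless ı)
SELECT upper('i');   -- I  (Turkish text would expect dotted İ)
```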

chr(n) → varchar
Returns the Unicode code point n as a single-character string.

codepoint(string) → integer
Returns the Unicode code point of the only character of string.

concat(string1, …, stringN) → varchar
Returns the concatenation of string1, string2, …, stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).

hamming_distance(string1, string2) → bigint
Returns the Hamming distance of string1 and string2, i.e. the number of positions at which the corresponding characters are different. Note that the two strings must have the same length.

length(string) → bigint
Returns the length of string in characters.

levenshtein_distance(string1, string2) → bigint
Returns the Levenshtein edit distance of string1 and string2, i.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to change string1 into string2.

lower(string) → varchar
Converts string to lowercase.

lpad(string, size, padstring) → varchar
Left pads string to size characters with padstring. If size is less than the length of string, the result is truncated to size characters. size must not be negative and padstring must be non-empty.

ltrim(string) → varchar
Removes leading whitespace from string.

replace(string, search) → varchar
Removes all instances of search from string.

replace(string, search, replace) → varchar
Replaces all instances of search with replace in string.

reverse(string) → varchar
Returns string with the characters in reverse order.

rpad(string, size, padstring) → varchar
Right pads string to size characters with padstring. If size is less than the length of string, the result is truncated to size characters. size must not be negative and padstring must be non-empty.

rtrim(string) → varchar
Removes trailing whitespace from string.

split(string, delimiter) → array(varchar)
Splits string on delimiter and returns an array.

split(string, delimiter, limit) → array(varchar)
Splits string on delimiter and returns an array of size at most limit. The last element in the array always contains everything left in the string. limit must be a positive number.

split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than the number of fields, then null is returned.

split_to_map(string, entryDelimiter, keyValueDelimiter) → map(varchar, varchar)
Splits string by entryDelimiter and keyValueDelimiter and returns a map. entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. Note that entryDelimiter and keyValueDelimiter are interpreted literally, i.e., as full string matches.

split_to_map(string, entryDelimiter, keyValueDelimiter, function(k, v1, v2, res)) → map(varchar, varchar)
Splits string by entryDelimiter and keyValueDelimiter and returns a map. entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. Note that entryDelimiter and keyValueDelimiter are interpreted literally, i.e., as full string matches. function(k, v1, v2, res) is invoked in cases of duplicate keys to resolve the value that should be in the map.
SELECT(split_to_map('a:1;b:2;a:3', ';', ':', (k, v1, v2) -> v1)); -- {"a": "1", "b": "2"}
SELECT(split_to_map('a:1;b:2;a:3', ';', ':', (k, v1, v2) -> CONCAT(v1, v2))); -- {"a": "13", "b": "2"}
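The queries below are a sketch of several rules stated in the entries above: the distance definitions, truncation in rpad, the limit argument of split, and the null result of split_part. The input strings are invented for illustration, and the result comments are derived from the descriptions above rather than copied from a captured session.

```sql
-- Distances: three positions differ; three single-character edits are needed.
SELECT hamming_distance('karolin', 'kathrin');     -- 3
SELECT levenshtein_distance('kitten', 'sitting');  -- 3

-- Padding to a fixed width.
SELECT lpad('7', 3, '0');                          -- 007
SELECT rpad('abc', 5, '.');                        -- abc..
-- When size is smaller than the input, the result has exactly size characters.
SELECT length(rpad('abcdef', 3, '.'));             -- 3

-- Splitting: with a limit, the last element keeps the rest of the string.
SELECT split('a,b,c', ',');                        -- [a, b, c]
SELECT split('a,b,c', ',', 2);                     -- [a, b,c]

-- Field indexes start with 1; an index past the last field gives null.
SELECT split_part('a,b,c', ',', 2);                -- b
SELECT split_part('a,b,c', ',', 5);                -- NULL
```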

split_to_multimap(string, entryDelimiter, keyValueDelimiter) → map(varchar, array(varchar))
Splits string by entryDelimiter and keyValueDelimiter and returns a map containing an array of values for each unique key. entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. The values for each key will be in the same order as they appeared in string. Note that entryDelimiter and keyValueDelimiter are interpreted literally, i.e., as full string matches.

strpos(string, substring) → bigint
Returns the starting position of the first instance of substring in string. Positions start with 1. If not found, 0 is returned.

strpos(string, substring, instance) → bigint
Returns the position of the N-th instance of substring in string. instance must be a positive number. Positions start with 1. If not found, 0 is returned.

strrpos(string, substring) → bigint
Returns the starting position of the last instance of substring in string. Positions start with 1. If not found, 0 is returned.

strrpos(string, substring, instance) → bigint
Returns the position of the N-th instance of substring in string starting from the end of the string. instance must be a positive number. Positions start with 1. If not found, 0 is returned.

position(substring IN string) → bigint
Returns the starting position of the first instance of substring in string. Positions start with 1. If not found, 0 is returned.

substr(string, start) → varchar
Returns the rest of string from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

substr(string, start, length) → varchar
Returns a substring from string of length length from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

trim(string) → varchar
Removes leading and trailing whitespace from string.

upper(string) → varchar
Converts string to uppercase.

word_stem(word) → varchar
Returns the stem of word in the English language.

word_stem(word, lang) → varchar
Returns the stem of word in the lang language.
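The position and substring entries above all count from 1 and use 0 to mean "not found". The following sketch works through those rules on one invented input string, 'hello world', and closes with split_to_multimap on the same input used in the split_to_map example. The result comments follow from the descriptions above.

```sql
-- 'hello world': the letter o occurs at positions 5 and 8.
SELECT strpos('hello world', 'o');            -- 5
SELECT strpos('hello world', 'o', 2);         -- 8
SELECT strrpos('hello world', 'o');           -- 8
SELECT strrpos('hello world', 'o', 2);        -- 5
SELECT strpos('hello world', 'z');            -- 0 (not found)
SELECT position('world' IN 'hello world');    -- 7

-- Substrings: a negative start counts back from the end of the string.
SELECT substr('hello world', 7);              -- world
SELECT substr('hello world', 1, 5);           -- hello
SELECT substr('hello world', -5);             -- world
SELECT substr('hello world', -5, 3);          -- wor

-- Duplicate keys are collected in order instead of being resolved.
SELECT split_to_multimap('a:1;b:2;a:3', ';', ':');  -- {"a": ["1", "3"], "b": ["2"]}
```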
Unicode Functions
normalize(string) → varchar
Transforms string with NFC normalization form.

normalize(string, form) → varchar
Transforms string with the specified normalization form. form must be one of the following keywords:
Form    Description
NFD     Canonical Decomposition
NFC     Canonical Decomposition, followed by Canonical Composition
NFKD    Compatibility Decomposition
NFKC    Compatibility Decomposition, followed by Canonical Composition
Note
This SQL-standard function has special syntax and requires specifying form as a keyword, not as a string.
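A short sketch of the keyword syntax and of how the forms differ, using length() to count code points. The code points 233 (é, U+00E9), 101 plus 769 (e followed by the combining accent U+0301) and 64257 (the ligature U+FB01) are illustrative choices, and the results follow from the Unicode definitions of the normalization forms.

```sql
-- form is written as a bare keyword (NFD), not as the string 'NFD'.
SELECT length(normalize(chr(233), NFD));          -- 2: é becomes e + U+0301

-- With no form given, NFC is used and the pair is recomposed.
SELECT length(normalize(chr(101) || chr(769)));   -- 1

-- Compatibility forms also replace the ligature U+FB01 with 'f' and 'i'.
SELECT length(normalize(chr(64257), NFC));        -- 1
SELECT length(normalize(chr(64257), NFKC));       -- 2
```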
to_utf8(string) → varbinary
Encodes string into a UTF-8 varbinary representation.

from_utf8(binary) → varchar
Decodes a UTF-8 encoded string from binary. Invalid UTF-8 sequences are replaced with the Unicode replacement character U+FFFD.

from_utf8(binary, replace) → varchar
Decodes a UTF-8 encoded string from binary. Invalid UTF-8 sequences are replaced with replace. The replacement string replace must either be a single character or empty (in which case invalid characters are removed).