6.9. String Functions and Operators
String Operators
The ||
operator performs concatenation.
String Functions
Note
These functions assume that the input strings contain valid UTF-8 encodedUnicode code points. There are no explicit checks for valid UTF-8 andthe functions may return incorrect results on invalid UTF-8.Invalid UTF-8 data can be corrected with from_utf8()
.
Additionally, the functions operate on Unicode code points and not uservisible characters (or grapheme clusters). Some languages combinemultiple code points into a single user-perceived character, the basicunit of a writing system for a language, but the functions will treat eachcode point as a separate unit.
The lower()
and upper()
functions do not performlocale-sensitive, context-sensitive, or one-to-many mappings required forsome languages. Specifically, this will return incorrect results forLithuanian, Turkish and Azeri.
chr
(n) → varchar
Returns the Unicode code pointn
as a single character string.
codepoint
(string) → integer
Returns the Unicode code point of the only character ofstring
.
concat
(string1, …, stringN) → varchar
Returns the concatenation ofstring1
,string2
,…
,stringN
.This function provides the same functionality as theSQL-standard concatenation operator (||
).
hammingdistance
(_string1, string2) → bigint
Returns the Hamming distance ofstring1
andstring2
,i.e. the number of positions at which the corresponding characters are different.Note that the two strings must have the same length.
length
(string) → bigint
Returns the length ofstring
in characters.
levenshteindistance
(_string1, string2) → bigint
Returns the Levenshtein edit distance ofstring1
andstring2
,i.e. the minimum number of single-character edits (insertions,deletions or substitutions) needed to changestring1
intostring2
.
lower
(string) → varchar
Convertsstring
to lowercase.
lpad
(string, size, padstring) → varchar
Left padsstring
tosize
characters withpadstring
.Ifsize
is less than the length ofstring
, the result istruncated tosize
characters.size
must not be negativeandpadstring
must be non-empty.
ltrim
(string) → varchar
Removes leading whitespace fromstring
.
replace
(string, search) → varchar
Removes all instances ofsearch
fromstring
.
replace
(string, search, replace) → varchar
Replaces all instances ofsearch
withreplace
instring
.
reverse
(string) → varchar
Returnsstring
with the characters in reverse order.
rpad
(string, size, padstring) → varchar
Right padsstring
tosize
characters withpadstring
.Ifsize
is less than the length ofstring
, the result istruncated tosize
characters.size
must not be negativeandpadstring
must be non-empty.
rtrim
(string) → varchar
Removes trailing whitespace fromstring
.
split
(string, delimiter) → array<varchar>
Splitsstring
ondelimiter
and returns an array.
split
(string, delimiter, limit) → array<varchar>
Splitsstring
ondelimiter
and returns an array of size at mostlimit
. The last element in the array always contain everythingleft in thestring
.limit
must be a positive number.
splitpart
(_string, delimiter, index) → varchar
Splitsstring
ondelimiter
and returns the fieldindex
.Field indexes start with1
. If the index is larger than thanthe number of fields, then null is returned.
splitto_map
(_string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar>
Splitsstring
byentryDelimiter
andkeyValueDelimiter
and returns a map.entryDelimiter
splitsstring
into key-value pairs.keyValueDelimiter
splitseach pair into key and value.
splitto_multimap
(_string, entryDelimiter, keyValueDelimiter) → map<varchar, array<varchar>>
Splitsstring
byentryDelimiter
andkeyValueDelimiter
and returns a mapcontaining an array of values for each unique key.entryDelimiter
splitsstring
into key-value pairs.keyValueDelimiter
splits each pair into key and value. Thevalues for each key will be in the same order as they appeared instring
.
strpos
(string, substring) → bigint
Returns the starting position of the first instance ofsubstring
instring
. Positions start with1
. If not found,0
is returned.
position
(substring IN string) → bigint
Returns the starting position of the first instance ofsubstring
instring
. Positions start with1
. If not found,0
is returned.
substr
(string, start) → varchar
Returns the rest ofstring
from the starting positionstart
.Positions start with1
. A negative starting position is interpretedas being relative to the end of the string.
substr
(string, start, length) → varchar
Returns a substring fromstring
of lengthlength
from the startingpositionstart
. Positions start with1
. A negative startingposition is interpreted as being relative to the end of the string.
trim
(string) → varchar
Removes leading and trailing whitespace fromstring
.
upper
(string) → varchar
Convertsstring
to uppercase.
wordstem
(_word) → varchar
Returns the stem ofword
in the English language.
wordstem
(_word, lang) → varchar
Returns the stem ofword
in thelang
language.
Unicode Functions
normalize
(string) → varchar
Transformsstring
with NFC normalization form.
normalize
(string, form) → varchar
Transformsstring
with the specified normalization form.form
must be be one of the following keywords:
|Form|Description
|——-
|NFD
|Canonical Decomposition
|NFC
|Canonical Decomposition, followed by Canonical Composition
|NFKD
|Compatibility Decomposition
|NFKC
|Compatibility Decomposition, followed by Canonical Composition
Note
This SQL-standard function has special syntax and requiresspecifyingform
as a keyword, not as a string.
toutf8
(_string) → varbinary
Encodesstring
into a UTF-8 varbinary representation.
fromutf8
(_binary) → varchar
Decodes a UTF-8 encoded string frombinary
. Invalid UTF-8 sequencesare replaced with the Unicode replacement characterU+FFFD
.
fromutf8
(_binary, replace) → varchar
Decodes a UTF-8 encoded string frombinary
. Invalid UTF-8 sequencesare replaced with replace. The replacement string replace must eitherbe a single character or empty (in which case invalid characters areremoved).