- String Functions
- Supported functions
- ASCII()
- BIN()
- BIT_LENGTH()
- CHAR()
- CHAR_LENGTH()
- CHARACTER_LENGTH()
- CONCAT()
- CONCAT_WS()
- ELT()
- EXPORT_SET()
- FIELD()
- FIND_IN_SET()
- FORMAT()
- FROM_BASE64()
- HEX()
- INSERT()
- INSTR()
- LCASE()
- LEFT()
- LENGTH()
- LIKE
- LOCATE()
- LOWER()
- LPAD()
- LTRIM()
- MAKE_SET()
- MID()
- NOT LIKE
- NOT REGEXP
- OCT()
- OCTET_LENGTH()
- ORD()
- POSITION()
- QUOTE()
- REGEXP
- REGEXP_INSTR()
- REGEXP_LIKE()
- REGEXP_REPLACE()
- REGEXP_SUBSTR()
- REPEAT()
- REPLACE()
- REVERSE()
- RIGHT()
- RLIKE
- RPAD()
- RTRIM()
- SPACE()
- STRCMP()
- SUBSTR()
- SUBSTRING()
- SUBSTRING_INDEX()
- TO_BASE64()
- TRANSLATE()
- TRIM()
- UCASE()
- UNHEX()
- UPPER()
- WEIGHT_STRING()
- Unsupported functions
- Regular expression compatibility with MySQL
- Supported functions
String Functions
TiDB supports most of the string functions available in MySQL 8.0, and some of the functions available in Oracle 21.
For comparisons between functions and syntax of Oracle and TiDB, see Comparisons between Functions and Syntax of Oracle and TiDB.
Supported functions
ASCII()
The ASCII(str)
function is used to get the ASCII value of the leftmost character in the given argument. The argument can be either a string or a number.
- If the argument is not empty, the function returns the ASCII value of the leftmost character.
- If the argument is an empty string, the function returns
0
. - If the argument is
NULL
, the function returnsNULL
.
Note
ASCII(str)
only works for characters represented using 8 bits of binary digits (one byte).
Example:
SELECT ASCII('A'), ASCII('TiDB'), ASCII(23);
Output:
+------------+---------------+-----------+
| ASCII('A') | ASCII('TiDB') | ASCII(23) |
+------------+---------------+-----------+
| 65 | 84 | 50 |
+------------+---------------+-----------+
BIN()
The BIN()
function is used to convert the given argument into a string representation of its binary value. The argument can be either a string or a number.
- If the argument is a positive number, the function returns a string representation of its binary value.
- If the argument is a negative number, the function converts the absolute value of the argument to its binary representation, inverts each bit of the binary value (changing
0
to1
and1
to0
), and then adds1
to the inverted value. - If the argument is a string containing only digits, the function returns the result according to those digits. For example, the results for
"123"
and123
are the same. - If the argument is a string and its first character is not a digit (such as
"q123"
), the function returns0
. - If the argument is a string that consists of digits and non-digits, the function returns the result according to the consecutive digits at the beginning of the argument. For example, the results for
"123q123"
and123
are the same, butBIN('123q123')
generates a warning likeTruncated incorrect INTEGER value: '123q123'
. - If the argument is
NULL
, the function returnsNULL
.
Example 1:
SELECT BIN(123), BIN('123q123');
Output 1:
+----------+----------------+
| BIN(123) | BIN('123q123') |
+----------+----------------+
| 1111011 | 1111011 |
+----------+----------------+
Example 2:
SELECT BIN(-7);
Output 2:
+------------------------------------------------------------------+
| BIN(-7) |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111001 |
+------------------------------------------------------------------+
BIT_LENGTH()
The BIT_LENGTH()
function is used to return the length of a given argument in bits.
Examples:
SELECT BIT_LENGTH("TiDB");
+--------------------+
| BIT_LENGTH("TiDB") |
+--------------------+
| 32 |
+--------------------+
8 bits per character x 4 characters = 32 bits
SELECT BIT_LENGTH("PingCAP 123");
+---------------------------+
| BIT_LENGTH("PingCAP 123") |
+---------------------------+
| 88 |
+---------------------------+
8 bits per character (space is counted because it is a non-alphanumeric character) x 11 characters = 88 bits
SELECT CustomerName, BIT_LENGTH(CustomerName) AS BitLengthOfName FROM Customers;
+--------------------+-----------------+
| CustomerName | BitLengthOfName |
+--------------------+-----------------+
| Albert Einstein | 120 |
| Robert Oppenheimer | 144 |
+--------------------+-----------------+
Note
The preceding example operates under the assumption that there is a database with a table named Customers
and a column inside the table named CustomerName
.
CHAR()
The CHAR()
function is used to get the corresponding character of a specific ASCII value. It performs the opposite operation of ASCII()
, which returns the ASCII value of a specific character. If multiple arguments are supplied, the function works on all arguments and are then concaternated together.
Examples:
SELECT CHAR(65);
+------------+
| CHAR(65) |
+------------+
| A |
+------------+
SELECT CHAR(84);
+------------+
| CHAR(84) |
+------------+
| T |
+------------+
The CHAR()
function can also be used to get the corresponding character of ASCII values that extend beyond the standard ASCII range (0
- 127
).
/*For extended ASCII: */
SELECT CHAR(128);
+------------+
| CHAR(128) |
+------------+
| 0x80 |
+------------+
The CHAR()
function can also get the corresponding character value of a unicode value.
/* For Unicode: */
--skip-binary-as-hex
SELECT CHAR(50089);
+--------------+
| CHAR(50089) |
+--------------+
| é |
+--------------+
SELECT CHAR(65,66,67);
+----------------+
| CHAR(65,66,67) |
+----------------+
| ABC |
+----------------+
1 row in set (0.00 sec)
CHAR_LENGTH()
The CHAR_LENGTH()
function is used to get the total number of characters in a given argument as an integer.
Examples:
SELECT CHAR_LENGTH("TiDB") AS LengthOfString;
+----------------+
| LengthOfString |
+----------------+
| 4 |
+----------------+
SELECT CustomerName, CHAR_LENGTH(CustomerName) AS LengthOfName FROM Customers;
+--------------------+--------------+
| CustomerName | LengthOfName |
+--------------------+--------------+
| Albert Einstein | 15 |
| Robert Oppenheimer | 18 |
+--------------------+--------------+
Note
The preceding example operates under the assumption that there is a database with a table named Customers
and a column inside the table named CustomerName
.
CHARACTER_LENGTH()
The CHARACTER_LENGTH()
function is the same as the CHAR_LENGTH()
function. Both functions can be used synonymously because they generate the same output.
CONCAT()
The CONCAT()
function concatenates one or more arguments into a single string.
Syntax:
CONCAT(str1,str2,...)
str1, str2, ...
is a list of arguments to be concatenated. Each argument can be a string or a number.
Example:
SELECT CONCAT('TiDB', ' ', 'Server', '-', 1, TRUE);
Output:
+---------------------------------------------+
| CONCAT('TiDB', ' ', 'Server', '-', 1, TRUE) |
+---------------------------------------------+
| TiDB Server-11 |
+---------------------------------------------+
If any of the arguments is NULL
, CONCAT()
returns NULL
.
Example:
SELECT CONCAT('TiDB', NULL, 'Server');
Output:
+--------------------------------+
| CONCAT('TiDB', NULL, 'Server') |
+--------------------------------+
| NULL |
+--------------------------------+
In addition to the CONCAT()
function, you can concatenate strings by placing them adjacent to each other as in the following example. Note that this method does not support numeric types.
SELECT 'Ti' 'DB' ' ' 'Server';
Output:
+-------------+
| Ti |
+-------------+
| TiDB Server |
+-------------+
CONCAT_WS()
The CONCAT_WS()
function is a form of CONCAT() with a separator, which returns a string concatenated by the specified separator.
Syntax:
CONCAT_WS(separator,str1,str2,...)
separator
: the first argument is the separator, which concatenates the remaining arguments that are notNULL
.str1, str2, ...
: a list of arguments to be concatenated. Each argument can be a string or a number.
Example:
SELECT CONCAT_WS(',', 'TiDB Server', 'TiKV', 'PD');
Output:
+---------------------------------------------+
| CONCAT_WS(',', 'TiDB Server', 'TiKV', 'PD') |
+---------------------------------------------+
| TiDB Server,TiKV,PD |
+---------------------------------------------+
If the separator is an empty string,
CONCAT_WS()
is equivalent toCONCAT()
and returns the concatenated string of the remaining arguments.Example:
SELECT CONCAT_WS('', 'TiDB Server', 'TiKV', 'PD');
Output:
+--------------------------------------------+
| CONCAT_WS('', 'TiDB Server', 'TiKV', 'PD') |
+--------------------------------------------+
| TiDB ServerTiKVPD |
+--------------------------------------------+
If the separator is
NULL
,CONCAT_WS()
returnsNULL
.Example:
SELECT CONCAT_WS(NULL, 'TiDB Server', 'TiKV', 'PD');
Output:
+----------------------------------------------+
| CONCAT_WS(NULL, 'TiDB Server', 'TiKV', 'PD') |
+----------------------------------------------+
| NULL |
+----------------------------------------------+
If only one of the arguments to be concatenated is not
NULL
,CONCAT_WS()
returns that argument.Example:
SELECT CONCAT_WS(',', 'TiDB Server', NULL);
Output:
+-------------------------------------+
| CONCAT_WS(',', 'TiDB Server', NULL) |
+-------------------------------------+
| TiDB Server |
+-------------------------------------+
If there are
NULL
arguments to be concatenated,CONCAT_WS()
skips theseNULL
arguments.Example:
SELECT CONCAT_WS(',', 'TiDB Server', NULL, 'PD');
Output:
+-------------------------------------------+
| CONCAT_WS(',', 'TiDB Server', NULL, 'PD') |
+-------------------------------------------+
| TiDB Server,PD |
+-------------------------------------------+
If there are empty strings to be concatenated,
CONCAT_WS()
does not skip empty strings.Example:
SELECT CONCAT_WS(',', 'TiDB Server', '', 'PD');
Output:
+-----------------------------------------+
| CONCAT_WS(',', 'TiDB Server', '', 'PD') |
+-----------------------------------------+
| TiDB Server,,PD |
+-----------------------------------------+
ELT()
The ELT()
function returns the element at the index number.
SELECT ELT(3, 'This', 'is', 'TiDB');
+------------------------------+
| ELT(3, 'This', 'is', 'TiDB') |
+------------------------------+
| TiDB |
+------------------------------+
1 row in set (0.00 sec)
The preceding example returns the third element, which is 'TiDB'
.
EXPORT_SET()
The EXPORT_SET()
function returns a string that consists of a specified number (number_of_bits
) of on
/off
values, optionally separated by separator
. These values are based on whether the corresponding bit in the bits
argument is 1
, where the first value corresponds to the rightmost (lowest) bit of bits
.
Syntax:
EXPORT_SET(bits, on, off, [separator[, number_of_bits]])
bits
: an integer representing the bit value.on
: the string to be returned if the corresponding bit is1
.off
: the string to be returned if the corresponding bit is0
.separator
(optional): the separator character in the result string.number_of_bits
(optional): the number of bits to be processed. If it is not set,64
(the max size of bits) is used by default, which means thatbits
is treated as an unsigned 64-bit integer.
Examples:
In the following example, number_of_bits
is set to 5
, resulting in 5 values, separated by |
. Because only 3 bits are given, the other bits are considered unset. Therefore, setting number_of_bits
to either 101
or 00101
results in the same output.
SELECT EXPORT_SET(b'101',"ON",'off','|',5);
+-------------------------------------+
| EXPORT_SET(b'101',"ON",'off','|',5) |
+-------------------------------------+
| ON|off|ON|off|off |
+-------------------------------------+
1 row in set (0.00 sec)
In the following example, bits
is set to 00001111
, on
is set to x
, and off
is set to _
. This causes the function to return ____
for the 0
bits and xxxx
for the 1
bits. Therefore, when processing with the bits in 00001111
from right to left, the function returns xxxx____
.
SELECT EXPORT_SET(b'00001111', 'x', '_', '', 8);
+------------------------------------------+
| EXPORT_SET(b'00001111', 'x', '_', '', 8) |
+------------------------------------------+
| xxxx____ |
+------------------------------------------+
1 row in set (0.00 sec)
In the following example, bits
is set to 00001111
, on
is set to x
, and off
is set to _
. This causes the function to return x
for each 1
bit and _
for each 0
bit. Therefore, when processing with the bits in 01010101
from right to left, the function returns x_x_x_x_
.
SELECT EXPORT_SET(b'01010101', 'x', '_', '', 8);
+------------------------------------------+
| EXPORT_SET(b'01010101', 'x', '_', '', 8) |
+------------------------------------------+
| x_x_x_x_ |
+------------------------------------------+
1 row in set (0.00 sec)
FIELD()
Return the index (position) of the first argument in the subsequent arguments.
In the following example, the first argument of FIELD()
is needle
, and it matches the second argument in the following list, so the function returns 2
.
SELECT FIELD('needle', 'A', 'needle', 'in', 'a', 'haystack');
+-------------------------------------------------------+
| FIELD('needle', 'A', 'needle', 'in', 'a', 'haystack') |
+-------------------------------------------------------+
| 2 |
+-------------------------------------------------------+
1 row in set (0.00 sec)
FIND_IN_SET()
Return the index position of the first argument within the second argument.
This function is often used with the SET data type.
In the following example, Go
is the fourth element in the set COBOL,BASIC,Rust,Go,Java,Fortran
, so the function returns 4
.
SELECT FIND_IN_SET('Go', 'COBOL,BASIC,Rust,Go,Java,Fortran');
+-------------------------------------------------------+
| FIND_IN_SET('Go', 'COBOL,BASIC,Rust,Go,Java,Fortran') |
+-------------------------------------------------------+
| 4 |
+-------------------------------------------------------+
1 row in set (0.00 sec)
FORMAT()
The FORMAT(X,D[,locale])
function is used to format the number X
to a format similar to "#,###,###. ##"
, rounded to D
decimal places, and return the result as a string.
Arguments:
X
: the number to be formatted. It can be a direct numeric value, a numeric string, or a number in scientific notation.D
: the number of decimal places for the returned value. The function rounds the numberX
toD
decimal places. IfD
is greater than the actual number of decimal places inX
, the result is padded with zeros to the corresponding length.[locale]
: specifies a locale setting to be used for grouping between decimal points, thousands separators, and separators for resultant numbers. A valid locale value is the same as the valid value of the lc_time_names system variable. If not specified or the region setting isNULL
, the'en_US'
region setting is used by default. This argument is optional.
Behaviors:
- If the first argument is a string and contains only numbers, the function returns a result based on that numeric value. For example,
FORMAT('12.34', 1)
andFORMAT(12.34, 1)
return the same result. - If the first argument is a number represented in scientific notation (using
E/e
), the function returns the result based on that number. For example,FORMAT('1E2', 3)
returns100.000
. - If the first argument is a string starting with non-numeric characters, the function returns zero and a warning
(Code 1292)
. For example,FORMAT('q12.36', 5)
returns0.00000
, but also includes a warningWarning (Code 1292): Truncated incorrect DOUBLE value: 'q12.36'
. - If the first argument is a string mixing numbers and non-numbers, the function returns a result based on the consecutive numeric part at the beginning of the argument, and also includes a warning
(Code 1292)
. For example,FORMAT('12.36q56.78', 1)
returns the same numeric result asFORMAT('12.36', 1)
, but includes a warningWarning (Code 1292): Truncated incorrect DOUBLE value: '12.36q56.78'
. - If the second argument is zero or a negative number, the function truncates the decimal part and returns an integer.
- If any of the arguments is
NULL
, the function returnsNULL
.
Examples:
The following examples show how to format the number 12.36 to different decimal places:
mysql> SELECT FORMAT(12.36, 1);
+------------------+
| FORMAT(12.36, 1) |
+------------------+
| 12.4 |
+------------------+
mysql> SELECT FORMAT(12.36, 5);
+------------------+
| FORMAT(12.36, 5) |
+------------------+
| 12.36000 |
+------------------+
mysql> SELECT FORMAT(12.36, 2);
+------------------+
| FORMAT(12.36, 2) |
+------------------+
| 12.36 |
+------------------+
FROM_BASE64()
The FROM_BASE64()
function is used to decode a Base64 encoded string and return the decoded result in its hexadecimal form.
- This function accepts a single argument, that is, the Base64 encoded string to be decoded.
- If the argument is
NULL
or not a valid Base64 encoded string, theFROM_BASE64()
function returnsNULL
.
Examples:
The following example shows how to decode the Base64 encoded string 'SGVsbG8gVGlEQg=='
. This string is the result of encoding 'Hello TiDB'
, using the TO_BASE64() function.
mysql> SELECT TO_BASE64('Hello TiDB');
+-------------------------+
| TO_BASE64('Hello TiDB') |
+-------------------------+
| SGVsbG8gVGlEQg== |
+-------------------------+
mysql> SELECT FROM_BASE64('SGVsbG8gVGlEQg==');
+------------------------------------------------------------------+
| FROM_BASE64('SGVsbG8gVGlEQg==') |
+------------------------------------------------------------------+
| 0x48656C6C6F2054694442 |
+------------------------------------------------------------------+
mysql> SELECT CONVERT(FROM_BASE64('SGVsbG8gVGlEQg==') USING utf8mb4);
+--------------------------------------------------------+
| CONVERT(FROM_BASE64('SGVsbG8gVGlEQg==') USING utf8mb4) |
+--------------------------------------------------------+
| Hello TiDB |
+--------------------------------------------------------+
The following example shows how to decode the Base64 encoded number MTIzNDU2
. This string is the result of encoding 123456
, which can be done using the TO_BASE64() function.
mysql> SELECT FROM_BASE64('MTIzNDU2');
+--------------------------------------------------+
| FROM_BASE64('MTIzNDU2') |
+--------------------------------------------------+
| 0x313233343536 |
+--------------------------------------------------+
HEX()
The HEX()
function is used to convert the given argument into a string representation of its hexadecimal value. The argument can be either a string or a number.
- If the argument is a string,
HEX(str)
returns a hexadecimal string representation ofstr
. The function converts each byte of each character instr
into two hexadecimal digits. For example, the charactera
in a UTF-8 or ASCII character set is represented as a binary value of00111101
, or61
in hexadecimal notation. - If the argument is a number,
HEX(n)
returns a hexadecimal string representation ofn
. The function treats the argumentn
as aBIGINT
number, equivalent to usingCONV(n, 10, 16)
. - If the argument is
NULL
, the function returnsNULL
.
Examples:
SELECT X'616263', HEX('abc'), UNHEX(HEX('abc')), 0x616263;
+-----------+------------+-------------------+----------+
| X'616263' | HEX('abc') | UNHEX(HEX('abc')) | 0x616263 |
+-----------+------------+-------------------+----------+
| abc | 616263 | abc | abc |
+-----------+------------+-------------------+----------+
SELECT X'F09F8DA3', HEX('🍣'), UNHEX(HEX('🍣')), 0xF09F8DA3;
+-------------+-------------+--------------------+------------+
| X'F09F8DA3' | HEX('🍣') | UNHEX(HEX('🍣')) | 0xF09F8DA3 |
+-------------+-------------+--------------------+------------+
| 🍣 | F09F8DA3 | 🍣 | 🍣 |
+-------------+-------------+--------------------+------------+
SELECT HEX(255), CONV(HEX(255), 16, 10);
+----------+------------------------+
| HEX(255) | CONV(HEX(255), 16, 10) |
+----------+------------------------+
| FF | 255 |
+----------+------------------------+
SELECT HEX(NULL);
+-----------+
| HEX(NULL) |
+-----------+
| NULL |
+-----------+
INSERT()
The INSERT(str, pos, len, newstr)
function is used to replace a substring in str
(that starts at position pos
and is len
characters long) with the string newstr
. This function is multibyte safe.
- If
pos
exceeds the length ofstr
, the function returns the original stringstr
without modification. - If
len
exceeds the remaining length ofstr
from positionpos
, the function replaces the rest of the string from positionpos
. - If any argument is
NULL
, the function returnsNULL
.
Examples:
SELECT INSERT('He likes tennis', 4, 5, 'plays');
+------------------------------------------+
| INSERT('He likes tennis', 4, 5, 'plays') |
+------------------------------------------+
| He plays tennis |
+------------------------------------------+
SELECT INSERT('He likes tennis', -1, 5, 'plays');
+-------------------------------------------+
| INSERT('He likes tennis', -1, 5, 'plays') |
+-------------------------------------------+
| He likes tennis |
+-------------------------------------------+
SELECT INSERT('He likes tennis', 4, 100, 'plays');
+--------------------------------------------+
| INSERT('He likes tennis', 4, 100, 'plays') |
+--------------------------------------------+
| He plays |
+--------------------------------------------+
SELECT INSERT('He likes tenis', 10, 100, '🍣');
+-------------------------------------------+
| INSERT('He likes tenis', 10, 100, '🍣') |
+-------------------------------------------+
| He likes 🍣 |
+-------------------------------------------+
SELECT INSERT('あああああああ', 2, 3, 'いいい');
+----------------------------------------------------+
| INSERT('あああああああ', 2, 3, 'いいい') |
+----------------------------------------------------+
| あいいいあああ |
+----------------------------------------------------+
SELECT INSERT('あああああああ', 2, 3, 'xx');
+---------------------------------------------+
| INSERT('あああああああ', 2, 3, 'xx') |
+---------------------------------------------+
| あxxあああ |
+---------------------------------------------+
INSTR()
The INSTR(str, substr)
function is used to get the position of the first occurrence of substr
in str
. Each argument can be either a string or a number. This function is the same as the two-argument version of LOCATE(substr, str), but with the order of the arguments reversed.
Note
The case sensitivity of INSTR(str, substr)
is determined by the collations used in TiDB. Binary collations (with the suffix _bin
) are case-sensitive, while general collations (with the suffix _general_ci
or _ai_ci
, and) are case-insensitive.
- If either argument is a number, the function treats the number as a string.
- If
substr
is not instr
, the function returns0
. Otherwise, it returns the position of the first occurrence ofsubstr
instr
. - If either argument is
NULL
, the function returnsNULL
.
Examples:
SELECT INSTR("pingcap.com", "tidb");
+------------------------------+
| INSTR("pingcap.com", "tidb") |
+------------------------------+
| 0 |
+------------------------------+
SELECT INSTR("pingcap.com/tidb", "tidb");
+-----------------------------------+
| INSTR("pingcap.com/tidb", "tidb") |
+-----------------------------------+
| 13 |
+-----------------------------------+
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB");
+-------------------------------------------------------+
| INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB") |
+-------------------------------------------------------+
| 0 |
+-------------------------------------------------------+
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB");
+--------------------------------------------------------------+
| INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB") |
+--------------------------------------------------------------+
| 13 |
+--------------------------------------------------------------+
SELECT INSTR(0123, "12");
+-------------------+
| INSTR(0123, "12") |
+-------------------+
| 1 |
+-------------------+
LCASE()
The LCASE(str)
function is a synonym for LOWER(str), which returns the lowercase of the given argument.
LEFT()
The LEFT()
function returns a specified number of characters from the left side of a string.
Syntax:
LEFT(`str`, `len`)
str
: the original string to extract characters. Ifstr
contains a multibyte character, the function counts it as a single code point.len
: the length of characters to be returned.- If
len
is equal to or less than 0, the function returns an empty string. - If
len
is equal to or greater than the length ofstr
, the function returns the originalstr
.
- If
- If any argument is
NULL
, the function returnsNULL
.
Examples:
SELECT LEFT('ABCED', 3);
+------------------+
| LEFT('ABCED', 3) |
+------------------+
| ABC |
+------------------+
SELECT LEFT('ABCED', 6);
+------------------+
| LEFT('ABCED', 6) |
+------------------+
| ABCED |
+------------------+
SELECT LEFT('ABCED', 0);
+------------------+
| LEFT('ABCED', 0) |
+------------------+
| |
+------------------+
SELECT LEFT('ABCED', -1);
+-------------------+
| LEFT('ABCED', -1) |
+-------------------+
| |
+-------------------+
SELECT LEFT('🍣ABC', 3);
+--------------------+
| LEFT('🍣ABC', 3) |
+--------------------+
| 🍣AB |
+--------------------+
SELECT LEFT('ABC', NULL);
+-------------------+
| LEFT('ABC', NULL) |
+-------------------+
| NULL |
+-------------------+
SELECT LEFT(NULL, 3);
+------------------------------+
| LEFT(NULL, 3) |
+------------------------------+
| NULL |
+------------------------------+
LENGTH()
The LENGTH()
function returns the length of a string in bytes.
LENGTH()
counts a multibyte character as multiple bytes while CHAR_LENGTH()
counts a multibyte character as a single code point.
If the argument is NULL
, the function returns NULL
.
Examples:
SELECT LENGTH('ABC');
+---------------+
| LENGTH('ABC') |
+---------------+
| 3 |
+---------------+
SELECT LENGTH('🍣ABC');
+-------------------+
| LENGTH('🍣ABC') |
+-------------------+
| 7 |
+-------------------+
SELECT CHAR_LENGTH('🍣ABC');
+------------------------+
| CHAR_LENGTH('🍣ABC') |
+------------------------+
| 4 |
+------------------------+
SELECT LENGTH(NULL);
+--------------+
| LENGTH(NULL) |
+--------------+
| NULL |
+--------------+
LIKE
The LIKE
operator is used for simple string matching. The expression expr LIKE pat [ESCAPE 'escape_char']
returns 1
(TRUE
) or 0
(FALSE
). If either expr
or pat
is NULL
, the result is NULL
.
You can use the following two wildcard parameters with LIKE
:
%
matches any number of characters, including zero characters._
matches exactly one character.
The following examples use the utf8mb4_bin
collation:
SET collation_connection='utf8mb4_bin';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+-------------+
| Variable_name | Value |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
+----------------------+-------------+
SELECT NULL LIKE '%' as result;
+--------+
| result |
+--------+
| NULL |
+--------+
SELECT 'sushi!!!' LIKE 'sushi_' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%sushi%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%SUSHI%' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%🍣%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
The default escape character is \
:
SELECT 'sushi!!!' LIKE 'sushi\_' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
SELECT 'sushi_' LIKE 'sushi\_' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
To specify a different escape character, such as *
, you can use the ESCAPE
clause:
SELECT 'sushi_' LIKE 'sushi*_' ESCAPE '*' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
SELECT 'sushi!' LIKE 'sushi*_' ESCAPE '*' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
You can use the LIKE
operator to match a numeric value:
SELECT 10 LIKE '1%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
SELECT 10000 LIKE '12%' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
To specify a collation explicitly, such as utf8mb4_unicode_ci
, you can use COLLATE
:
SELECT '🍣🍺Sushi🍣🍺' COLLATE utf8mb4_unicode_ci LIKE '%SUSHI%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
LOCATE()
The LOCATE(substr, str[, pos])
function is used to get the position of the first occurrence of a specified substring substr
in a string str
. The pos
argument is optional and specifies the starting position for the search.
- If the substring
substr
is not present instr
, the function returns0
. - If any argument is
NULL
, the function returnsNULL
. - This function is multibyte safe and performs a case-sensitive search only if at least one argument is a binary string.
The following examples use the utf8mb4_bin
collation:
SET collation_connection='utf8mb4_bin';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+-------------+
| Variable_name | Value |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
+----------------------+-------------+
SELECT LOCATE('bar', 'foobarbar');
+----------------------------+
| LOCATE('bar', 'foobarbar') |
+----------------------------+
| 4 |
+----------------------------+
SELECT LOCATE('baz', 'foobarbar');
+----------------------------+
| LOCATE('baz', 'foobarbar') |
+----------------------------+
| 0 |
+----------------------------+
SELECT LOCATE('bar', 'fooBARBAR');
+----------------------------+
| LOCATE('bar', 'fooBARBAR') |
+----------------------------+
| 0 |
+----------------------------+
SELECT LOCATE('bar', 'foobarBAR', 100);
+---------------------------------+
| LOCATE('bar', 'foobarBAR', 100) |
+---------------------------------+
| 0 |
+---------------------------------+
SELECT LOCATE('bar', 'foobarbar', 5);
+-------------------------------+
| LOCATE('bar', 'foobarbar', 5) |
+-------------------------------+
| 7 |
+-------------------------------+
SELECT LOCATE('bar', NULL);
+---------------------+
| LOCATE('bar', NULL) |
+---------------------+
| NULL |
+---------------------+
SELECT LOCATE('い', 'たいでぃーびー');
+----------------------------------------+
| LOCATE('い', 'たいでぃーびー') |
+----------------------------------------+
| 2 |
+----------------------------------------+
SELECT LOCATE('い', 'たいでぃーびー', 3);
+-------------------------------------------+
| LOCATE('い', 'たいでぃーびー', 3) |
+-------------------------------------------+
| 0 |
+-------------------------------------------+
The following examples use the utf8mb4_unicode_ci
collation:
SET collation_connection='utf8mb4_unicode_ci';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
+----------------------+--------------------+
SELECT LOCATE('い', 'たいでぃーびー', 3);
+-------------------------------------------+
| LOCATE('い', 'たいでぃーびー', 3) |
+-------------------------------------------+
| 4 |
+-------------------------------------------+
SELECT LOCATE('🍺', '🍣🍣🍣🍺🍺');
+----------------------------------------+
| LOCATE('🍺', '🍣🍣🍣🍺🍺') |
+----------------------------------------+
| 1 |
+----------------------------------------+
The following multibyte and binary string examples use the utf8mb4_bin
collation:
SET collation_connection='utf8mb4_bin';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+-------------+
| Variable_name | Value |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
+----------------------+-------------+
SELECT LOCATE('🍺', '🍣🍣🍣🍺🍺');
+----------------------------------------+
| LOCATE('🍺', '🍣🍣🍣🍺🍺') |
+----------------------------------------+
| 4 |
+----------------------------------------+
SELECT LOCATE('b', _binary'aBcde');
+-----------------------------+
| LOCATE('b', _binary'aBcde') |
+-----------------------------+
| 0 |
+-----------------------------+
SELECT LOCATE('B', _binary'aBcde');
+-----------------------------+
| LOCATE('B', _binary'aBcde') |
+-----------------------------+
| 2 |
+-----------------------------+
SELECT LOCATE(_binary'b', 'aBcde');
+-----------------------------+
| LOCATE(_binary'b', 'aBcde') |
+-----------------------------+
| 0 |
+-----------------------------+
SELECT LOCATE(_binary'B', 'aBcde');
+-----------------------------+
| LOCATE(_binary'B', 'aBcde') |
+-----------------------------+
| 2 |
+-----------------------------+
LOWER()
The LOWER(str)
function is used to convert all characters in the given argument str
to lowercase. The argument can be either a string or a number.
- If the argument is a string, the function returns the string in lowercase.
- If the argument is a number, the function returns the number without leading zeros.
- If the argument is
NULL
, the function returnsNULL
.
Examples:
SELECT LOWER("TiDB");
+---------------+
| LOWER("TiDB") |
+---------------+
| tidb |
+---------------+
SELECT LOWER(-012);
+-------------+
| LOWER(-012) |
+-------------+
| -12 |
+-------------+
LPAD()
The LPAD(str, len, padstr)
function returns the string argument, left-padded with the specified string padstr
to a length of len
characters.
- If
len
is less than the length of the stringstr
, the function truncates the stringstr
to the length oflen
. - If
len
is a negative number, the function returnsNULL
. - If any argument is
NULL
, the function returnsNULL
.
Examples:
SELECT LPAD('TiDB',8,'>');
+--------------------+
| LPAD('TiDB',8,'>') |
+--------------------+
| >>>>TiDB |
+--------------------+
1 row in set (0.00 sec)
SELECT LPAD('TiDB',2,'>');
+--------------------+
| LPAD('TiDB',2,'>') |
+--------------------+
| Ti |
+--------------------+
1 row in set (0.00 sec)
SELECT LPAD('TiDB',-2,'>');
+---------------------+
| LPAD('TiDB',-2,'>') |
+---------------------+
| NULL |
+---------------------+
1 row in set (0.00 sec)
LTRIM()
The LTRIM()
function removes leading spaces from a given string.
If the argument is NULL
, this function returns NULL
.
Note
This function only removes the space character (U+0020) and does not remove other space-like characters such as tab (U+0009) or non-breaking space (U+00A0).
Examples:
In the following example, the LTRIM()
function removes the leading spaces from ' hello'
and returns hello
.
SELECT LTRIM(' hello');
+--------------------+
| LTRIM(' hello') |
+--------------------+
| hello |
+--------------------+
1 row in set (0.00 sec)
In the following example, CONCAT() is used to enclose the result of LTRIM(' hello')
with «
and »
. This formatting makes it a bit easier to see that all leading spaces are removed.
SELECT CONCAT('«',LTRIM(' hello'),'»');
+------------------------------------+
| CONCAT('«',LTRIM(' hello'),'»') |
+------------------------------------+
| «hello» |
+------------------------------------+
1 row in set (0.00 sec)
MAKE_SET()
The MAKE_SET()
function returns a set of comma-separated strings based on whether a corresponding bit in the bits
argument is set to 1
.
Syntax:
MAKE_SET(bits, str1, str2, ...)
bits
: controls which subsequent string arguments to include in the result set. Ifbits
is set toNULL
, the function returnsNULL
.str1, str2, ...
: a list of strings. Each string corresponds to a bit in thebits
argument from right to left.str1
corresponds to the first bit from the right,str2
corresponds to the second bit from the right, and so on. If the corresponding bit is1
, the string is included in the result; otherwise, it is not included.
Examples:
In the following example, because all bits are set to 0
in the bits
argument, the function does not include any subsequent strings in the result and returns an empty string.
SELECT MAKE_SET(b'000','foo','bar','baz');
+------------------------------------+
| MAKE_SET(b'000','foo','bar','baz') |
+------------------------------------+
| |
+------------------------------------+
1 row in set (0.00 sec)
In the following example, because only the first bit from the right is 1
, the function only returns the first string foo
.
SELECT MAKE_SET(b'001','foo','bar','baz');
+------------------------------------+
| MAKE_SET(b'001','foo','bar','baz') |
+------------------------------------+
| foo |
+------------------------------------+
1 row in set (0.00 sec)
In the following example, because only the second bit from the right is 1
, the function only returns the second string bar
.
SELECT MAKE_SET(b'010','foo','bar','baz');
+------------------------------------+
| MAKE_SET(b'010','foo','bar','baz') |
+------------------------------------+
| bar |
+------------------------------------+
1 row in set (0.00 sec)
In the following example, because only the third bit from the right is 1
, the function only returns the third string baz
.
SELECT MAKE_SET(b'100','foo','bar','baz');
+------------------------------------+
| MAKE_SET(b'100','foo','bar','baz') |
+------------------------------------+
| baz |
+------------------------------------+
1 row in set (0.00 sec)
In the following example, because all bits are 1
, the function returns all three strings in a comma-separated result set.
SELECT MAKE_SET(b'111','foo','bar','baz');
+------------------------------------+
| MAKE_SET(b'111','foo','bar','baz') |
+------------------------------------+
| foo,bar,baz |
+------------------------------------+
1 row in set (0.0002 sec)
MID()
The MID(str, pos, len)
function returns a substring starting from the specified pos
position with the len
length.
If any of the arguments are NULL
, the function returns NULL
.
TiDB does not support the two-argument variant of this function. For more information, see #52420.
Examples:
In the following example, MID()
returns the substring of the input string starting from the second character (b
) with a length of 3
characters.
SELECT MID('abcdef',2,3);
+-------------------+
| MID('abcdef',2,3) |
+-------------------+
| bcd |
+-------------------+
1 row in set (0.00 sec)
NOT LIKE
Negation of simple pattern matching.
This function performs the inverse operation of LIKE.
Examples:
In the following example, NOT LIKE
returns 0
(False) because aaa
matches the a%
pattern.
SELECT 'aaa' LIKE 'a%', 'aaa' NOT LIKE 'a%';
+-----------------+---------------------+
| 'aaa' LIKE 'a%' | 'aaa' NOT LIKE 'a%' |
+-----------------+---------------------+
| 1 | 0 |
+-----------------+---------------------+
1 row in set (0.00 sec)
In the following example, NOT LIKE
returns 1
(True) because aaa
does not match the b%
pattern.
SELECT 'aaa' LIKE 'b%', 'aaa' NOT LIKE 'b%';
+-----------------+---------------------+
| 'aaa' LIKE 'b%' | 'aaa' NOT LIKE 'b%' |
+-----------------+---------------------+
| 0 | 1 |
+-----------------+---------------------+
1 row in set (0.00 sec)
NOT REGEXP
Negation of REGEXP.
OCT()
Return a string containing octal (base 8) representation of a number.
Examples:
The following example generates a sequence of numbers from 0 to 20 using a recursive common table expression (CTE) and then uses the OCT()
function to convert each number to its octal representation. Decimal values from 0 to 7 have identical representations in octal. Decimal numbers from 8 to 15 correspond to octal numbers from 10 to 17.
WITH RECURSIVE nr(n) AS (
SELECT 0 AS n
UNION ALL
SELECT n+1 FROM nr WHERE n<20
)
SELECT n, OCT(n) FROM nr;
+------+--------+
| n | OCT(n) |
+------+--------+
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
| 7 | 7 |
| 8 | 10 |
| 9 | 11 |
| 10 | 12 |
| 11 | 13 |
| 12 | 14 |
| 13 | 15 |
| 14 | 16 |
| 15 | 17 |
| 16 | 20 |
| 17 | 21 |
| 18 | 22 |
| 19 | 23 |
| 20 | 24 |
+------+--------+
20 rows in set (0.00 sec)
OCTET_LENGTH()
Synonym for LENGTH().
ORD()
Return the character code for the leftmost character of the given argument.
This function is similar to CHAR() but works the other way around.
Examples:
Taking a
and A
as an example, ORD()
returns 97
for a
and 65
for A
.
SELECT ORD('a'), ORD('A');
+----------+----------+
| ORD('a') | ORD('A') |
+----------+----------+
| 97 | 65 |
+----------+----------+
1 row in set (0.00 sec)
If you take the character code obtained from ORD()
as input, you can get the original characters back using the CHAR()
function. Note that the output format might vary depending on whether the binary-as-hex
option is enabled in your MySQL client.
SELECT CHAR(97), CHAR(65);
+----------+----------+
| CHAR(97) | CHAR(65) |
+----------+----------+
| a | A |
+----------+----------+
1 row in set (0.01 sec)
The following example shows how ORD()
handles multibyte characters. Here, both 101
and 0x65
are the UTF-8 encoded values for the e
character, but in different formats. And both 50091
and 0xC3AB
represent the same values, but for the ë
character.
SELECT ORD('e'), ORD('ë'), HEX('e'), HEX('ë');
+----------+-----------+----------+-----------+
| ORD('e') | ORD('ë') | HEX('e') | HEX('ë') |
+----------+-----------+----------+-----------+
| 101 | 50091 | 65 | C3AB |
+----------+-----------+----------+-----------+
1 row in set (0.00 sec)
POSITION()
Synonym for LOCATE().
QUOTE()
Escape the argument for use in an SQL statement.
If the argument is NULL
, the function returns NULL
.
Example:
To display the result directly instead of showing a hexadecimal-encoded value, you need to start the MySQL client with the --skip-binary-as-hex option.
The following example shows that the ASCII NULL character is escaped as \0
and the single quote character '
is escaped as \'
:
SELECT QUOTE(0x002774657374);
+-----------------------+
| QUOTE(0x002774657374) |
+-----------------------+
| '\0\'test' |
+-----------------------+
1 row in set (0.00 sec)
REGEXP
Pattern matching using regular expressions.
Examples:
In this example a number of strings are matched against two regular expressions.
WITH vals AS (
SELECT 'TiDB' AS v
UNION ALL
SELECT 'Titanium'
UNION ALL
SELECT 'Tungsten'
UNION ALL
SELECT 'Rust'
)
SELECT
v,
v REGEXP '^Ti' AS 'starts with "Ti"',
v REGEXP '^.{4}$' AS 'Length is 4 characters'
FROM
vals;
+----------+------------------+------------------------+
| v | starts with "Ti" | Length is 4 characters |
+----------+------------------+------------------------+
| TiDB | 1 | 1 |
| Titanium | 1 | 0 |
| Tungsten | 0 | 0 |
| Rust | 0 | 1 |
+----------+------------------+------------------------+
4 rows in set (0.00 sec)
The following example demonstrates that REGEXP
is not limited to the SELECT
clause. For example, you can also use it in the WHERE
clause of the query.
SELECT
v
FROM (
SELECT 'TiDB' AS v
) AS vals
WHERE
v REGEXP 'DB$';
+------+
| v |
+------+
| TiDB |
+------+
1 row in set (0.01 sec)
REGEXP_INSTR()
Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
The REGEXP_INSTR(str, regexp, [start, [match, [ret, [match_type]]]])
function returns the position of the match if the regular expression (regexp
) matches the string (str
).
If either the str
or regexp
is NULL
, then the function returns NULL
.
Examples:
In the example below you can see that the ^.b.$
matches abc
.
SELECT REGEXP_INSTR('abc','^.b.$');
+-----------------------------+
| REGEXP_INSTR('abc','^.b.$') |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.00 sec)
The following example uses the third argument to look for a match with a different start position in the string.
SELECT REGEXP_INSTR('abcabc','a');
+----------------------------+
| REGEXP_INSTR('abcabc','a') |
+----------------------------+
| 1 |
+----------------------------+
1 row in set (0.00 sec)
SELECT REGEXP_INSTR('abcabc','a',2);
+------------------------------+
| REGEXP_INSTR('abcabc','a',2) |
+------------------------------+
| 4 |
+------------------------------+
1 row in set (0.00 sec)
The following example uses the 4th argument to look for the second match.
SELECT REGEXP_INSTR('abcabc','a',1,2);
+--------------------------------+
| REGEXP_INSTR('abcabc','a',1,2) |
+--------------------------------+
| 4 |
+--------------------------------+
1 row in set (0.00 sec)
The following example uses the 5th argument to return the value after the mach, instead of the value of the match.
SELECT REGEXP_INSTR('abcabc','a',1,1,1);
+----------------------------------+
| REGEXP_INSTR('abcabc','a',1,1,1) |
+----------------------------------+
| 2 |
+----------------------------------+
1 row in set (0.00 sec)
The following example uses the 6th argument to add the i
flag to get a case insensitive match. For more details about regular expression match_type
, see match_type compatibility.
SELECT REGEXP_INSTR('abcabc','A',1,1,0,'');
+-------------------------------------+
| REGEXP_INSTR('abcabc','A',1,1,0,'') |
+-------------------------------------+
| 0 |
+-------------------------------------+
1 row in set (0.00 sec)
SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i');
+--------------------------------------+
| REGEXP_INSTR('abcabc','A',1,1,0,'i') |
+--------------------------------------+
| 1 |
+--------------------------------------+
1 row in set (0.00 sec)
Besides match_type
, the collation also influences the matching. The following example uses a case-sensitive and a case-insensitive collation to demonstrate this.
SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci);
+-------------------------------------------------------+
| REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci) |
+-------------------------------------------------------+
| 1 |
+-------------------------------------------------------+
1 row in set (0.01 sec)
SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_bin);
+------------------------------------------------+
| REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_bin) |
+------------------------------------------------+
| 0 |
+------------------------------------------------+
1 row in set (0.00 sec)
REGEXP_LIKE()
Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
The REGEXP_LIKE(str, regex, [match_type])
function is used to test if a regular expression matches a string. Optionally the match_type
can be used to change the matching behavior.
Examples:
The following example shows that ^a
matches abc
.
SELECT REGEXP_LIKE('abc','^a');
+-------------------------+
| REGEXP_LIKE('abc','^a') |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
This following example shows that ^A
does not match abc
.
SELECT REGEXP_LIKE('abc','^A');
+-------------------------+
| REGEXP_LIKE('abc','^A') |
+-------------------------+
| 0 |
+-------------------------+
1 row in set (0.00 sec)
This example matches ^A
against abc
, which now matches because of the i
flag which enabled case insensitive matching. For more details about the regular expression match_type
, see match_type compatibility.
SELECT REGEXP_LIKE('abc','^A','i');
+-----------------------------+
| REGEXP_LIKE('abc','^A','i') |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.00 sec)
REGEXP_REPLACE()
Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
The REGEXP_REPLACE(str, regexp, replace, [start, [match, [match_type]]])
function can be used to replace strings based on regular expressions.
Examples:
In the following example, two o’s are replaced by i
.
SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i');
+--------------------------------------+
| REGEXP_REPLACE('TooDB', 'o{2}', 'i') |
+--------------------------------------+
| TiDB |
+--------------------------------------+
1 row in set (0.00 sec)
The following example starts the match at the third character, causing the regular expression not to match and not do any replacement.
SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i',3);
+----------------------------------------+
| REGEXP_REPLACE('TooDB', 'o{2}', 'i',3) |
+----------------------------------------+
| TooDB |
+----------------------------------------+
1 row in set (0.00 sec)
In the following example, the 5th argument is used to set if the first or the second match is used for the replacement.
SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,1);
+---------------------------------------+
| REGEXP_REPLACE('TooDB', 'o', 'i',1,1) |
+---------------------------------------+
| TioDB |
+---------------------------------------+
1 row in set (0.00 sec)
SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,2);
+---------------------------------------+
| REGEXP_REPLACE('TooDB', 'o', 'i',1,2) |
+---------------------------------------+
| ToiDB |
+---------------------------------------+
1 row in set (0.00 sec)
The following example uses the 6th argument to set the match_type
for case insensitive matching. For more details about the regular expression match_type
, see match_type compatibility.
SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1);
+-----------------------------------------+
| REGEXP_REPLACE('TooDB', 'O{2}','i',1,1) |
+-----------------------------------------+
| TooDB |
+-----------------------------------------+
1 row in set (0.00 sec)
SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i');
+---------------------------------------------+
| REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i') |
+---------------------------------------------+
| TiDB |
+---------------------------------------------+
1 row in set (0.00 sec)
REGEXP_SUBSTR()
Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
The REGEXP_SUBSTR(str, regexp, [start, [match, [match_type]]])
function is used to get a substring based on a regular expression.
The following example uses the Ti.{2}
regular expression to get the TiDB
substring of the This is TiDB
string.
SELECT REGEXP_SUBSTR('This is TiDB','Ti.{2}');
+----------------------------------------+
| REGEXP_SUBSTR('This is TiDB','Ti.{2}') |
+----------------------------------------+
| TiDB |
+----------------------------------------+
1 row in set (0.00 sec)
REPEAT()
Repeat a string the specified number of times.
Examples:
The following example generates a sequence of numbers from 1 to 20 using a recursive common table expression (CTE). For each number in the sequence, the character x
is repeated the number of times equal to the number itself.
WITH RECURSIVE nr(n) AS (
SELECT 1 AS n
UNION ALL
SELECT n+1 FROM nr WHERE n<20
)
SELECT n, REPEAT('x',n) FROM nr;
+------+----------------------+
| n | REPEAT('x',n) |
+------+----------------------+
| 1 | x |
| 2 | xx |
| 3 | xxx |
| 4 | xxxx |
| 5 | xxxxx |
| 6 | xxxxxx |
| 7 | xxxxxxx |
| 8 | xxxxxxxx |
| 9 | xxxxxxxxx |
| 10 | xxxxxxxxxx |
| 11 | xxxxxxxxxxx |
| 12 | xxxxxxxxxxxx |
| 13 | xxxxxxxxxxxxx |
| 14 | xxxxxxxxxxxxxx |
| 15 | xxxxxxxxxxxxxxx |
| 16 | xxxxxxxxxxxxxxxx |
| 17 | xxxxxxxxxxxxxxxxx |
| 18 | xxxxxxxxxxxxxxxxxx |
| 19 | xxxxxxxxxxxxxxxxxxx |
| 20 | xxxxxxxxxxxxxxxxxxxx |
+------+----------------------+
20 rows in set (0.01 sec)
The following example demonstrates that REPEAT()
can operate on strings consisting of multiple characters.
SELECT REPEAT('ha',3);
+----------------+
| REPEAT('ha',3) |
+----------------+
| hahaha |
+----------------+
1 row in set (0.00 sec)
REPLACE()
Replace occurrences of a specified string.
REVERSE()
Reverse the characters in a string.
RIGHT()
Return the specified rightmost number of characters.
RLIKE
Synonym for REGEXP.
RPAD()
Append string the specified number of times.
RTRIM()
Remove trailing spaces.
SPACE()
Return a string of the specified number of spaces.
STRCMP()
Compare two strings.
SUBSTR()
Return the substring as specified.
SUBSTRING()
Return the substring as specified.
SUBSTRING_INDEX()
The SUBSTRING_INDEX()
function is used to extract a substring from a string based on a specified delimiter and count. This function is particularly useful when dealing with data separated by a specific delimiter, such as parsing CSV data or processing log files.
Syntax:
SUBSTRING_INDEX(str, delim, count)
str
: specifies the string to be processed.delim
: specifies the delimiter in the string, which is case-sensitive.count
: specifies the number of occurrences of the delimiter.- If
count
is a positive number, the function returns the substring before thecount
occurrences (counting from the left of the string) of the delimiter. - If
count
is a negative number, the function returns the substring after thecount
occurrences (counting from the right of the string) of the delimiter. - If
count
is0
, the function returns an empty string.
- If
Example 1:
SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', 2);
Output 1:
+-----------------------------------------+
| SUBSTRING_INDEX('www.tidbcloud.com', '.', 2) |
+-----------------------------------------+
| www.tidbcloud |
+-----------------------------------------+
Example 2:
SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', -1);
Output 2:
+------------------------------------------+
| SUBSTRING_INDEX('www.tidbcloud.com', '.', -1) |
+------------------------------------------+
| com |
+------------------------------------------+
TO_BASE64()
The TO_BASE64()
function is used to convert the given argument to a string in the base-64 encoded form and return the result according to the character set and collation of the current connection. A base-64 encoded string can be decoded using the FROM_BASE64() function.
Syntax:
TO_BASE64(str)
- If the argument is not a string, the function converts it to a string before base-64 encoding.
- If the argument is
NULL
, the function returnsNULL
.
Example 1:
SELECT TO_BASE64('abc');
Output 1:
+------------------+
| TO_BASE64('abc') |
+------------------+
| YWJj |
+------------------+
Example 2:
SELECT TO_BASE64(6);
Output 2:
+--------------+
| TO_BASE64(6) |
+--------------+
| Ng== |
+--------------+
TRANSLATE()
Replace all occurrences of characters by other characters in a string. It does not treat empty strings as NULL
as Oracle does.
TRIM()
Remove leading and trailing spaces.
UCASE()
The UCASE()
function is used to convert a string to uppercase letters. This function is equivalent to the UPPER()
function.
Note
When the string is null, the UCASE()
function returns NULL
.
Example:
SELECT UCASE('bigdata') AS result_upper, UCASE(null) AS result_null;
Output:
+--------------+-------------+
| result_upper | result_null |
+--------------+-------------+
| BIGDATA | NULL |
+--------------+-------------+
UNHEX()
The UNHEX()
function performs the reverse operation of the HEX()
function. It treats each pair of characters in the argument as a hexadecimal number and converts it to the character represented by that number, returning the result as a binary string.
Note
The argument must be a valid hexadecimal value that contains 0
–9
, A
–F
, or a
–f
. If the argument is NULL
or falls outside this range, the function returns NULL
.
Example:
SELECT UNHEX('54694442');
Output:
+--------------------------------------+
| UNHEX('54694442') |
+--------------------------------------+
| 0x54694442 |
+--------------------------------------+
UPPER()
The UPPER()
function is used to convert a string to uppercase letters. This function is equivalent to the UCASE()
function.
Note
When the string is null, the UPPER()
function returns NULL
.
Example:
SELECT UPPER('bigdata') AS result_upper, UPPER(null) AS result_null;
Output:
+--------------+-------------+
| result_upper | result_null |
+--------------+-------------+
| BIGDATA | NULL |
+--------------+-------------+
WEIGHT_STRING()
The WEIGHT_STRING()
function returns the weight string (binary characters) for the input string, primarily used for sorting and comparison operations in multi-character set scenarios. If the argument is NULL
, it returns NULL
. The syntax is as follows:
WEIGHT_STRING(str [AS {CHAR|BINARY}(N)])
str
: the input string expression. If it is a non-binary string, such as aCHAR
,VARCHAR
, orTEXT
value, the return value contains the collation weights for the string. If it is a binary string, such as aBINARY
,VARBINARY
, orBLOB
value, the return value is the same as the input.AS {CHAR|BINARY}(N)
: optional parameters used to specify the type and length of the output.CHAR
represents the character data type, andBINARY
represents the binary data type.N
specifies the output length, which is an integer greater than or equal to 1.
Note
If N
is less than the string length, the string is truncated. If N
exceeds the string length, AS CHAR(N)
pads the string with spaces to the specified length, and AS BINARY(N)
pads the string with 0x00
to the specified length.
Example:
SET NAMES 'utf8mb4';
SELECT HEX(WEIGHT_STRING('ab' AS CHAR(3))) AS char_result, HEX(WEIGHT_STRING('ab' AS BINARY(3))) AS binary_result;
Output:
+-------------+---------------+
| char_result | binary_result |
+-------------+---------------+
| 6162 | 616200 |
+-------------+---------------+
Unsupported functions
LOAD_FILE()
MATCH()
SOUNDEX()
Regular expression compatibility with MySQL
The following sections describe the regular expression compatibility with MySQL, including REGEXP_INSTR()
, REGEXP_LIKE()
, REGEXP_REPLACE()
, and REGEXP_SUBSTR()
.
Syntax compatibility
MySQL implements regular expression using International Components for Unicode (ICU), and TiDB uses RE2. To learn the syntax differences between the two libraries, you can refer to the ICU documentation and RE2 Syntax.
match_type
compatibility
The value options of match_type
between TiDB and MySQL are:
Value options in TiDB are
"c"
,"i"
,"m"
, and"s"
, and value options in MySQL are"c"
,"i"
,"m"
,"n"
, and"u"
.The
"s"
in TiDB corresponds to"n"
in MySQL. When"s"
is set in TiDB, the.
character also matches line terminators (\n
).For example, the
SELECT REGEXP_LIKE(a, b, "n") FROM t1
in MySQL is the same as theSELECT REGEXP_LIKE(a, b, "s") FROM t1
in TiDB.TiDB does not support
"u"
, which means Unix-only line endings in MySQL.
match_type | MySQL | TiDB | Description |
---|---|---|---|
c | Yes | Yes | Case-sensitive matching |
i | Yes | Yes | Case-insensitive matching |
m | Yes | Yes | Multi-line mode |
s | No | Yes | Matches newlines, same as n in MySQL |
n | Yes | No | Matches newlines, same as s in TiDB |
u | Yes | No | UNIX™ line endings |
Data type compatibility
The difference between TiDB and MySQL support for the binary string type:
- MySQL does not support binary strings in regular expression functions since 8.0.22. For more details, refer to MySQL documentation. But in practice, regular functions can work in MySQL when all parameters or return types are binary strings. Otherwise, an error will be reported.
- Currently, TiDB prohibits using binary strings and an error will be reported under any circumstances.
Other compatibility
The behavior of replacing empty strings in TiDB is different from MySQL. Taking
REGEXP_REPLACE("", "^$", "123")
as an example:- MySQL does not replace the empty string and returns
""
as the result. - TiDB replaces the empty string and returns
"123"
as the result.
- MySQL does not replace the empty string and returns
The keyword used for capturing groups in TiDB is different from MySQL. MySQL uses
$
as the keyword, while TiDB uses\\
as the keyword. In addition, TiDB only supports capturing groups numbered from0
to9
.For example, the following SQL statement returns
ab
in TiDB:SELECT REGEXP_REPLACE('abcd','(.*)(.{2})$','\\1') AS s;