STRING

The STRING data type stores a string of Unicode characters.

Aliases

In CockroachDB, the following are aliases for STRING:

  • CHARACTER
  • CHAR
  • VARCHAR
  • TEXT
    And the following are aliases for STRING(n):

  • CHARACTER(n)

  • CHARACTER VARYING(n)
  • CHAR(n)
  • CHAR VARYING(n)
  • VARCHAR(n)

Length

To limit the length of a string column, use STRING(n), where n is the maximum number of Unicode code points (normally thought of as "characters") allowed.

When inserting a string:

  • If the value exceeds the column's length limit, CockroachDB gives an error.
  • If the value is cast as a string with a length limit (e.g., CAST('hello world' AS STRING(5))), CockroachDB truncates to the limit.
  • If the value is under the column's length limit, CockroachDB does not add padding. This applies to STRING(n) and all its aliases.

Syntax

A value of type STRING can be expressed using a variety of formats.See string literals for more details.

When printing out a STRING value in the SQL shell, the shell uses the simpleSQL string literal format if the value doesn't contain special character,or the escaped format otherwise.

Collations

STRING values accept collations, which lets you sort strings according to language- and country-specific rules.

Warning:
You cannot current use collated strings in indexes or primary keys; doing so causes CockroachDB to crash. If you're interested in using collated strings in these contexts, you can follow this issue on GitHub to be notified when it's resolved.

Size

The size of a STRING value is variable, but it's recommended to keep values under 64 kilobytes to ensure performance. Above that threshold, write amplification and other considerations may cause significant performance degradation.

Examples

  1. > CREATE TABLE strings (a STRING PRIMARY KEY, b STRING(4), c TEXT);
  1. > SHOW COLUMNS FROM strings;
  1. column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
  2. +-------------+-----------+-------------+----------------+-----------------------+-------------+-----------+
  3. a | STRING | false | NULL | | {"primary"} | false
  4. b | STRING(4) | true | NULL | | {} | false
  5. c | STRING | true | NULL | | {} | false
  6. (3 rows)
  1. > INSERT INTO strings VALUES ('a1b2c3d4', 'e5f6', 'g7h8i9');
  1. > SELECT * FROM strings;
  1. a | b | c
  2. +----------+------+--------+
  3. a1b2c3d4 | e5f6 | g7h8i9
  4. (1 row)
  1. > CREATE TABLE aliases (a STRING PRIMARY KEY, b VARCHAR, c CHAR);
  1. > SHOW COLUMNS FROM aliases;
  1. column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
  2. +-------------+-----------+-------------+----------------+-----------------------+-------------+-----------+
  3. a | STRING | false | NULL | | {"primary"} | false
  4. b | VARCHAR | true | NULL | | {} | false
  5. c | CHAR | true | NULL | | {} | false
  6. (3 rows)

Supported casting and conversion

STRING values can be cast to any of the following data types:

TypeDetails
BITRequires supported BIT string format, e.g., '101001'.
BOOLRequires supported BOOL string format, e.g., 'true'.
BYTESFor more details, see here.
DATERequires supported DATE string format, e.g., '2016-01-25'.
DECIMALRequires supported DECIMAL string format, e.g., '1.1'.
FLOATRequires supported FLOAT string format, e.g., '1.1'.
INETRequires supported INET string format, e.g, '192.168.0.1'.
INTRequires supported INT string format, e.g., '10'.
INTERVALRequires supported INTERVAL string format, e.g., '1h2m3s4ms5us6ns'.
TIMERequires supported TIME string format, e.g., '01:22:12' (microsecond precision).
TIMESTAMPRequires supported TIMESTAMP string format, e.g., '2016-01-25 10:10:10.555555'.

STRING vs. BYTES

While both STRING and BYTES can appear to have similar behavior in many situations, one should understand their nuance before casting one into the other.

STRING treats all of its data as characters, or more specificially, Unicode code points. BYTES treats all of its data as a byte string. This difference in implementation can lead to dramatically different behavior. For example, let's take a complex Unicode character such as ☃ (the snowman emoji):

  1. > SELECT length('☃'::string);
  1. length
  2. +--------+
  3. 1
  1. > SELECT length('☃'::bytes);
  1. length
  2. +--------+
  3. 3

In this case, LENGTH(string) measures the number of Unicode code points present in the string, whereas LENGTH(bytes) measures the number of bytes required to store that value. Each character (or Unicode code point) can be encoded using multiple bytes, hence the difference in output between the two.

Translating literals to STRING vs. BYTES

A literal entered through a SQL client will be translated into a different value based on the type:

  • BYTES give a special meaning to the pair \x at the beginning, and translates the rest by substituting pairs of hexadecimal digits to a single byte. For example, \xff is equivalent to a single byte with the value of 255. For more information, see SQL Constants: String literals with character escapes.
  • STRING does not give a special meaning to \x, so all characters are treated as distinct Unicode code points. For example, \xff is treated as a STRING with length 4 (\, x, f, and f).

See also

Data Types

Was this page helpful?
YesNo