Lexical structure
An SQL input consists of a sequence of commands each of which is a sequence of tokens, terminated by a semicolon (;
).
The syntax of a command defines its set of valid tokens. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol.
String literal
String literals are defined as an arbitrary sequence of characters that are delimited with single quotes '
as defined in ANSI SQL, for example 'This is a string'
.
Escape strings
The escape character in CrateDB is the single-quote '
. A character gets escaped when adding a single-quote before it. For example a single quote character within a string literal can be included by writing two adjacent single quotes, e.g., 'Jack''s car'
.
Note
Two adjacent single quotes are not equivalent to the double-quote character "
.
String literals with C-Style escapes
In addition to the escaped character '
, CrateDB supports C-Style escaped string sequences. Such a sequence is constructed by prefixing the string literal with the letter E
or e
, for example, e'hello\nWorld'
. The following escaped sequences are supported:
Escape Sequence | Interpretation |
---|---|
| backspace |
| form feed |
| newline |
| carriage return |
| tab |
| octal byte value |
| hexadecimal byte value |
| 16 or 32-bit hexadecimal Unicode character value |
For instance, the escape string literal e'\u0061\x61\141'
is equivalent to the 'aaa'
string literal.
cr> select e'\u0061\x61\141' as col1;
+------+
| col1 |
+------+
| aaa |
+------+
SELECT 1 row in set (... sec)
Any other character following a backslash is taken literally. Thus, to include a backslash character \
, two adjacent backslashes need to be used (i.e. \\
).
cr> select e'aa\\nbb' as col1;
+--------+
| col1 |
+--------+
| aa\nbb |
+--------+
SELECT 1 row in set (... sec)
Finally, a single quote can be included in an escape string literal by also using the escape backslash character: \'
, in addition to the single-quote described in the string literals section.
cr> select e'aa\'bb' as col1;
+-------+
| col1 |
+-------+
| aa'bb |
+-------+
SELECT 1 row in set (... sec)
Key words and identifiers
The table below lists all reserved key words in CrateDB. These need to be quoted if used as identifiers:
cr> SELECT word FROM pg_catalog.pg_get_keywords() WHERE catcode = 'R' ORDER BY 1;
+-------------------+
| word |
+-------------------+
| add |
| all |
| alter |
| and |
| any |
| array |
| as |
| asc |
| between |
| by |
| called |
| case |
| cast |
| column |
| constraint |
| create |
| cross |
| current_date |
| current_schema |
| current_time |
| current_timestamp |
| current_user |
| default |
| delete |
| deny |
| desc |
| describe |
| directory |
| distinct |
| drop |
| else |
| end |
| escape |
| except |
| exists |
| extract |
| false |
| first |
| for |
| from |
| full |
| function |
| grant |
| group |
| having |
| if |
| in |
| index |
| inner |
| input |
| insert |
| intersect |
| into |
| is |
| join |
| last |
| left |
| like |
| limit |
| match |
| natural |
| not |
| null |
| nulls |
| object |
| offset |
| on |
| or |
| order |
| outer |
| persistent |
| recursive |
| reset |
| returns |
| revoke |
| right |
| select |
| session_user |
| set |
| some |
| stratify |
| substring |
| table |
| then |
| transient |
| true |
| try_cast |
| unbounded |
| union |
| update |
| user |
| using |
| when |
| where |
| with |
+-------------------+
SELECT 95 rows in set (... sec)
Tokens such as my_table
, id
, name
, or data
in the example below are identifiers, which identify names of tables, columns, and other database objects.
Example:
CREATE TABLE my_table (
id INTEGER,
name STRING,
data OBJECT
) WITH (number_of_replicas = 0);
Note
Key words and unquoted identifiers are case insensitive.
This means that:
select foo from t;
is equivalent to:
select Foo from t;
or:
select FOO from t;
A widely used convention is to write key words in uppercase and identifiers in lowercase, such as
ALTER TABLE foo ADD COLUMN new_column INTEGER;
INSERT INTO foo (id, name) VALUES (1, 'bar');
Quoted identifiers can contain an arbitrary sequence of characters enclosed by double quotes ("
). Quoted identifiers are never keywords, so you can use "update"
as a table or column name.
Special characters
Some non-alphanumeric characters do have a special meaning. For their usage please refer to the sections where the respective syntax elements are described.
Semicolon
The semicolon (;
) terminates an SQL statement. It cannot appear anywhere else within the command, except within a string or quoted identifier.
Comma
The comma (,
) is used in various syntactical elements to separate elements of a list.
Brackets
Square brackets ([]
) are used to select elements of arrays and objects, e.g. arr[1]
or obj['key']
.
Asterisk
The asterisk (*
) is used in some contexts to denote all columns of a table. As an argument in global aggregate functions it has the meaning of any field, e.g. COUNT(*)
.
Period
The period (.
) is used for numeric values and to separate schema and table names, e.g. blob.my_blob_table
.
Comments
An SQL statement can contain comments. Single line comments start with a double dash (--
) and end at the end of that line. Multi line comments start with /*
and end with */
.
Example:
/*
* Retrieve information about all tables in the 'doc' schema.
*/
SELECT *
FROM information_schema.tables
WHERE table_schema = 'doc'; -- query information schema for doc tables