Lexical Structure
An SQL input consists of a sequence of commands each of which is a sequence of tokens, terminated by a semicolon (;
).
The syntax of a command defines its set of valid tokens. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol.
Table of Contents
String Literal
String literals are defined as an arbitrary sequence of characters that are delimited with single quotes '
as defined in ANSI SQL, for example 'This is a string'
.
Escape Strings
The escape character in CrateDB is the single-quote '
. A character gets escaped when adding a single-quote before it. For example a single quote character within a string literal can be included by writing two adjacent single quotes, e.g. 'Jack''s car'
.
Note
Two adjacent single quotes are not equivalent to the double-quote character "
.
String Literals with C-Style Escapes
In addition to the escaped character '
, CrateDB supports C-Style escaped string sequences. Such a sequence is constructed by prefixing the string literal with the letter E
or e
, for example, e'hello\nWorld'
. The following escaped sequences are supported:
Escape Sequence | Interpretation |
---|---|
\b | backspace |
\f | form feed |
\n | newline |
\r | carriage return |
\t | tab |
\o , \oo , \ooo (o = [0-7]) | octal byte value |
\xh , xhh (h = [0-9,A-F,a-f]) | hexadecimal byte value |
\uxxxx , \Uxxxxxxxx (x = [0-9,A-F,a-f]) | 16 or 32-bit hexadecimal Unicode character value |
For instance, the escape string literal e'\u0061\x61\141'
is equivalent to the 'aaa'
string literal.
cr> select e'\u0061\x61\141' as col1;
+------+
| col1 |
+------+
| aaa |
+------+
SELECT 1 row in set (... sec)
Any other character following a backslash is taken literally. Thus, to include a backslash character \
, two adjacent backslashes need to be used (i.e. \\
).
cr> select e'aa\\nbb' as col1;
+--------+
| col1 |
+--------+
| aa\nbb |
+--------+
SELECT 1 row in set (... sec)
Finally, a single quote can be included in an escape string literal by also using the escape backslash character: \'
, in addition to the single-quote described in String Literals.
cr> select e'aa\'bb' as col1;
+-------+
| col1 |
+-------+
| aa'bb |
+-------+
SELECT 1 row in set (... sec)
Key Words and Identifiers
The table bellow lists all reserved key words in CrateDB. These need to be quoted if used as identifiers.
ADD | ALL | ALTER | AND |
ANY | ARRAY | AS | ASC |
BETWEEN | BOOLEAN | BY | BYTE |
CALLED | CASE | CAST | COLUMN |
CONSTRAINT | CREATE | CROSS | CURRENT_DATE |
CURRENT_SCHEMA | CURRENT_TIME | CURRENT_TIMESTAMP | CURRENT_USER |
DEFAULT | DELETE | DENY | DESC |
DESCRIBE | DIRECTORY | DISTINCT | DOUBLE |
DROP | ELSE | END | ESCAPE |
EXCEPT | EXISTS | EXTRACT | FALSE |
FIRST | FLOAT | FOR | FROM |
FULL | FUNCTION | GRANT | GROUP |
HAVING | IF | IN | INDEX |
INNER | INPUT | INSERT | INT |
INTEGER | INTERSECT | INTO | IP |
IS | JOIN | LAST | LEFT |
LICENSE | LIKE | LIMIT | LONG |
MATCH | NATURAL | NOT | NULL |
NULLS | OBJECT | OFFSET | ON |
OR | ORDER | OUTER | PERSISTENT |
PRIMARY | RECURSIVE | REPLACE | RESET |
RETURNS | REVOKE | RIGHT | SELECT |
SESSION_USER | SET | SHORT | SOME |
STRATIFY | STRING | SUBSTRING | TABLE |
THEN | TRANSIENT | TRUE | TRY_CAST |
UNBOUNDED | UNION | UPDATE | USER |
USING | WHEN | WHERE | WITH |
Tokens such as my_table
, id
, name
, or data
in the example above are identifiers, which identify names of tables, columns, and other database objects.
Example:
CREATE TABLE my_table (
id INTEGER,
name STRING,
data OBJECT
) WITH (number_of_replicas = 0);
Note
Key words and unquoted identifiers are case insensitive.
This means that:
select foo from t;
is equivalent to:
select Foo from t;
or:
select FOO from t;
A widely used convention is to write key words in uppercase and identifiers in lowercase, such as
ALTER TABLE foo ADD COLUMN new_column INTEGER;
INSERT INTO foo (id, name) VALUES (1, 'bar');
Quoted identifiers can contain an arbitrary sequence of charactes enclosed by double quotes ("
). Quoted identifiers are never keywords, so you can use "update"
as a table or column name.
Special Characters
Some non-alphanumeric characters do have a special meaning. For their usage please refer to the sections where the respective syntax elements are described.
Semicolon: | The semicolon (; ) terminates an SQL statement. It cannot appear anywhere else within the command, except within a string or quoted identifier. |
---|---|
Comma: | The comma (, ) is used in various syntactical elements to separate elements of a list. |
Brackets: | Square brackets ([] ) are used to select elements of arrays and objects, e.g. arr[1] or obj[‘key’] . |
Asterisk: | The asterisk ( ) is used in some contexts to denote all columns of a table. As an argument in global aggregate functions it has the meaning of any field, e.g. COUNT( ) . |
Period: | The period (. ) is used for numeric values and to separate schema and table names, e.g. blob.my_blob_table . |
Comments
An SQL statement can contain comments. Single line comments start with a double dash (--
) and end at the end of that line. Multi line comments start with /*
and end with */
.
Example:
/*
* Retrieve information about all tables in the 'doc' schema.
*/
SELECT *
FROM information_schema.tables
WHERE table_schema = 'doc'; -- query information schema for doc tables