Lexical Structure

Lexical Structure

This section covers the major lexical structure of SQL, which for the most part, is going to resemble that of ANSI SQL itself hence why low-levels details are not discussed in depth.

Elasticsearch SQL currently accepts only one command at a time. A command is a sequence of tokens terminated by the end of input stream.

A token can be a key word, an identifier (quoted or unquoted), a literal (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (be it space, tab) though in some cases, where there is no ambiguity (typically due to a character symbol) this is not needed - however for readability purposes this should be avoided.

Key Words

Take the following example:

  1. SELECT * FROM table

This query has four tokens: SELECT, *, FROM and table. The first three, namely SELECT, * and FROM are key words meaning words that have a fixed meaning in SQL. The token table is an identifier meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc…​

As one can see, both key words and identifiers have the same lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the reserved appendix. Do note that key words are case-insensitive meaning the previous example can be written as:

  1. select * fRoM table;

Identifiers however are not - as Elasticsearch is case sensitive, Elasticsearch SQL uses the received value verbatim.

To help differentiate between the two, through-out the documentation the SQL key words are upper-cased a convention we find increases readability and thus recommend to others.

Identifiers

Identifiers can be of two types: quoted and unquoted:

  1. SELECT ip_address FROM "hosts-*"

This query has two identifiers, ip_address and hosts-* (an index pattern). As ip_address does not clash with any key words it can be used verbatim, hosts-* on the other hand cannot as it clashes with - (minus operation) and * hence the double quotes.

Another example:

  1. SELECT "from" FROM "<logstash-{now/d}>"

The first identifier from needs to quoted as otherwise it clashes with the FROM key word (which is case insensitive as thus can be written as from) while the second identifier using Elasticsearch Date math support in index and index alias names would have otherwise confuse the parser.

Hence why in general, especially when dealing with user input it is highly recommended to use quotes for identifiers. It adds minimal increase to your queries and in return offers clarity and disambiguation.

Literals (Constants)

Elasticsearch SQL supports two kind of implicitly-typed literals: strings and numbers.

String Literals

A string literal is an arbitrary number of characters bounded by single quotes ': 'Giant Robot'. To include a single quote in the string, escape it using another single quote: 'Captain EO''s Voyage'.

An escaped single quote is not a double quote ("), but a single quote ' repeated ('').

Numeric Literals

Numeric literals are accepted both in decimal and scientific notation with exponent marker (e or E), starting either with a digit or decimal point .:

  1. 1969 -- integer notation
  2. 3.14 -- decimal notation
  3. .1234 -- decimal notation starting with decimal point
  4. 4E5 -- scientific notation (with exponent marker)
  5. 1.2e-3 -- scientific notation with decimal point

Numeric literals that contain a decimal point are always interpreted as being of type double. Those without are considered integer if they fit otherwise their type is long (or BIGINT in ANSI SQL types).

Generic Literals

When dealing with arbitrary type literal, one creates the object by casting, typically, the string representation to the desired type. This can be achieved through the dedicated cast operator and functions:

  1. 123::LONG -- cast 123 to a LONG
  2. CAST('1969-05-13T12:34:56' AS TIMESTAMP) -- cast the given string to datetime
  3. CONVERT('10.0.0.1', IP) -- cast '10.0.0.1' to an IP

Do note that Elasticsearch SQL provides functions that out of the box return popular literals (like E()) or provide dedicated parsing for certain strings.

Single vs Double Quotes

It is worth pointing out that in SQL, single quotes ' and double quotes " have different meaning and cannot be used interchangeably. Single quotes are used to declare a string literal while double quotes for identifiers.

To wit:

  1. SELECT "first_name"
  2. FROM "musicians"
  3. WHERE "last_name"
  4. = 'Carroll'

Double quotes used for column and table identifiers

Single quotes used for a string literal

To escape single or double quotes, one needs to use that specific quote one more time. For example, the literal John's can be escaped like SELECT 'John''s' AS name. The same goes for double quotes escaping - SELECT 123 AS "test""number" will display as a result a column with the name test"number.

Special characters

A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below:

Char

Description

*

The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can be also used as an argument to some aggregate functions.

,

Commas are used to enumerate the elements of a list.

.

Used in numeric constants or to separate identifiers qualifiers (catalog, table, column names, etc…​).

()

Parentheses are used for specific SQL commands, function declarations or to enforce precedence.

Operators

Most operators in Elasticsearch SQL have the same precedence and are left-associative. As this is done at parsing time, parenthesis need to be used to enforce a different precedence.

The following table indicates the supported operators and their precedence (highest to lowest);

Operator/Element

Associativity

Description

.

left

qualifier separator

::

left

PostgreSQL-style type cast

+ -

right

unary plus and minus (numeric literal sign)

* / %

left

multiplication, division, modulo

+ -

left

addition, subtraction

BETWEEN IN LIKE

range containment, string matching

< > <= >= = <=> <> !=

comparison

NOT

right

logical negation

AND

left

logical conjunction

OR

left

logical disjunction

Comments

Elasticsearch SQL allows comments which are sequence of characters ignored by the parsers.

Two styles are supported:

Single Line

Comments start with a double dash -- and continue until the end of the line.

Multi line

Comments that start with /* and end with */ (also known as C-style).

  1. -- single line comment
  2. /* multi
  3. line
  4. comment
  5. that supports /* nested comments */
  6. */