Definitions

Conventions

To aid in specifying the CQL syntax, we will use the following conventions in this document:

  • Language rules will be given in an informal BNF variant notation. In particular, we’ll use square brackets ([ item ]) for optional items, and * and + for repeated items (where + implies at least one).

  • The grammar will also use the following convention for convenience: non-terminal terms will be lowercase (and link to their definition) while terminal keywords will be provided in “all caps”. Note however that keywords are identifiers and are thus case insensitive in practice. We will also define some early constructions using regular expressions, which we’ll indicate with re(<some regular expression>).

  • The grammar is provided for documentation purposes and leaves some minor details out. For instance, a trailing comma after the last column definition in a CREATE TABLE statement is optional and accepted if present, even though the grammar in this document suggests otherwise. Also, not everything accepted by the grammar is necessarily valid CQL.

  • References to keywords or pieces of CQL code in running text will be shown in a fixed-width font.

Identifiers and keywords

The CQL language uses identifiers (or names) to identify tables, columns and other objects. An identifier is a token matching the regular expression [a-zA-Z][a-zA-Z0-9_]*.

A number of such identifiers, like SELECT or WITH, are keywords. They have a fixed meaning for the language and most are reserved. The list of those keywords can be found in Appendix A.

Identifiers and (unquoted) keywords are case insensitive. Thus SELECT is the same as select or sElEcT, and myId is the same as myid or MYID. A convention often used (in particular by the samples in this documentation) is to use uppercase for keywords and lowercase for other identifiers.

There is a second kind of identifier, called a quoted identifier, defined by enclosing an arbitrary non-empty sequence of characters in double-quotes ("). Quoted identifiers are never keywords. Thus "select" is not a reserved keyword and can be used to refer to a column (though doing so is particularly ill-advised), while select would raise a parsing error. Also, unlike unquoted identifiers and keywords, quoted identifiers are case sensitive ("My Quoted Id" is different from "my quoted id"). A fully lowercase quoted identifier that matches [a-zA-Z][a-zA-Z0-9_]* is however equivalent to the unquoted identifier obtained by removing the double-quotes (so "myid" is equivalent to myid and to myId but different from "myId"). Inside a quoted identifier, the double-quote character can be repeated to escape it, so "foo "" bar" is a valid identifier.

Quoted identifiers can declare columns with arbitrary names, and these can sometimes clash with specific names used by the server. For instance, when using conditional updates, the server will respond with a result set containing a special result named “[applied]”. If you’ve declared a column with such a name, this could confuse some tools and should be avoided. In general, unquoted identifiers should be preferred, but if you use quoted identifiers, it is strongly advised that you avoid any name enclosed in square brackets (like “[applied]”) and any name that looks like a function call (like “f(x)”).

More formally, we have:

  identifier ::= unquoted_identifier | quoted_identifier
  unquoted_identifier ::= re('[a-zA-Z][a-zA-Z0-9_]*')
  quoted_identifier ::= '"' (any character where " can appear if doubled)+ '"'
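
The case and quoting rules above can be illustrated with a small Python sketch. This is not how any CQL implementation resolves names; canonical_name is a hypothetical helper written only to mirror the rules in this section, and it does not check whether an unquoted identifier is a reserved keyword.

```python
import re

# Pattern for unquoted identifiers, taken from the text above.
UNQUOTED = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def canonical_name(identifier):
    """Return the name an identifier refers to (hypothetical helper)."""
    if len(identifier) >= 3 and identifier[0] == '"' and identifier[-1] == '"':
        body = identifier[1:-1]
        # Inside the quotes, every double-quote must appear doubled.
        if '"' in body.replace('""', ""):
            raise ValueError(f"unescaped double-quote in {identifier!r}")
        # Quoted identifiers keep their case; "" unescapes to a single ".
        return body.replace('""', '"')
    if UNQUOTED.fullmatch(identifier):
        # Unquoted identifiers are case insensitive: fold to lowercase.
        return identifier.lower()
    raise ValueError(f"not a valid identifier: {identifier!r}")
```

Under these assumptions, canonical_name('myId') and canonical_name('"myid"') give the same name, while canonical_name('"myId"') gives a different one, matching the equivalences described above.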

Constants

CQL defines the following constants:

  constant ::= string | integer | float | boolean | uuid | blob | NULL
  string ::= ''' (any character where ' can appear if doubled)+ '''
           | '$$' (any character other than '$$') '$$'
  integer ::= re('-?[0-9]+')
  float ::= re('-?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?') | NAN | INFINITY
  boolean ::= TRUE | FALSE
  uuid ::= hex{8}-hex{4}-hex{4}-hex{4}-hex{12}
  hex ::= re("[0-9a-fA-F]")
  blob ::= '0' ('x' | 'X') hex+

In other words:

  • A string constant is an arbitrary sequence of characters enclosed in single quotes ('). A single quote can be included by repeating it, e.g. 'It''s raining today'. String constants are not to be confused with quoted identifiers, which use double-quotes. Alternatively, a string can be defined by enclosing the arbitrary sequence of characters in two dollar characters on each side, in which case single quotes can be used without escaping ($$It's raining today$$). The latter form is often used when defining user-defined functions, to avoid having to escape single-quote characters in the function body (as they are more likely to occur than $$).

  • Integer, float and boolean constants are defined as expected. Note however that float allows the special NaN and Infinity constants.

  • CQL supports UUID constants.

  • The content for blobs is provided in hexadecimal and prefixed by 0x.

  • The special NULL constant denotes the absence of value.

For how these constants are typed, see the Data types section.
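
As a rough illustration of the constant grammar, the following Python sketch classifies a piece of text as one kind of constant. The patterns are one reading of the rules above, not the actual CQL lexer: classify_constant and CONSTANT_PATTERNS are hypothetical names, the float pattern assumes an escaped decimal point and one or more exponent digits, and the sketch also accepts the empty string ''.

```python
import re

HEX = r"[0-9a-fA-F]"

# Patterns are tried in order; the first full match wins.
CONSTANT_PATTERNS = [
    ("null", re.compile(r"null", re.IGNORECASE)),
    ("boolean", re.compile(r"true|false", re.IGNORECASE)),
    ("uuid", re.compile(
        rf"{HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}")),
    ("blob", re.compile(rf"0[xX]{HEX}+")),
    ("integer", re.compile(r"-?[0-9]+")),
    ("float", re.compile(
        r"-?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?|nan|infinity",
        re.IGNORECASE)),
    # Either '...' with '' as an escaped quote, or $$...$$ without $$ inside.
    ("string", re.compile(
        r"'(?:[^']|'')*'|\$\$(?:(?!\$\$).)*\$\$", re.DOTALL)),
]


def classify_constant(text):
    """Return the kind of constant that `text` spells (hypothetical helper)."""
    for kind, pattern in CONSTANT_PATTERNS:
        if pattern.fullmatch(text):
            return kind
    raise ValueError(f"not a CQL constant: {text!r}")
```

Because integer is tried before float, a plain run of digits is reported as an integer even though it would also satisfy the float pattern.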

Terms

CQL has the notion of a term, which denotes the kind of values that CQL supports. Terms are defined by:

  term ::= constant | literal | function_call | arithmetic_operation | type_hint | bind_marker
  literal ::= collection_literal | vector_literal | udt_literal | tuple_literal
  function_call ::= identifier '(' [ term (',' term)* ] ')'
  arithmetic_operation ::= '-' term | term ('+' | '-' | '*' | '/' | '%') term
  type_hint ::= '(' cql_type ')' term
  bind_marker ::= '?' | ':' identifier

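A term is thus one of:

  • A constant.

  • A literal for a collection, a vector, a user-defined type or a tuple.

  • A function call.

  • An arithmetic operation between terms.

  • A type hint.

  • A bind marker, which denotes a variable to be bound at execution time (see Prepared Statements). A bind marker can be either anonymous (?) or named (:some_name).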

Comments

A comment in CQL is a line beginning with either double dashes (--) or a double slash (//).

Multi-line comments are also supported through enclosure within /* and */ (but nesting is not supported).

  -- This is a comment
  // This is a comment too
  /* This is
     a multi-line comment */
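
The three comment forms can be recognized with a short Python sketch. This is only an approximation of what a real CQL lexer does: strip_comments is a hypothetical helper, it treats -- and // as starting a comment anywhere outside a quoted token (not only at the start of a line), and it does not handle $$-delimited strings.

```python
import re

# One alternative per token kind. Quoted tokens are matched first so that
# comment markers inside them are left alone.
TOKEN = re.compile(
    r"""
      '(?:[^']|'')*'        # single-quoted string constant
    | "(?:[^"]|"")*"        # quoted identifier
    | --[^\n]*              # comment introduced by double dashes
    | //[^\n]*              # comment introduced by a double slash
    | /\*.*?\*/             # multi-line comment (no nesting)
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_comments(cql):
    """Remove comments from CQL text (hypothetical helper)."""
    def keep_or_drop(match):
        token = match.group(0)
        # Keep quoted tokens verbatim; replace comments with nothing.
        return token if token[0] in "'\"" else ""
    return TOKEN.sub(keep_or_drop, cql)
```

Since the multi-line pattern stops at the first */, a nested /* ... */ would end the comment early, which is consistent with nesting not being supported.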

Statements

CQL consists of statements that can be divided into the following categories:

  • data-definition statements, to define and change how the data is stored (keyspaces and tables).

  • data-manipulation statements, for selecting, inserting and deleting data.

  • secondary index statements.

  • materialized view statements.

  • role statements.

  • permission statements.

  • user-defined function (UDF) statements.

  • user-defined type (UDT) statements.

  • trigger statements.

All the statements are listed below and are described in the rest of this documentation (see links above):

  cql_statement ::= statement [ ';' ]
  statement ::= ddl_statement
              | dml_statement
              | secondary_index_statement
              | materialized_view_statement
              | role_or_permission_statement
              | udf_statement
              | udt_statement
              | trigger_statement
  ddl_statement ::= use_statement
                  | create_keyspace_statement
                  | alter_keyspace_statement
                  | drop_keyspace_statement
                  | create_table_statement
                  | alter_table_statement
                  | drop_table_statement
                  | truncate_statement
  dml_statement ::= select_statement
                  | insert_statement
                  | update_statement
                  | delete_statement
                  | batch_statement
  secondary_index_statement ::= create_index_statement
                              | drop_index_statement
  materialized_view_statement ::= create_materialized_view_statement
                                | drop_materialized_view_statement
  role_or_permission_statement ::= create_role_statement
                                 | alter_role_statement
                                 | drop_role_statement
                                 | grant_role_statement
                                 | revoke_role_statement
                                 | list_roles_statement
                                 | grant_permission_statement
                                 | revoke_permission_statement
                                 | list_permissions_statement
                                 | create_user_statement
                                 | alter_user_statement
                                 | drop_user_statement
                                 | list_users_statement
  udf_statement ::= create_function_statement
                  | drop_function_statement
                  | create_aggregate_statement
                  | drop_aggregate_statement
  udt_statement ::= create_type_statement
                  | alter_type_statement
                  | drop_type_statement
  trigger_statement ::= create_trigger_statement
                      | drop_trigger_statement

Prepared Statements

CQL supports prepared statements. Prepared statements are an optimization that allows a query to be parsed only once but executed multiple times with different concrete values.

Any statement that uses at least one bind marker (see bind_marker) will need to be prepared, after which it can be executed by providing concrete values for each of its markers. The exact details of how a statement is prepared and then executed depend on the CQL driver used, and you should refer to your driver documentation.
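
To make the role of bind markers concrete, the Python sketch below lists the markers that appear in a statement. It is a heuristic and not a driver API: bind_markers is a hypothetical helper, the table and column names in the usage note are made up, only unquoted marker names are recognized, and a colon written directly before a name inside a map literal could be misread as a named marker.

```python
import re

MARKER = re.compile(
    r"""
      '(?:[^']|'')*'                          # string constant: skipped
    | "(?:[^"]|"")*"                          # quoted identifier: skipped
    | (?P<marker> \? | :[a-zA-Z][a-zA-Z0-9_]* )   # anonymous or named marker
    """,
    re.VERBOSE,
)


def bind_markers(statement):
    """Return the bind markers found in `statement`, in order of appearance."""
    return [
        match.group("marker")
        for match in MARKER.finditer(statement)
        if match.group("marker")  # None when a quoted token was matched
    ]
```

For a statement such as SELECT * FROM users WHERE id = ? AND name = :name, the helper reports one anonymous and one named marker, so under the rule above the statement would have to be prepared before a driver could execute it with concrete values.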