JSON/YAML models

Calcite models can be represented as JSON/YAML files.This page describes the structure of those files.

Models can also be built programmatically using the Schema SPI.

Elements

Root

JSON

  1. {
  2. version: '1.0',
  3. defaultSchema: 'mongo',
  4. schemas: [ Schema... ]
  5. }

YAML

  1. version: 1.0
  2. defaultSchema: mongo
  3. schemas:
  4. - [Schema...]

version (required string) must have value 1.0.

defaultSchema (optional string). If specified, it isthe name (case-sensitive) of a schema defined in this model, and willbecome the default schema for connections to Calcite that use this model.

schemas (optional list of Schema elements).

Schema

Occurs within root.schemas.

JSON

  1. {
  2. name: 'foodmart',
  3. path: ['lib'],
  4. cache: true,
  5. materializations: [ Materialization... ]
  6. }

YAML

  1. name: foodmart
  2. path:
  3. lib
  4. cache: true
  5. materializations:
  6. - [ Materialization... ]

name (required string) is the name of the schema.

type (optional string, default map) indicates sub-type. Values are:

path (optional list) is the SQL path that is used toresolve functions used in this schema. If specified it must be a list,and each element of the list must be either a string or a list ofstrings. For example,

JSON

  1. path: [ ['usr', 'lib'], 'lib' ]

YAML

  1. path:
  2. - [usr, lib]
  3. - lib

declares a path with two elements: the schema ‘/usr/lib’ and theschema ‘/lib’. Most schemas are at the top level, and for these you can use astring.

materializations (optional list ofMaterialization) defines the tablesin this schema that are materializations of queries.

cache (optional boolean, default true) tells Calcite whether tocache metadata (tables, functions and sub-schemas) generatedby this schema.

  • If false, Calcite will go back to the schema each time it needsmetadata, for example, each time it needs a list of tables in order tovalidate a query against the schema.

  • If true, Calcite will cache the metadata the first time it readsit. This can lead to better performance, especially if name-matching iscase-insensitive.

However, it also leads to the problem of cache staleness.A particular schema implementation can override theSchema.contentsHaveChangedSince method to tell Calcitewhen it should consider its cache to be out of date.

Tables, functions, types, and sub-schemas explicitly created in a schema arenot affected by this caching mechanism. They always appear in the schemaimmediately, and are never flushed.

Map Schema

Like base class Schema, occurs within root.schemas.

JSON

  1. {
  2. name: 'foodmart',
  3. type: 'map',
  4. tables: [ Table... ],
  5. functions: [ Function... ],
  6. types: [ Type... ]
  7. }

YAML

  1. name: foodmart
  2. type: map
  3. tables:
  4. - [ Table... ]
  5. functions:
  6. - [ Function... ]
  7. types:
  8. - [ Type... ]

name, type, path, cache, materializations inherited fromSchema.

tables (optional list of Table elements)defines the tables in this schema.

functions (optional list of Function elements)defines the functions in this schema.

types defines the types in this schema.

Custom Schema

Like base class Schema, occurs within root.schemas.

JSON

  1. {
  2. name: 'mongo',
  3. type: 'custom',
  4. factory: 'org.apache.calcite.adapter.mongodb.MongoSchemaFactory',
  5. operand: {
  6. host: 'localhost',
  7. database: 'test'
  8. }
  9. }

YAML

  1. name: mongo
  2. type: custom
  3. factory: org.apache.calcite.adapter.mongodb.MongoSchemaFactory
  4. operand:
  5. host: localhost
  6. database: test

name, type, path, cache, materializations inherited fromSchema.

factory (required string) is the name of the factory class for thisschema. Must implement interfaceorg.apache.calcite.schema.SchemaFactoryand have a public default constructor.

operand (optional map) contains attributes to be passed to thefactory.

JDBC Schema

Like base class Schema, occurs within root.schemas.

JSON

  1. {
  2. name: 'foodmart',
  3. type: 'jdbc',
  4. jdbcDriver: TODO,
  5. jdbcUrl: TODO,
  6. jdbcUser: TODO,
  7. jdbcPassword: TODO,
  8. jdbcCatalog: TODO,
  9. jdbcSchema: TODO
  10. }

YAML

  1. name: foodmart
  2. type: jdbc
  3. jdbcDriver: TODO
  4. jdbcUrl: TODO
  5. jdbcUser: TODO
  6. jdbcPassword: TODO
  7. jdbcCatalog: TODO
  8. jdbcSchema: TODO

name, type, path, cache, materializations inherited fromSchema.

jdbcDriver (optional string) is the name of the JDBC driver class. If notspecified, uses whichever class the JDBC DriverManager chooses.

jdbcUrl (optional string) is the JDBC connect string, for example“jdbc:mysql://localhost/foodmart”.

jdbcUser (optional string) is the JDBC user name.

jdbcPassword (optional string) is the JDBC password.

jdbcCatalog (optional string) is the name of the initial catalog in the JDBCdata source.

jdbcSchema (optional string) is the name of the initial schema in the JDBCdata source.

Materialization

Occurs within root.schemas.materializations.

JSON

  1. {
  2. view: 'V',
  3. table: 'T',
  4. sql: 'select deptno, count(*) as c, sum(sal) as s from emp group by deptno'
  5. }

YAML

  1. view: V
  2. table: T
  3. sql: select deptno, count(*) as c, sum(sal) as s from emp group by deptno

view (optional string) is the name of the view; null means that the tablealready exists and is populated with the correct data.

table (required string) is the name of the table that materializes the data inthe query. If view is not null, the table might not exist, and if it does not,Calcite will create and populate an in-memory table.

sql (optional string, or list of strings that will be concatenated as a multi-line string) is the SQL definition of the materialization.

Table

Occurs within root.schemas.tables.

JSON

  1. {
  2. name: 'sales_fact',
  3. columns: [ Column... ]
  4. }

YAML

  1. name: sales_fact
  2. columns:
  3. [ Column... ]

name (required string) is the name of this table. Must be unique within the schema.

type (optional string, default custom) indicates sub-type. Values are:

columns (list of Column elements, required forsome kinds of table, optional for others such as View)

View

Like base class Table, occurs within root.schemas.tables.

JSON

  1. {
  2. name: 'female_emps',
  3. type: 'view',
  4. sql: "select * from emps where gender = 'F'",
  5. modifiable: true
  6. }

YAML

  1. name: female_emps
  2. type: view
  3. sql: select * from emps where gender = 'F'
  4. modifiable: true

name, type, columns inherited from Table.

sql (required string, or list of strings that will be concatenated as a multi-line string) is the SQL definition of the view.

path (optional list) is the SQL path to resolve the query. If notspecified, defaults to the current schema.

modifiable (optional boolean) is whether the view is modifiable.If null or not specified, Calcite deduces whether the view is modifiable.

A view is modifiable if contains only SELECT, FROM, WHERE (no JOIN, aggregationor sub-queries) and every column:

  • is specified once in the SELECT clause; or
  • occurs in the WHERE clause with a column = literal predicate; or
  • is nullable.

The second clause allows Calcite to automatically provide the correct value forhidden columns. It is useful in multi-tenant environments, where the tenantIdcolumn is hidden, mandatory (NOT NULL), and has a constant value for aparticular view.

Errors regarding modifiable views:

  • If a view is marked modifiable: true and is not modifiable, Calcite throwsan error while reading the schema.
  • If you submit an INSERT, UPDATE or UPSERT command to a non-modifiable view,Calcite throws an error when validating the statement.
  • If a DML statement creates a row that would not appear in the view(for example, a row in female_emps, above, with gender = 'M'),Calcite throws an error when executing the statement.

Custom Table

Like base class Table, occurs within root.schemas.tables.

JSON

  1. {
  2. name: 'female_emps',
  3. type: 'custom',
  4. factory: 'TODO',
  5. operand: {
  6. todo: 'TODO'
  7. }
  8. }

YAML

  1. name: female_emps
  2. type: custom
  3. factory: TODO
  4. operand:
  5. todo: TODO

name, type, columns inherited from Table.

factory (required string) is the name of the factory class for thistable. Must implement interfaceorg.apache.calcite.schema.TableFactoryand have a public default constructor.

operand (optional map) contains attributes to be passed to thefactory.

Stream

Information about whether a table allows streaming.

Occurs within root.schemas.tables.stream.

JSON

  1. {
  2. stream: true,
  3. history: false
  4. }

YAML

  1. stream: true
  2. history: false

stream (optional; default true) is whether the table allows streaming.

history (optional; default false) is whether the history of the stream isavailable.

Column

Occurs within root.schemas.tables.columns.

JSON

  1. {
  2. name: 'empno'
  3. }

YAML

  1. name: empno

name (required string) is the name of this column.

Function

Occurs within root.schemas.functions.

JSON

  1. {
  2. name: 'MY_PLUS',
  3. className: 'com.example.functions.MyPlusFunction',
  4. methodName: 'apply',
  5. path: []
  6. }

YAML

  1. name: MY_PLUS
  2. className: com.example.functions.MyPlusFunction
  3. methodName: apply
  4. path: {}

name (required string) is the name of this function.

className (required string) is the name of the class that implements thisfunction.

methodName (optional string) is the name of the method that implements thisfunction.

If methodName is specified, the method must exist (case-sensitive) and Calcitewill create a scalar function. The method may be static or non-static, butif non-static, the class must have a public constructor with no parameters.

If methodName is “*”, Calcite creates a function for every methodin the class.

If methodName is not specified, Calcite looks for a method called “eval”, andif found, creates a a table macro or scalar function.It also looks for methods “init”, “add”, “merge”, “result”, andif found, creates an aggregate function.

path (optional list of string) is the path for resolving this function.

Type

Occurs within root.schemas.types.

JSON

  1. {
  2. name: 'mytype1',
  3. type: 'BIGINT',
  4. attributes: [
  5. {
  6. name: 'f1',
  7. type: 'BIGINT'
  8. }
  9. ]
  10. }

YAML

  1. name: mytype1
  2. type: BIGINT
  3. attributes:
  4. - name: f1
  5. type: BIGINT

name (required string) is the name of this type.

type (optional) is the SQL type.

attributes (optional) is the attribute list of this type.If attributes and type both exist at the same level,type takes precedence.

Lattice

Occurs within root.schemas.lattices.

JSON

  1. {
  2. name: 'star',
  3. sql: [
  4. 'select 1 from "foodmart"."sales_fact_1997" as "s"',
  5. 'join "foodmart"."product" as "p" using ("product_id")',
  6. 'join "foodmart"."time_by_day" as "t" using ("time_id")',
  7. 'join "foodmart"."product_class" as "pc" on "p"."product_class_id" = "pc"."product_class_id"'
  8. ],
  9. auto: false,
  10. algorithm: true,
  11. algorithmMaxMillis: 10000,
  12. rowCountEstimate: 86837,
  13. defaultMeasures: [ {
  14. agg: 'count'
  15. } ],
  16. tiles: [ {
  17. dimensions: [ 'the_year', ['t', 'quarter'] ],
  18. measures: [ {
  19. agg: 'sum',
  20. args: 'unit_sales'
  21. }, {
  22. agg: 'sum',
  23. args: 'store_sales'
  24. }, {
  25. agg: 'count'
  26. } ]
  27. } ]
  28. }

YAML

  1. name: star
  2. sql: >
  3. select 1 from "foodmart"."sales_fact_1997" as "s"',
  4. join "foodmart"."product" as "p" using ("product_id")',
  5. join "foodmart"."time_by_day" as "t" using ("time_id")',
  6. join "foodmart"."product_class" as "pc" on "p"."product_class_id" = "pc"."product_class_id"
  7. auto: false
  8. algorithm: true
  9. algorithmMaxMillis: 10000
  10. rowCountEstimate: 86837
  11. defaultMeasures:
  12. - agg: count
  13. tiles:
  14. - dimensions: [ 'the_year', ['t', 'quarter'] ]
  15. measures:
  16. - agg: sum
  17. args: unit_sales
  18. - agg: sum
  19. args: store_sales
  20. - agg: 'count'

name (required string) is the name of this lattice.

sql (required string, or list of strings that will be concatenated as amulti-line string) is the SQL statement that defines the fact table, dimensiontables, and join paths for this lattice.

auto (optional boolean, default true) is whether to materialize tiles on needas queries are executed.

algorithm (optional boolean, default false) is whether to use an optimizationalgorithm to suggest and populate an initial set of tiles.

algorithmMaxMillis (optional long, default -1, meaning no limit) is themaximum number of milliseconds for which to run the algorithm. After this point,takes the best result the algorithm has come up with so far.

rowCountEstimate (optional double, default 1000.0) estimated number of rows inthe lattice

tiles (optional list of Tile elements) is a list ofmaterialized aggregates to create up front.

defaultMeasures (optional list of Measure elements)is a list of measures that a tile should have by default.Any tile defined in tiles can still define its own measures, includingmeasures not on this list. If not specified, the default list of measures isjust ‘count(*)’:

JSON

  1. [ { name: 'count' } ]

YAML

  1. name: count

statisticProvider (optional name of a class that implementsorg.apache.calcite.materialize.LatticeStatisticProvider)provides estimates of the number of distinct values in each column.

You can use a class name, or a class plus a static field. Example:

  1. "statisticProvider": "org.apache.calcite.materialize.Lattices#CACHING_SQL_STATISTIC_PROVIDER"

If not set, Calcite will generate and execute a SQL query to find the realvalue, and cache the results.

See also: Lattices.

Tile

Occurs within root.schemas.lattices.tiles.

  1. {
  2. dimensions: [ 'the_year', ['t', 'quarter'] ],
  3. measures: [ {
  4. agg: 'sum',
  5. args: 'unit_sales'
  6. }, {
  7. agg: 'sum',
  8. args: 'store_sales'
  9. }, {
  10. agg: 'count'
  11. } ]
  12. }

YAML

  1. dimensions: [ 'the_year', ['t', 'quarter'] ]
  2. measures:
  3. - agg: sum
  4. args: unit_sales
  5. - agg: sum
  6. args: store_sales
  7. - agg: count

dimensions (list of strings or string lists, required, but may be empty)defines the dimensionality of this tile.Each dimension is a column from the lattice, like a GROUP BY clause.Each element can be either a string(the unique label of the column within the lattice)or a string list (a pair consisting of a table alias and a column name).

measures (optional list of Measure elements) is a listof aggregate functions applied to arguments. If not specified, uses thelattice’s default measure list.

Measure

Occurs within root.schemas.lattices.defaultMeasuresand root.schemas.lattices.tiles.measures.

JSON

  1. {
  2. agg: 'sum',
  3. args: [ 'unit_sales' ]
  4. }

YAML

  1. agg: sum
  2. args: unit_sales

agg is the name of an aggregate function (usually ‘count’, ‘sum’, ‘min’,‘max’).

args (optional) is a column label (string), or list of zero or more columnlabels

Valid values are:

  • Not specified: no arguments
  • null: no arguments
  • Empty list: no arguments
  • String: single argument, the name of a lattice column
  • List: multiple arguments, each a column label

Unlike lattice dimensions, measures can not be specified in qualifiedformat, {@code [“table”, “column”]}. When you define a lattice, make surethat each column you intend to use as a measure has a unique label withinthe lattice (using “{@code AS label}” if necessary), and use that labelwhen you want to pass the column as a measure argument.