Background
Apache Calcite is a dynamic data management framework.
It contains many of the pieces that comprise a typical database management system, but omits some key functions: storage of data, algorithms to process data, and a repository for storing metadata.
Calcite intentionally stays out of the business of storing and processing data. As we shall see, this makes it an excellent choice for mediating between applications and one or more data storage locations and data processing engines. It is also a perfect foundation for building a database: just add data.
To illustrate, let’s create an empty instance of Calcite and then point it at some data.
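import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaPlus;

public static class HrSchema {
  // Sample rows are elided here; populate the arrays with your own data.
  public final Employee[] emps = {};
  public final Department[] depts = {};
}

// Load the Calcite JDBC driver and open a connection with no schemas in it.
Class.forName("org.apache.calcite.jdbc.Driver");
Properties info = new Properties();
info.setProperty("lex", "JAVA");
Connection connection =
    DriverManager.getConnection("jdbc:calcite:", info);
CalciteConnection calciteConnection =
    connection.unwrap(CalciteConnection.class);

// Register the Java object as a schema named "hr".
SchemaPlus rootSchema = calciteConnection.getRootSchema();
Schema schema = new ReflectiveSchema(new HrSchema());
rootSchema.add("hr", schema);

// Run a SQL query against the in-memory arrays.
Statement statement = calciteConnection.createStatement();
ResultSet resultSet = statement.executeQuery(
    "select d.deptno, min(e.empid)\n"
    + "from hr.emps as e\n"
    + "join hr.depts as d\n"
    + " on e.deptno = d.deptno\n"
    + "group by d.deptno\n"
    + "having count(*) > 1");
// print is assumed to be a helper that writes each row to standard output.
print(resultSet);
resultSet.close();
statement.close();
connection.close();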
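The example calls print(resultSet) without defining it. One possible shape for such a helper, using only the standard java.sql API, is sketched below. The class name ResultSetPrinter, the formatRow method, and the "label=value; label=value" layout are assumptions made for illustration; they are not part of Calcite.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Calcite: prints a JDBC result set row by row.
final class ResultSetPrinter {
  private ResultSetPrinter() {}

  /** Formats one row as "label=value; label=value". */
  public static String formatRow(List<String> labels, List<?> values) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < labels.size(); i++) {
      if (i > 0) {
        sb.append("; ");
      }
      sb.append(labels.get(i)).append('=').append(values.get(i));
    }
    return sb.toString();
  }

  /** Prints every remaining row of the result set to standard output. */
  public static void print(ResultSet resultSet) throws SQLException {
    ResultSetMetaData metaData = resultSet.getMetaData();
    int columnCount = metaData.getColumnCount();

    // JDBC column indexes are 1-based.
    List<String> labels = new ArrayList<>();
    for (int i = 1; i <= columnCount; i++) {
      labels.add(metaData.getColumnLabel(i));
    }

    while (resultSet.next()) {
      List<Object> values = new ArrayList<>();
      for (int i = 1; i <= columnCount; i++) {
        values.add(resultSet.getObject(i));
      }
      System.out.println(formatRow(labels, values));
    }
  }
}
```

With this class on the classpath, the call in the example could be written as ResultSetPrinter.print(resultSet). Because the helper relies only on ResultSetMetaData, it works the same way for any JDBC connection, including the Calcite one.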
Where is the database? There is no database. The connection is completely empty until new ReflectiveSchema registers a Java object as a schema and its collection fields emps and depts as tables.
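To make the idea of "fields as tables" concrete, here is a rough sketch in plain Java reflection. It is not Calcite's implementation of ReflectiveSchema; it only illustrates how the public array fields of an object can be discovered by name. The Employee and Department classes, their fields, and the sample rows are hypothetical stand-ins, since the example above does not define them.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a simplified picture of treating fields as tables.
final class ReflectiveTablesSketch {
  // Hypothetical row classes; empid and deptno match the columns in the query.
  static class Employee {
    public final int empid;
    public final int deptno;
    public final String name;

    Employee(int empid, int deptno, String name) {
      this.empid = empid;
      this.deptno = deptno;
      this.name = name;
    }
  }

  static class Department {
    public final int deptno;
    public final String name;

    Department(int deptno, String name) {
      this.deptno = deptno;
      this.name = name;
    }
  }

  // Same shape as HrSchema in the example, filled with made-up rows.
  static class HrSchema {
    public final Employee[] emps = {
      new Employee(100, 10, "Bill"),
      new Employee(110, 10, "Theodore"),
      new Employee(200, 20, "Eric"),
    };
    public final Department[] depts = {
      new Department(10, "Sales"),
      new Department(20, "Marketing"),
    };
  }

  /** Maps the name of each public, non-static array field to its row count. */
  static Map<String, Integer> tableRowCounts(Object target) {
    Map<String, Integer> tables = new TreeMap<>();  // sorted by field name
    for (Field field : target.getClass().getFields()) {
      if (Modifier.isStatic(field.getModifiers()) || !field.getType().isArray()) {
        continue;
      }
      try {
        tables.put(field.getName(), Array.getLength(field.get(target)));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("cannot read field " + field.getName(), e);
      }
    }
    return tables;
  }

  public static void main(String[] args) {
    System.out.println(tableRowCounts(new HrSchema()));  // prints {depts=2, emps=3}
  }
}
```

In this sketch each field name plays the role of a table name and each array element plays the role of a row, which is the mapping the query relies on when it refers to hr.emps and hr.depts.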
Calcite does not want to own data; it does not even have a favorite data format. This example used in-memory data sets, and processed them using operators such as groupBy and join from the linq4j library. But Calcite can also process data in other data formats, such as JDBC. In the first example, replace
Schema schema = new ReflectiveSchema(new HrSchema());
with
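// Assumed imports: org.apache.commons.dbcp2.BasicDataSource (Commons DBCP 2)
// and org.apache.calcite.adapter.jdbc.JdbcSchema.
Class.forName("com.mysql.jdbc.Driver");
BasicDataSource dataSource = new BasicDataSource();
dataSource.setUrl("jdbc:mysql://localhost");
dataSource.setUsername("username");
dataSource.setPassword("password");
// The last two arguments are the catalog (none here) and the schema name.
Schema schema = JdbcSchema.create(rootSchema, "hr", dataSource,
    null, "name");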
and Calcite will execute the same query in JDBC. To the application, the data and API are the same, but behind the scenes the implementation is very different. Calcite uses optimizer rules to push the JOIN and GROUP BY operations to the source database.
In-memory and JDBC are just two familiar examples. Calcite can handle any data source and data format. To add a data source, you need to write an adapter that tells Calcite what collections in the data source it should consider “tables”.
For more advanced integration, you can write optimizer rules. Optimizer rules allow Calcite to access data of a new format, allow you to register new operators (such as a better join algorithm), and allow Calcite to optimize how queries are translated to operators. Calcite will combine your rules and operators with built-in rules and operators, apply cost-based optimization, and generate an efficient plan.
Writing an adapter
The subproject under example/csv provides a CSV adapter, which is fully functional for use in applications but is also simple enough to serve as a good template if you are writing your own adapter.
See the tutorial for information on using the CSV adapter and writing other adapters.
See the HOWTO for more information about using other adapters, and about using Calcite in general.
Status
The following features are complete.
- Query parser, validator and optimizer
- Support for reading models in JSON format
- Many standard functions and aggregate functions
- JDBC queries against Linq4j and JDBC back-ends
- Linq4j front-end
- SQL features: SELECT, FROM (including JOIN syntax), WHERE, GROUP BY (including GROUPING SETS), aggregate functions (including COUNT(DISTINCT …) and FILTER), HAVING, ORDER BY (including NULLS FIRST/LAST), set operations (UNION, INTERSECT, MINUS), sub-queries (including correlated sub-queries), windowed aggregates, LIMIT (syntax as Postgres); more details in the SQL reference
- Local and remote JDBC drivers; see Avatica
- Several adapters