10.2. Connectors
Connectors are the source of all data for queries in Presto. Even if your data source doesn’t have underlying tables backing it, as long as you adapt your data source to the API expected by Presto, you can write queries against this data.
ConnectorFactory
Instances of your connector are created by a ConnectorFactory instance, which is created when Presto calls getConnectorFactory() on the plugin. The connector factory is a simple interface responsible for creating an instance of a Connector object that returns instances of the following services:
ConnectorMetadata
ConnectorSplitManager
ConnectorHandleResolver
ConnectorRecordSetProvider
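The relationship between the factory, the connector, and its services can be sketched as follows. This is a minimal illustration only: every interface and class below (SketchConnectorFactory, SketchConnector, DemoConnector, and so on) is a hypothetical, simplified stand-in, and the method names and signatures are assumptions rather than the actual Presto SPI.

```java
import java.util.Map;

// Hypothetical, heavily simplified stand-ins for the four services listed above.
// The real Presto SPI interfaces declare many methods; these are empty markers.
interface SketchMetadata {}
interface SketchSplitManager {}
interface SketchHandleResolver {}
interface SketchRecordSetProvider {}

// Simplified connector: a holder that hands out one instance of each service.
interface SketchConnector {
    SketchMetadata getMetadata();
    SketchSplitManager getSplitManager();
    SketchHandleResolver getHandleResolver();
    SketchRecordSetProvider getRecordSetProvider();
}

// Simplified factory: given an id and configuration, build a connector.
interface SketchConnectorFactory {
    String getName();
    SketchConnector create(String connectorId, Map<String, String> config);
}

final class DemoConnector implements SketchConnector {
    private final SketchMetadata metadata = new SketchMetadata() {};
    private final SketchSplitManager splitManager = new SketchSplitManager() {};
    private final SketchHandleResolver handleResolver = new SketchHandleResolver() {};
    private final SketchRecordSetProvider recordSetProvider = new SketchRecordSetProvider() {};

    @Override public SketchMetadata getMetadata() { return metadata; }
    @Override public SketchSplitManager getSplitManager() { return splitManager; }
    @Override public SketchHandleResolver getHandleResolver() { return handleResolver; }
    @Override public SketchRecordSetProvider getRecordSetProvider() { return recordSetProvider; }
}

final class DemoConnectorFactory implements SketchConnectorFactory {
    @Override public String getName() { return "demo"; }

    @Override public SketchConnector create(String connectorId, Map<String, String> config) {
        // A real factory would typically read connection settings from config here.
        return new DemoConnector();
    }
}
```

In this sketch the factory is the only entry point: the caller never constructs the services directly, it asks the connector for them.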
ConnectorMetadata
The connector metadata interface has a large number of important methods that are responsible for allowing Presto to look at lists of schemas, lists of tables, lists of columns, and other metadata about a particular data source.
This interface is too big to list in this documentation, but if you are interested in seeing strategies for implementing these methods, look at the Example HTTP Connector and the Cassandra connector. If your underlying data source supports schemas, tables and columns, this interface should be straightforward to implement. If you are attempting to adapt something that is not a relational database (as the Example HTTP connector does), you may need to get creative about how you map your data source to Presto’s schema, table, and column concepts.
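As a rough illustration of mapping a non-relational source onto schemas, tables, and columns, the sketch below keeps a small in-memory catalog and answers the three kinds of listing questions described above. The class CatalogMetadataSketch and its methods are hypothetical and are not part of the Presto SPI; the real ConnectorMetadata interface uses its own handle and metadata types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical metadata holder: schema name -> table name -> ordered column names.
final class CatalogMetadataSketch {
    private final Map<String, Map<String, List<String>>> catalog = new LinkedHashMap<>();

    // Register a table; the schema is created on first use.
    void addTable(String schema, String table, List<String> columns) {
        Map<String, List<String>> tables = catalog.get(schema);
        if (tables == null) {
            tables = new LinkedHashMap<>();
            catalog.put(schema, tables);
        }
        tables.put(table, new ArrayList<>(columns));
    }

    // List of schemas, in registration order.
    List<String> listSchemaNames() {
        return new ArrayList<>(catalog.keySet());
    }

    // List of tables in a schema; empty if the schema is unknown.
    List<String> listTables(String schema) {
        Map<String, List<String>> tables = catalog.get(schema);
        if (tables == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(tables.keySet());
    }

    // List of columns in a table; empty if the schema or table is unknown.
    List<String> getColumns(String schema, String table) {
        Map<String, List<String>> tables = catalog.get(schema);
        if (tables == null || !tables.containsKey(table)) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(tables.get(table));
    }
}
```

For a source such as a set of remote files, one possible mapping (an assumption, not a recommendation from the Presto documentation) is one schema per logical grouping, one table per file, and one column per field in the file.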
ConnectorSplitManager
The split manager partitions the data for a table into the individual chunks that Presto will distribute to workers for processing. For example, the Hive connector lists the files for each Hive partition and creates one or more splits per file. For data sources that don’t have partitioned data, a good strategy here is to simply return a single split for the entire table. This is the strategy employed by the Example HTTP connector.
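The single-split strategy can be sketched in a few lines. The types below (WholeTableSplit, SingleSplitManagerSketch) and the getSplits signature are hypothetical simplifications for illustration; the real ConnectorSplitManager works with Presto's own handle and split types.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a split: here it only names the table it covers.
final class WholeTableSplit {
    private final String schemaName;
    private final String tableName;

    WholeTableSplit(String schemaName, String tableName) {
        this.schemaName = schemaName;
        this.tableName = tableName;
    }

    String getSchemaName() { return schemaName; }
    String getTableName() { return tableName; }
}

// Hypothetical split manager for a source with no natural partitioning.
final class SingleSplitManagerSketch {
    // Return exactly one split that covers the entire table.
    List<WholeTableSplit> getSplits(String schemaName, String tableName) {
        return Collections.singletonList(new WholeTableSplit(schemaName, tableName));
    }
}
```

The trade-off in this sketch is that a table represented by one split is read as a single unit of work. A source that can be divided, as with the files of a Hive partition, could return one split per chunk instead.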
ConnectorRecordSetProvider
Given a split and a list of columns, the record set provider is responsible for delivering data to the Presto execution engine. It creates a RecordSet, which in turn creates a RecordCursor that is used by Presto to read the column values for each row.
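The record set and cursor relationship can be sketched with an in-memory list of rows. The names below (SketchCursor, SketchRecordSet, ListRecordSetSketch, advance, getValue) are hypothetical and simplified; the real RecordSet and RecordCursor interfaces have their own method names and typed accessors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified cursor: move to the next row, then read fields from it.
interface SketchCursor {
    boolean advance();
    Object getValue(int field);
}

// Hypothetical, simplified record set: a factory for cursors over the same data.
interface SketchRecordSet {
    SketchCursor cursor();
}

// Record set backed by in-memory rows, exposing only the requested columns.
final class ListRecordSetSketch implements SketchRecordSet {
    private final List<List<Object>> rows;
    // For each requested column, its index in the underlying row.
    private final List<Integer> columnIndexes;

    ListRecordSetSketch(List<List<Object>> rows, List<Integer> columnIndexes) {
        this.rows = new ArrayList<>(rows);
        this.columnIndexes = new ArrayList<>(columnIndexes);
    }

    @Override
    public SketchCursor cursor() {
        return new SketchCursor() {
            private int position = -1; // before the first row

            @Override
            public boolean advance() {
                if (position < rows.size()) {
                    position++;
                }
                return position < rows.size();
            }

            @Override
            public Object getValue(int field) {
                if (position < 0 || position >= rows.size()) {
                    throw new IllegalStateException("cursor is not positioned on a row");
                }
                // Translate the requested field number to the underlying column.
                return rows.get(position).get(columnIndexes.get(field));
            }
        };
    }
}
```

In this sketch the caller loops with advance() until it returns false and reads each requested field of the current row with getValue. Each call to cursor() starts a fresh pass over the rows.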