5.15. Pinot Connector
The Pinot connector allows querying and creating tables in an external Pinotdatabase. This can be used to query pinot data or join pinot data withsomething else.
Configuration
To configure the Pinot connector, create a catalog properties filein etc/catalog
named, for example, pinot.properties
, tomount the Pinot connector as the pinot
catalog.Create the file with the following contents, replacing theconnection properties as appropriate for your setup:
- connector.name=pinot
- pinot.controller-urls=controller_host1:9000,controller_host2:9000
Where the pinot.controller-urls
property allows you to specify acomma separated list of the pinot controller host/port pairs.
Multiple Pinot Servers
You can have as many catalogs as you need, so if you have additionalPinot clusters, simply add another properties file to etc/catalog
with a different name (making sure it ends in .properties
). Forexample, if you name the property file sales.properties
, Prestowill create a catalog named sales
using the configured connector.
Querying Pinot
The Pinot catalog exposes all pinot tables inside a flat schema. Theschema name is immaterial when querying but running SHOW SCHEMAS
,will show just one schema entry of default
.
The name of the pinot catalog is the catalog file you created abovewithout the .properties
extension.
For example, if you created afile called mypinotcluster.properties
, you can see all the tablesin it using the command:
- SHOW TABLES from mypinotcluster.default
OR:
- SHOW TABLES from mypinotcluster.foo
Both of these commands will list all the tables in your pinot cluster.This is because Pinot does not have a notion of schemas.
Consider you have a table called clicks
in the mypinotcluster
.You can see a list of the columns in the clicks
table using eitherof the following:
- DESCRIBE mypinotcluster.dontcare.clicks;
- SHOW COLUMNS FROM mypinotcluster.dontcare.clicks;
Finally, you can access the clicks
table:
- SELECT count(*) FROM mypinotcluster.default.clicks;
How the Pinot connector works
The connector tries to push the maximal subquery inferred from thepresto query into pinot. It can push down everything Pinot supportsincluding aggregations, group by, all UDFs etc. It generates thecorrect Pinot PQL keeping Pinot’s quirks in mind.
By default, it sends aggregation and limit queries to the Pinot brokerand does a parallel scan for non-aggregation/non-limit queries. Thepinot broker queries create a single split that lets the Pinot brokerdo the scatter gather. Whereas, in the parallel scan mode, there isone split created for one-or-more Pinot segments and the Pinot serversare directly contacted by the Presto servers (ie., the Pinot broker isnot involved in the parallel scan mode)
There are a few configurations that control this behavior:
pinot.prefer-broker-queries
: This config is true by default.Setting it to false will also create parallel plans foraggregation and limit queries.pinot.forbid-segment-queries
: This config is false by default.Setting it to true will forbid parallel querying and force allquerying to happen via the broker.pinot.non-aggregate-limit-for-broker-queries
: To preventoverwhelming the broker, the connector only allows querying thepinot broker forshort
queries. We define ashort
query tobe either an aggregation (or group-by) query or a query with a limitless than the value configured forpinot.non-aggregate-limit-for-broker-queries
. The default valuefor this limit is 25K rows.