Apache Druid
Superset ships with a native connector to Druid (behind the DRUID_IS_ACTIVE flag), but it is gradually being deprecated in favor of the SQLAlchemy / DBAPI connector provided by the pydruid library.
The connection string looks like:
druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql
Here’s a breakdown of the key components of this connection string:
User: username portion of the credentials needed to connect to your database
Password: password portion of the credentials needed to connect to your database
Host: IP address (or URL) of the host machine that’s running your database
Port: specific port that’s exposed on your host machine where your database is running
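As a minimal sketch of how these components fit together, the snippet below assembles the connection string from its parts. The helper name build_druid_uri, the example host, and the example credentials are all hypothetical; the port default simply mirrors the 9088 shown in the template above. Credentials are percent-encoded because characters such as @ or : in a password would otherwise be read as URI delimiters.

```python
from urllib.parse import quote


def build_druid_uri(user, password, host, port=9088):
    """Assemble a druid:// connection string (hypothetical helper, not part of Superset)."""
    # Percent-encode the credentials so '@', ':' or '/' cannot break the URI.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"druid://{credentials}@{host}:{port}/druid/v2/sql"


# Example values are placeholders, not real credentials.
uri = build_druid_uri("admin", "p@ss:word", "druid.example.com")
print(uri)
```

The resulting string is what you would paste into the SQLAlchemy URI field when adding the database.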
Customizing Druid Connection
When adding a connection to Druid, you can customize the connection in a few different ways in the Add Database form.
Custom Certificate
You can add certificates in the Root Certificate field when configuring the new database connection to Druid.
When using a custom certificate, pydruid will automatically use the https scheme.
Disable SSL Verification
To disable SSL verification, add the following to the Extras field:
engine_params:
{"connect_args":
{"scheme": "https", "ssl_verify_cert": false}}
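If you prefer to generate this value rather than type it by hand, the following sketch builds it with Python's json module. It assumes the Extras field expects a single JSON object in which engine_params is a top-level key; check that against your Superset version before relying on it.

```python
import json

# Assumption: the Extras field takes one JSON object keyed by "engine_params".
extras = {
    "engine_params": {
        "connect_args": {
            "scheme": "https",
            # Skips certificate validation; only appropriate for trusted networks.
            "ssl_verify_cert": False,
        }
    }
}

# json.dumps turns Python's False into the JSON literal false.
extras_json = json.dumps(extras, indent=2)
print(extras_json)
```

Serializing from a dict avoids the common mistake of writing Python-style False or single quotes into a field that is parsed as JSON.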
Aggregations
Common aggregations or Druid metrics can be defined and used in Superset. The first and simpler use case is to use the checkbox matrix exposed in your datasource’s edit view (Sources -> Druid Datasources -> [your datasource] -> Edit -> [tab] List Druid Column).
Clicking the GroupBy and Filterable checkboxes will make the column appear in the related dropdowns while in the Explore view. Checking Count Distinct, Min, Max or Sum will result in creating new metrics that will appear in the List Druid Metric tab upon saving the datasource.
If you edit these metrics, you’ll notice that their JSON element corresponds to a Druid aggregation definition. You can create your own aggregations manually from the List Druid Metric tab by following the Druid documentation.
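To give a sense of what such a JSON element looks like, here is a small sketch that produces two aggregation definitions in the shape Druid's native query language uses (a type, an output name, and for most aggregators a fieldName). The column name revenue and the metric names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical metric summing a numeric column called "revenue".
sum_metric = {"type": "doubleSum", "name": "sum__revenue", "fieldName": "revenue"}

# A row-count aggregator needs no fieldName.
count_metric = {"type": "count", "name": "count"}

# Each printed object is what would go into a metric's JSON field.
for metric in (sum_metric, count_metric):
    print(json.dumps(metric))
```

Other aggregator types from the Druid documentation follow the same pattern, so the dict can be adapted by swapping the type and field.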
Post-Aggregations
Druid supports post-aggregations, and these work in Superset. All you have to do is create a metric, much like you would create an aggregation manually, but specify postagg as the Metric Type. You then have to provide a valid JSON post-aggregation definition (as specified in the Druid docs) in the JSON field.
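As an illustration, the sketch below builds an arithmetic post-aggregation that divides one aggregated value by another to get an average. It assumes two aggregation metrics named sum__revenue and count already exist on the datasource; both names, and the output name avg_revenue, are hypothetical.

```python
import json

# Arithmetic post-aggregator: avg_revenue = sum__revenue / count.
avg_revenue = {
    "type": "arithmetic",
    "name": "avg_revenue",
    "fn": "/",
    "fields": [
        # fieldAccess reads the output of an existing aggregator by name.
        {"type": "fieldAccess", "fieldName": "sum__revenue"},
        {"type": "fieldAccess", "fieldName": "count"},
    ],
}

# The printed JSON is what would be pasted into the metric's JSON field.
postagg_json = json.dumps(avg_revenue, indent=2)
print(postagg_json)
```

The order of the entries in fields matters for non-commutative functions such as division: the first entry is the numerator.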