Querying across databases
Superset offers an experimental feature for querying across different databases. This is done via a special database called “Superset meta database” that uses the “superset://“ SQLAlchemy URI. When using the database it’s possible to query any table in any of the configured databases using the following syntax:
SELECT * FROM "database name.[[catalog.].schema].table name";
For example:
SELECT * FROM "examples.birth_names";
Spaces are allowed, but periods in the names must be replaced by %2E
. Eg:
SELECT * FROM "Superset meta database.examples%2Ebirth_names";
The query above returns the same rows as SELECT * FROM "examples.birth_names"
, and also shows that the meta database can query tables from any table — even itself!
Considerations
Before enabling this feature, there are a few considerations that you should have in mind. First, the meta database enforces permissions on the queried tables, so users should only have access via the database to tables that they originally have access to. Nevertheless, the meta database is a new surface for potential attacks, and bugs could allow users to see data they should not.
Second, there are performance considerations. The meta database will push any filtering, sorting, and limiting to the underlying databases, but any aggregations and joins will happen in memory in the process running the query. Because of this, it’s recommended to run the database in async mode, so queries are executed in Celery workers, instead of the web workers. Additionally, it’s possible to specify a hard limit on how many rows are returned from the underlying databases.
Enabling the meta database
To enable the Superset meta database, first you need to set the ENABLE_SUPERSET_META_DB
feature flag to true. Then, add a new database of type “Superset meta database” with the SQLAlchemy URI “superset://“.
If you enable DML in the meta database users will be able to run DML queries on underlying databases as long as DML is also enabled in them. This allows users to run queries that move data across databases.
Second, you might want to change the value of SUPERSET_META_DB_LIMIT
. The default value is 1000, and defines how many are read from each database before any aggregations and joins are executed. You can also set this value None
if you only have small tables.
Additionally, you might want to restrict the databases to with the meta database has access to. This can be done in the database configuration, under “Advanced” -> “Other” -> “ENGINE PARAMETERS” and adding:
{"allowed_dbs":["Google Sheets","examples"]}