Installation and Configuration - SQL Templating - 《Apache Superset v2.0 Documentation》

SQL Templating
- Jinja Templates
- Available Macros

SQL Templating

Jinja Templates

SQL Lab and Explore supports Jinja templating in queries. To enable templating, the ENABLE_TEMPLATE_PROCESSING feature flag needs to be enabled in superset_config.py. When templating is enabled, python code can be embedded in virtual datasets and in Custom SQL in the filter and metric controls in Explore. By default, the following variables are made available in the Jinja context:

columns: columns which to group by in the query
filter: filters applied in the query
from_dttm: start datetime value from the selected time range (None if undefined)
to_dttm: end datetime value from the selected time range (None if undefined)
groupby: columns which to group by in the query (deprecated)
metrics: aggregate expressions in the query
row_limit: row limit of the query
row_offset: row offset of the query
table_columns: columns available in the dataset
time_column: temporal column of the query (None if undefined)
time_grain: selected time grain (None if undefined)

For example, to add a time range to a virtual dataset, you can write the following:

SELECT * from tbl where dttm_col > '{{ from_dttm }}' and dttm_col < '{{ to_dttm }}'

You can also use Jinja’s logic to make your query robust to clearing the timerange filter:

SELECT *
FROM tbl
WHERE (
    {% if from_dttm is not none %}
        dttm_col > '{{ from_dttm }}' AND
    {% endif %}
    {% if to_dttm is not none %}
        dttm_col < '{{ to_dttm }}' AND
    {% endif %}
    true
)

Note how the Jinja parameters are called within double brackets in the query, and without in the logic blocks.

To add custom functionality to the Jinja context, you need to overload the default Jinja context in your environment by defining the JINJA_CONTEXT_ADDONS in your superset configuration (superset_config.py). Objects referenced in this dictionary are made available for users to use where the Jinja context is made available.

JINJA_CONTEXT_ADDONS = {
    'my_crazy_macro': lambda x: x*2,
}

Besides default Jinja templating, SQL lab also supports self-defined template processor by setting the CUSTOM_TEMPLATE_PROCESSORS in your superset configuration. The values in this dictionary overwrite the default Jinja template processors of the specified database engine. The example below configures a custom presto template processor which implements its own logic of processing macro template with regex parsing. It uses the $ style macro instead of {{ }} style in Jinja templating.

By configuring it with CUSTOM_TEMPLATE_PROCESSORS, a SQL template on a presto database is processed by the custom one rather than the default one.

def DATE(
    ts: datetime, day_offset: SupportsInt = 0, hour_offset: SupportsInt = 0
) -> str:
    """Current day as a string."""
    day_offset, hour_offset = int(day_offset), int(hour_offset)
    offset_day = (ts + timedelta(days=day_offset, hours=hour_offset)).date()
    return str(offset_day)
class CustomPrestoTemplateProcessor(PrestoTemplateProcessor):
    """A custom presto template processor."""
    engine = "presto"
    def process_template(self, sql: str, **kwargs) -> str:
        """Processes a sql template with $ style macro using regex."""
        # Add custom macros functions.
        macros = {
            "DATE": partial(DATE, datetime.utcnow())
        }  # type: Dict[str, Any]
        # Update with macros defined in context and kwargs.
        macros.update(self.context)
        macros.update(kwargs)
        def replacer(match):
            """Expand $ style macros with corresponding function calls."""
            macro_name, args_str = match.groups()
            args = [a.strip() for a in args_str.split(",")]
            if args == [""]:
                args = []
            f = macros[macro_name[1:]]
            return f(*args)
        macro_names = ["$" + name for name in macros.keys()]
        pattern = r"(%s)\s*\(([^()]*)\)" % "|".join(map(re.escape, macro_names))
        return re.sub(pattern, replacer, sql)
CUSTOM_TEMPLATE_PROCESSORS = {
    CustomPrestoTemplateProcessor.engine: CustomPrestoTemplateProcessor
}

SQL Lab also includes a live query validation feature with pluggable backends. You can configure which validation implementation is used with which database engine by adding a block like the following to your configuration file:

FEATURE_FLAGS = {
    'SQL_VALIDATORS_BY_ENGINE': {
        'presto': 'PrestoDBSQLValidator',
    }
}

The available validators and names can be found in sql_validators.

Available Macros

In this section, we’ll walkthrough the pre-defined Jinja macros in Superset.

Current Username

The {{ current_username() }} macro returns the username of the currently logged in user.

If you have caching enabled in your Superset configuration, then by default the username value will be used by Superset when calculating the cache key. A cache key is a unique identifier that determines if there’s a cache hit in the future and Superset can retrieve cached data.

You can disable the inclusion of the username value in the calculation of the cache key by adding the following parameter to your Jinja code:

{{ current_username(add_to_cache_keys=False) }}

Current User ID

The {{ current_user_id() }} macro returns the user_id of the currently logged in user.

If you have caching enabled in your Superset configuration, then by default the user_id value will be used by Superset when calculating the cache key. A cache key is a unique identifier that determines if there’s a cache hit in the future and Superset can retrieve cached data.

You can disable the inclusion of the user_id value in the calculation of the cache key by adding the following parameter to your Jinja code:

{{ current_user_id(add_to_cache_keys=False) }}

Custom URL Parameters

The {{ url_param('custom_variable') }} macro lets you define arbitrary URL parameters and reference them in your SQL code.

Here’s a concrete example:

You write the following query in SQL Lab:

SELECT count(*)
FROM ORDERS
WHERE country_code = '{{ url_param('countrycode') }}'

You’re hosting Superset at the domain www.example.com and you send your coworker in Spain the following SQL Lab URL www.example.com/superset/sqllab?countrycode=ES and your coworker in the USA the following SQL Lab URL www.example.com/superset/sqllab?countrycode=US
For your coworker in Spain, the SQL Lab query will be rendered as:
```
SELECT count(*)
FROM ORDERS
WHERE country_code = 'ES'
```
For your coworker in the USA, the SQL Lab query will be rendered as:
```
SELECT count(*)
FROM ORDERS
WHERE country_code = 'US'
```

Explicitly Including Values in Cache Key

The {{ cache_key_wrapper() }} function explicitly instructs Superset to add a value to the accumulated list of values used in the calculation of the cache key.

This function is only needed when you want to wrap your own custom function return values in the cache key. You can gain more context here.

Note that this function powers the caching of the user_id and username values in the current_user_id() and current_username() function calls (if you have caching enabled).

Filter Values

You can retrieve the value for a specific filter as a list using {{ filter_values() }}.

This is useful if:

You want to use a filter component to filter a query where the name of filter component column doesn’t match the one in the select statement
You want to have the ability for filter inside the main query for performance purposes

Here’s a concrete example:

SELECT action, count(*) as times
FROM logs
WHERE
    action in {{ filter_values('action_type')|where_in }}
GROUP BY action

There where_in filter converts the list of values from filter_values('action_type') into a string suitable for an IN expression.

Filters for a Specific Column

The {{ get_filters() }} macro returns the filters applied to a given column. In addition to returning the values (similar to how filter_values() does), the get_filters() macro returns the operator specified in the Explore UI.

This is useful if:

You want to handle more than the IN operator in your SQL clause
You want to handle generating custom SQL conditions for a filter
You want to have the ability to filter inside the main query for speed purposes

Here’s a concrete example:

 WITH RECURSIVE
    superiors(employee_id, manager_id, full_name, level, lineage) AS (
    SELECT
        employee_id,
        manager_id,
        full_name,
    1 as level,
    employee_id as lineage
    FROM
        employees
    WHERE
    1=1
    {# Render a blank line #}
    {%- for filter in get_filters('full_name', remove_filter=True) -%}
    {%- if filter.get('op') == 'IN' -%}
        AND
        full_name IN {{ filter.get('val')|where_in }}
    {%- endif -%}
    {%- if filter.get('op') == 'LIKE' -%}
        AND
        full_name LIKE {{ "'" + filter.get('val') + "'" }}
    {%- endif -%}
    {%- endfor -%}
    UNION ALL
        SELECT
            e.employee_id,
            e.manager_id,
            e.full_name,
    s.level + 1 as level,
    s.lineage
        FROM
            employees e,
        superiors s
        WHERE s.manager_id = e.employee_id
    )
    SELECT
    employee_id, manager_id, full_name, level, lineage
    FROM
    superiors
    order by lineage, level

Datasets

It’s possible to query physical and virtual datasets using the dataset macro. This is useful if you’ve defined computed columns and metrics on your datasets, and want to reuse the definition in adhoc SQL Lab queries.

To use the macro, first you need to find the ID of the dataset. This can be done by going to the view showing all the datasets, hovering over the dataset you’re interested in, and looking at its URL. For example, if the URL for a dataset is https://superset.example.org/explore/?dataset_type=table&dataset_id=42 its ID is 42.

Once you have the ID you can query it as if it were a table:

SELECT * FROM {{ dataset(42) }} LIMIT 10

If you want to select the metric definitions as well, in addition to the columns, you need to pass an additional keyword argument:

SELECT * FROM {{ dataset(42, include_metrics=True) }} LIMIT 10

Since metrics are aggregations, the resulting SQL expression will be grouped by all non-metric columns. You can specify a subset of columns to group by instead:

SELECT * FROM {{ dataset(42, include_metrics=True, columns=["ds", "category"]) }} LIMIT 10