Overview

Doris’s asynchronous materialized views employ an algorithm based on the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) pattern structure information for transparent rewriting.

Doris can analyze the structural information of query SQL, automatically search for suitable materialized views, and attempt transparent rewriting, utilizing the optimal materialized view to express the query SQL.

By utilizing precomputed materialized view results, significant improvements in query performance and a reduction in computational costs can be achieved.

Using the three tables: lineitem, orders, and partsupp from TPC-H, let’s describe the capability of directly querying a materialized view and using the materialized view for transparent query rewriting.

  1. CREATE TABLE IF NOT EXISTS lineitem (
  2. l_orderkey integer not null,
  3. l_partkey integer not null,
  4. l_suppkey integer not null,
  5. l_linenumber integer not null,
  6. l_quantity decimalv3(15,2) not null,
  7. l_extendedprice decimalv3(15,2) not null,
  8. l_discount decimalv3(15,2) not null,
  9. l_tax decimalv3(15,2) not null,
  10. l_returnflag char(1) not null,
  11. l_linestatus char(1) not null,
  12. l_shipdate date not null,
  13. l_commitdate date not null,
  14. l_receiptdate date not null,
  15. l_shipinstruct char(25) not null,
  16. l_shipmode char(10) not null,
  17. l_comment varchar(44) not null
  18. )
  19. DUPLICATE KEY(l_orderkey, l_partkey, l_suppkey, l_linenumber)
  20. PARTITION BY RANGE(l_shipdate)
  21. (FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY)
  22. DISTRIBUTED BY HASH(l_orderkey) BUCKETS 3
  23. PROPERTIES ("replication_num" = "1");
  1. CREATE TABLE IF NOT EXISTS orders (
  2. o_orderkey integer not null,
  3. o_custkey integer not null,
  4. o_orderstatus char(1) not null,
  5. o_totalprice decimalv3(15,2) not null,
  6. o_orderdate date not null,
  7. o_orderpriority char(15) not null,
  8. o_clerk char(15) not null,
  9. o_shippriority integer not null,
  10. o_comment varchar(79) not null
  11. )
  12. DUPLICATE KEY(o_orderkey, o_custkey)
  13. PARTITION BY RANGE(o_orderdate)(
  14. FROM ('2023-10-17') TO ('2023-10-20') INTERVAL 1 DAY)
  15. DISTRIBUTED BY HASH(o_orderkey) BUCKETS 3
  16. PROPERTIES ("replication_num" = "1");
  1. CREATE TABLE IF NOT EXISTS partsupp (
  2. ps_partkey INTEGER NOT NULL,
  3. ps_suppkey INTEGER NOT NULL,
  4. ps_availqty INTEGER NOT NULL,
  5. ps_supplycost DECIMALV3(15,2) NOT NULL,
  6. ps_comment VARCHAR(199) NOT NULL
  7. )
  8. DUPLICATE KEY(ps_partkey, ps_suppkey)
  9. DISTRIBUTED BY HASH(ps_partkey) BUCKETS 3
  10. PROPERTIES (
  11. "replication_num" = "1"
  12. );

Direct Query of Materialized View

A materialized view can be considered as a table and can be queried just like a regular table.

The syntax for defining a materialized view, details can be found in CREATE-ASYNC-MATERIALIZED-VIEW

Materialized view definition:

  1. CREATE MATERIALIZED VIEW mv1
  2. BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  3. DISTRIBUTED BY RANDOM BUCKETS 12
  4. PROPERTIES ('replication_num' = '1')
  5. AS
  6. SELECT t1.l_linenumber,
  7. o_custkey,
  8. o_orderdate
  9. FROM (SELECT * FROM lineitem WHERE l_linenumber > 1) t1
  10. LEFT OUTER JOIN orders
  11. ON l_orderkey = o_orderkey;

Query statement: Direct queries can be performed on the materialized view with additional filtering conditions and aggregations.

  1. SELECT l_linenumber,
  2. o_custkey
  3. FROM mv1
  4. WHERE l_linenumber > 1 and o_orderdate = '2023-12-31';

Transparent Rewriting Capability

Join rewriting

Join rewriting refers to when the tables used in the query and the materialization are the same. In this case, the optimizer will attempt transparent rewriting by either joining the input of the materialized view with the query or placing the join in the outer layer of the query’s WHERE clause.

This pattern of rewriting is supported for multi-table joins and supports inner and left join types. Support for other types is continually expanding.

Case 1:

The following case can undergo transparent rewriting. The condition l_linenumber > 1 allows for pull-up, enabling transparent rewriting by expressing the query using the precomputed results of the materialized view.

Materialized view definition:

  1. SELECT t1.l_linenumber,
  2. o_custkey,
  3. o_orderdate
  4. FROM (SELECT * FROM lineitem WHERE l_linenumber > 1) t1
  5. LEFT OUTER JOIN orders
  6. ON l_orderkey = o_orderkey;

Query statement:

  1. SELECT l_linenumber,
  2. o_custkey
  3. FROM lineitem
  4. LEFT OUTER JOIN orders
  5. ON l_orderkey = o_orderkey
  6. WHERE l_linenumber > 1 and o_orderdate = '2023-12-31';

Case 2:

JOIN Derivation occurs when the join type between the query and the materialized view does not match. In cases where the materialization can provide all the necessary data for the query, transparent rewriting can still be achieved by compensating predicates outside the join through predicate push down.

For example:

Materialized view definition:

  1. SELECT
  2. l_shipdate, l_suppkey, o_orderdate
  3. sum(o_totalprice) AS sum_total,
  4. max(o_totalprice) AS max_total,
  5. min(o_totalprice) AS min_total,
  6. count(*) AS count_all,
  7. count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS bitmap_union_basic
  8. FROM lineitem
  9. LEFT OUTER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  10. GROUP BY
  11. l_shipdate,
  12. l_suppkey,
  13. o_orderdate;

Query statement:

  1. SELECT
  2. l_shipdate, l_suppkey, o_orderdate
  3. sum(o_totalprice) AS sum_total,
  4. max(o_totalprice) AS max_total,
  5. min(o_totalprice) AS min_total,
  6. count(*) AS count_all,
  7. count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS bitmap_union_basic
  8. FROM lineitem
  9. INNER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  10. WHERE o_orderdate = '2023-12-11' AND l_suppkey = 3
  11. GROUP BY
  12. l_shipdate,
  13. l_suppkey,
  14. o_orderdate;

Aggregate rewriting

In the definitions of both the query and the materialized view, the aggregated dimensions can either be consistent or inconsistent. Filtering of results can be achieved by using fields from the dimensions in the WHERE clause.

The dimensions used in the materialized view need to encompass those used in the query, and the metrics utilized in the query can be expressed using the metrics of the materialized view.

Case 1

The following case can undergo transparent rewriting. The query and the materialized view use consistent dimensions for aggregation, allowing the use of fields from the dimensions to filter results. The query will attempt to use the expressions after SELECT in the materialized view.

Materialized view definition:

  1. SELECT
  2. o_shippriority, o_comment,
  3. count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS cnt_1,
  4. count(distinct CASE WHEN O_SHIPPRIORITY > 2 AND o_orderkey IN (2) THEN o_custkey ELSE null END) AS cnt_2,
  5. sum(o_totalprice),
  6. max(o_totalprice),
  7. min(o_totalprice),
  8. count(*)
  9. FROM orders
  10. GROUP BY
  11. o_shippriority,
  12. o_comment;

Query statement:

  1. SELECT
  2. o_shippriority, o_comment,
  3. count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS cnt_1,
  4. count(distinct CASE WHEN O_SHIPPRIORITY > 2 AND o_orderkey IN (2) THEN o_custkey ELSE null END) AS cnt_2,
  5. sum(o_totalprice),
  6. max(o_totalprice),
  7. min(o_totalprice),
  8. count(*)
  9. FROM orders
  10. WHERE o_shippriority in (1, 2)
  11. GROUP BY
  12. o_shippriority,
  13. o_comment;

Case 2

The following query can be transparently rewritten: the query and the materialization use aggregated dimensions that are inconsistent, but the dimensions used in the materialized view encompass those used in the query. The query can filter results using fields from the dimensions.

The query will attempt to roll up using the functions after SELECT, such as the materialized view’s bitmap_union will eventually roll up into bitmap_union_count, maintaining consistency with the semantics of the count(distinct) in the query.

Materialized view definition:

  1. SELECT
  2. l_shipdate, o_orderdate, l_partkey, l_suppkey,
  3. sum(o_totalprice) AS sum_total,
  4. max(o_totalprice) AS max_total,
  5. min(o_totalprice) AS min_total,
  6. count(*) AS count_all,
  7. bitmap_union(to_bitmap(CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END)) AS bitmap_union_basic
  8. FROM lineitem
  9. LEFT OUTER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  10. GROUP BY
  11. l_shipdate,
  12. o_orderdate,
  13. l_partkey,
  14. l_suppkey;

Query statement:

  1. SELECT
  2. l_shipdate, l_suppkey,
  3. sum(o_totalprice) AS sum_total,
  4. max(o_totalprice) AS max_total,
  5. min(o_totalprice) AS min_total,
  6. count(*) AS count_all,
  7. count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS bitmap_union_basic
  8. FROM lineitem
  9. LEFT OUTER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  10. WHERE o_orderdate = '2023-12-11' AND l_partkey = 3
  11. GROUP BY
  12. l_shipdate,
  13. l_suppkey;

Temporary support for the aggregation roll-up functions is as follows:

Functions in QueriesFunctions in Materialized ViewsAggregation Functions After Rewriting
maxmaxmax
minminmin
sumsumsum
countcountsum
count(distinct )bitmap_unionbitmap_union_count
bitmap_unionbitmap_unionbitmap_union
bitmap_union_countbitmap_unionbitmap_union_count

Query partial Transparent Rewriting (Coming soon)

When the number of tables in the materialized view is greater than the query, if the materialized view satisfies the conditions for JOIN elimination for tables more than the query, transparent rewriting can also occur. For example:

Case 1

Materialized view definition:

  1. SELECT
  2. l_linenumber,
  3. o_custkey,
  4. ps_availqty
  5. FROM lineitem
  6. LEFT OUTER JOIN orders ON L_ORDERKEY = O_ORDERKEY
  7. LEFT OUTER JOIN partsupp ON l_partkey = ps_partkey
  8. AND l_suppkey = ps_suppkey;

Query statement:

  1. SELECT
  2. l_linenumber,
  3. o_custkey,
  4. ps_availqty
  5. FROM lineitem
  6. LEFT OUTER JOIN orders ON L_ORDERKEY = O_ORDERKEY;

Union Rewriting (Coming soon)

When the materialized view is not sufficient to provide all the data for the query, it can use Union to return data by combining the original table and the materialized view. For example:

Case 1

Materialized view definition:

  1. SELECT
  2. o_orderkey,
  3. o_custkey,
  4. o_orderstatus,
  5. o_totalprice
  6. FROM orders
  7. WHERE o_orderkey > 10;

Query statement:

  1. SELECT
  2. o_orderkey,
  3. o_custkey,
  4. o_orderstatus,
  5. o_totalprice
  6. FROM orders
  7. WHERE o_orderkey > 5;

Rewriting result:

  1. SELECT *
  2. FROM mv
  3. UNION ALL
  4. SELECT
  5. o_orderkey,
  6. o_custkey,
  7. o_orderstatus,
  8. o_totalprice
  9. FROM orders
  10. WHERE o_orderkey > 5 AND o_orderkey <= 10;

Auxiliary Functions

Data Consistency Issues After Transparent Rewriting

The unit of grace_period is seconds, referring to the permissible time for inconsistency between the materialized view and the data in the underlying base tables.

For example, setting grace_period to 0 means requiring the materialized view to be consistent with the base table data before it can be used for transparent rewriting. As for external tables, since changes in data cannot be perceived, the materialized view is used with them. Regardless of whether the data in the external table is up-to-date or not, this materialized view can be used for transparent rewriting. If the external table is configured with an HMS metadata source, it becomes capable of perceiving data changes. Configuring the metadata source and enabling data change perception functionality will be supported in subsequent iterations.

Setting grace_period to 10 means allowing a 10-second delay between the data in the materialized view and the data in the base tables. If there is a delay of up to 10 seconds between the data in the materialized view and the data in the base tables, the materialized view can still be used for transparent rewriting within that time frame.

For internal tables in the materialized view, you can control the maximum delay allowed for the data used by the transparent rewriting by setting the grace_period property. Refer to CREATE-ASYNC-MATERIALIZED-VIEW

Viewing and Debugging Transparent Rewrite Hit Information

You can use the following statements to view the hit information of transparent rewriting for a materialized view. It will display a concise overview of the transparent rewriting process.

explain <query_sql> The information returned is as follows, with the relevant information pertaining to materialized views extracted:

  1. | MaterializedView |
  2. | MaterializedViewRewriteFail: |
  3. | MaterializedViewRewriteSuccessButNotChose: |
  4. | Names: |
  5. | MaterializedViewRewriteSuccessAndChose: |
  6. | Names: mv1

MaterializedViewRewriteSuccessAndChose: Transparent rewrite succeeded, and the materialized view names list chosen by the CBO.

MaterializedViewRewriteSuccessButNotChose: Transparent rewrite succeeded, but the final CBO did not choose the materialized view names list.

MaterializedViewRewriteFail: Lists transparent rewrite failures and summarizes the reasons.

If you want to know the detailed information about materialized view candidates, rewriting, and the final selection process, you can execute the following statement. It will provide a detailed breakdown of the transparent rewriting process.

explain memo plan <query_sql>

Relevant Environment Variables

SwitchDescription
SET enable_nereids_planner = true;Asynchronous materialized views are only supported under the new optimizer, so the new optimizer needs to be enabled.
SET enable_materialized_view_rewrite = true;Enable or disable query transparent rewriting, default is disabled
SET materialized_view_rewrite_enable_contain_external_table = true;Whether materialized views participating in transparent rewriting are allowed to contain external tables, default is not allowed
SET materialized_view_rewrite_success_candidate_num = 3;Transparently rewrites the successful result set, allowing the maximum number of CBO candidates to participate, the default is 3

Limitations

  • The materialized view definition statement only allows SELECT, FROM, WHERE, JOIN, and GROUP BY clauses. The input for JOIN can include simple GROUP BY (aggregation on a single table). Supported types of JOIN operations include INNER and LEFT OUTER JOIN. Support for other types of JOIN operations will be gradually added.

  • Materialized views based on External Tables do not guarantee strong consistency in query results.

  • The use of non-deterministic functions to build materialized views is not supported, including rand, now, current_time, current_date, random, uuid, etc.

  • Transparent rewriting does not support window functions and LIMIT.

  • Currently, materialized view definitions cannot utilize views or other materialized views.

  • Currently, WHERE clause compensation supports scenarios where the materialized view does not have a WHERE clause, but the query does, or where the materialized view has a WHERE clause and the query’s WHERE clause is a superset of the materialized view’s. Range condition compensation is not yet supported but will be added gradually.