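Overview

Doris asynchronous materialized views use a transparent rewriting algorithm based on the structural information of the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) pattern.

Doris analyzes the structure of the query SQL, automatically looks for materialized views that meet the requirements, and attempts a transparent rewrite so that the query is expressed with the best available materialized view.

Using precomputed materialized view results can substantially improve query performance and reduce computation cost.

The sections below use three TPC-H tables, lineitem, orders, and partsupp, to describe querying a materialized view directly and using materialized views for transparent query rewriting. The tables are defined as follows: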

  CREATE TABLE IF NOT EXISTS lineitem (
      l_orderkey integer not null,
      l_partkey integer not null,
      l_suppkey integer not null,
      l_linenumber integer not null,
      l_quantity decimalv3(15,2) not null,
      l_extendedprice decimalv3(15,2) not null,
      l_discount decimalv3(15,2) not null,
      l_tax decimalv3(15,2) not null,
      l_returnflag char(1) not null,
      l_linestatus char(1) not null,
      l_shipdate date not null,
      l_commitdate date not null,
      l_receiptdate date not null,
      l_shipinstruct char(25) not null,
      l_shipmode char(10) not null,
      l_comment varchar(44) not null
  )
  DUPLICATE KEY(l_orderkey, l_partkey, l_suppkey, l_linenumber)
  PARTITION BY RANGE(l_shipdate)
  (FROM ('2023-10-17') TO ('2023-11-01') INTERVAL 1 DAY)
  DISTRIBUTED BY HASH(l_orderkey) BUCKETS 3
  PROPERTIES ("replication_num" = "1");

  insert into lineitem values
  (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-17', '2023-10-17', '2023-10-17', 'a', 'b', 'yyyyyyyyy'),
  (2, 4, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-18', '2023-10-18', '2023-10-18', 'a', 'b', 'yyyyyyyyy'),
  (3, 2, 4, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-19', '2023-10-19', '2023-10-19', 'a', 'b', 'yyyyyyyyy');

  CREATE TABLE IF NOT EXISTS orders (
      o_orderkey integer not null,
      o_custkey integer not null,
      o_orderstatus char(1) not null,
      o_totalprice decimalv3(15,2) not null,
      o_orderdate date not null,
      o_orderpriority char(15) not null,
      o_clerk char(15) not null,
      o_shippriority integer not null,
      o_comment varchar(79) not null
  )
  DUPLICATE KEY(o_orderkey, o_custkey)
  PARTITION BY RANGE(o_orderdate)(
  FROM ('2023-10-17') TO ('2023-11-01') INTERVAL 1 DAY)
  DISTRIBUTED BY HASH(o_orderkey) BUCKETS 3
  PROPERTIES ("replication_num" = "1");

  insert into orders values
  (1, 1, 'o', 9.5, '2023-10-17', 'a', 'b', 1, 'yy'),
  (1, 1, 'o', 10.5, '2023-10-18', 'a', 'b', 1, 'yy'),
  (2, 1, 'o', 11.5, '2023-10-19', 'a', 'b', 1, 'yy'),
  (3, 1, 'o', 12.5, '2023-10-19', 'a', 'b', 1, 'yy');

  CREATE TABLE IF NOT EXISTS partsupp (
      ps_partkey INTEGER NOT NULL,
      ps_suppkey INTEGER NOT NULL,
      ps_availqty INTEGER NOT NULL,
      ps_supplycost DECIMALV3(15,2) NOT NULL,
      ps_comment VARCHAR(199) NOT NULL
  )
  DUPLICATE KEY(ps_partkey, ps_suppkey)
  DISTRIBUTED BY HASH(ps_partkey) BUCKETS 3
  PROPERTIES (
      "replication_num" = "1"
  );

  insert into partsupp values
  (2, 3, 9, 10.01, 'supply1'),
  (4, 3, 10, 11.01, 'supply2'),
  (2, 3, 10, 11.01, 'supply3');

Querying a materialized view directly

A materialized view can be treated as a table and queried directly, just like a regular table.

Use case 1:

For the syntax of the materialized view definition, see CREATE-ASYNC-MATERIALIZED-VIEW.

Materialized view definition:

  CREATE MATERIALIZED VIEW mv1
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT t1.l_linenumber,
         o_custkey,
         o_orderdate
  FROM (SELECT * FROM lineitem WHERE l_linenumber > 1) t1
  LEFT OUTER JOIN orders
  ON l_orderkey = o_orderkey;

Query statement:

Filter conditions, aggregations, and so on can be applied when querying the materialized view directly.

  SELECT l_linenumber,
         o_custkey
  FROM mv1
  WHERE l_linenumber > 1 and o_orderdate = '2023-10-18';
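
The query above only filters mv1. Since a materialized view can be queried like a table, an aggregation can be applied the same way. The following is a minimal sketch against the mv1 definition above; the alias line_count is chosen for illustration only.

```sql
-- Illustrative only: count the rows stored in mv1 per customer for one order date.
-- line_count is a hypothetical alias, not part of the original example.
SELECT o_custkey,
       count(*) AS line_count
FROM mv1
WHERE o_orderdate = '2023-10-18'
GROUP BY o_custkey;
```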

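Transparent rewriting capabilities

JOIN rewriting

Join rewriting applies when the query and the materialized view use the same tables. A WHERE filter may be written on the inputs of the Join or outside the Join, in either the materialized view or the query; the optimizer attempts a transparent rewrite for queries that match this pattern.

Multi-table Joins are supported. The supported Join types are: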

  • INNER JOIN
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
  • LEFT SEMI JOIN
  • RIGHT SEMI JOIN
  • LEFT ANTI JOIN
  • RIGHT ANTI JOIN

Use case 1:

The following query can be transparently rewritten. The condition l_linenumber > 1 can be pulled up, so the query can be expressed with the precomputed result of the materialized view.

Materialized view definition:

  CREATE MATERIALIZED VIEW mv2
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT t1.l_linenumber,
         o_custkey,
         o_orderdate
  FROM (SELECT * FROM lineitem WHERE l_linenumber > 1) t1
  LEFT OUTER JOIN orders
  ON l_orderkey = o_orderkey;

Query statement:

  SELECT l_linenumber,
         o_custkey
  FROM lineitem
  LEFT OUTER JOIN orders
  ON l_orderkey = o_orderkey
  WHERE l_linenumber > 1 and o_orderdate = '2023-10-18';
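
To make the effect of the rewrite concrete, the query above is logically equivalent to the following statement over mv2. This is a hand-written sketch of the idea, not the plan the optimizer actually emits: the filter l_linenumber > 1 is already applied inside mv2, so only the remaining predicate needs to be evaluated on top of it.

```sql
-- Conceptual equivalent of the rewritten query (illustration, not optimizer output).
-- mv2 already contains only rows with l_linenumber > 1.
SELECT l_linenumber,
       o_custkey
FROM mv2
WHERE o_orderdate = '2023-10-18';
```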

Use case 2:

Join derivation. When the JOIN types of the query and the materialized view differ, transparent rewriting is still possible if the materialized view can provide all the data the query needs, by compensating predicates outside the JOIN.

For example:

Materialized view definition:

  CREATE MATERIALIZED VIEW mv3
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT
      l_shipdate, l_suppkey, o_orderdate,
      sum(o_totalprice) AS sum_total,
      max(o_totalprice) AS max_total,
      min(o_totalprice) AS min_total,
      count(*) AS count_all,
      count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS bitmap_union_basic
  FROM lineitem
  LEFT OUTER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  GROUP BY
      l_shipdate,
      l_suppkey,
      o_orderdate;

Query statement:

  SELECT
      l_shipdate, l_suppkey, o_orderdate,
      sum(o_totalprice) AS sum_total,
      max(o_totalprice) AS max_total,
      min(o_totalprice) AS min_total,
      count(*) AS count_all,
      count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS bitmap_union_basic
  FROM lineitem
  INNER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  WHERE o_orderdate = '2023-10-18' AND l_suppkey = 3
  GROUP BY
      l_shipdate,
      l_suppkey,
      o_orderdate;
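
One way to see why the LEFT OUTER JOIN view can answer the INNER JOIN query: rows that the left join keeps without a match have a NULL o_orderdate, and the query predicate o_orderdate = '2023-10-18' rejects them. The sketch below expresses the query over mv3 under that reasoning. It is an illustration only; the predicates the optimizer actually compensates with may differ.

```sql
-- Conceptual equivalent over mv3 (illustration, not optimizer output).
-- The grouping columns match, so no re-aggregation is needed; the filter on
-- o_orderdate discards the unmatched (NULL-extended) groups of the left join.
SELECT l_shipdate, l_suppkey, o_orderdate,
       sum_total, max_total, min_total, count_all, bitmap_union_basic
FROM mv3
WHERE o_orderdate = '2023-10-18' AND l_suppkey = 3;
```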

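Aggregate rewriting

The aggregation dimensions in the query and in the materialized view definition may be the same or different, and a WHERE clause on fields from the dimensions may be used to filter the result.

The dimensions used by the materialized view must include the dimensions of the query, and the metrics used by the query must be expressible with the metrics of the materialized view.

Use case 1

The following query can be transparently rewritten. The query and the materialized view aggregate on the same dimensions, the result can be filtered on fields from those dimensions, and the query attempts to reuse the expressions in the SELECT list of the materialized view.

Materialized view definition:

  CREATE MATERIALIZED VIEW mv4
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT
      o_shippriority, o_comment,
      count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS cnt_1,
      count(distinct CASE WHEN O_SHIPPRIORITY > 2 AND o_orderkey IN (2) THEN o_custkey ELSE null END) AS cnt_2,
      sum(o_totalprice),
      max(o_totalprice),
      min(o_totalprice),
      count(*)
  FROM orders
  GROUP BY
      o_shippriority,
      o_comment;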

Query statement:

  SELECT
      o_shippriority, o_comment,
      count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS cnt_1,
      count(distinct CASE WHEN O_SHIPPRIORITY > 2 AND o_orderkey IN (2) THEN o_custkey ELSE null END) AS cnt_2,
      sum(o_totalprice),
      max(o_totalprice),
      min(o_totalprice),
      count(*)
  FROM orders
  WHERE o_shippriority in (1, 2)
  GROUP BY
      o_shippriority,
      o_comment;

Use case 2

The following query can be transparently rewritten. The query and the materialized view aggregate on different dimensions, and the dimensions used by the materialized view include the dimensions of the query. The query can filter the result on fields from the dimensions.

The query attempts to roll up using the functions in the SELECT list of the materialized view. For example, bitmap_union in the materialized view is finally rolled up to bitmap_union_count, which keeps the same semantics as count(distinct) in the query.

Materialized view definition:

  CREATE MATERIALIZED VIEW mv5
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT
      l_shipdate, o_orderdate, l_partkey, l_suppkey,
      sum(o_totalprice) AS sum_total,
      max(o_totalprice) AS max_total,
      min(o_totalprice) AS min_total,
      count(*) AS count_all,
      bitmap_union(to_bitmap(CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END)) AS bitmap_union_basic
  FROM lineitem
  LEFT OUTER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  GROUP BY
      l_shipdate,
      o_orderdate,
      l_partkey,
      l_suppkey;

Query statement:

  SELECT
      l_shipdate, l_suppkey,
      sum(o_totalprice) AS sum_total,
      max(o_totalprice) AS max_total,
      min(o_totalprice) AS min_total,
      count(*) AS count_all,
      count(distinct CASE WHEN o_shippriority > 1 AND o_orderkey IN (1, 3) THEN o_custkey ELSE null END) AS bitmap_union_basic
  FROM lineitem
  LEFT OUTER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
  WHERE o_orderdate = '2023-10-18' AND l_partkey = 3
  GROUP BY
      l_shipdate,
      l_suppkey;
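
The roll-up described in this use case can be written out by hand. The following sketch shows a statement over mv5 that is logically equivalent to the query: each stored metric is re-aggregated with its roll-up function (count is rolled up with sum, bitmap_union with bitmap_union_count). It is meant as an illustration of the idea, not as the exact plan produced by the optimizer.

```sql
-- Conceptual equivalent over mv5 (illustration, not optimizer output).
-- The filter uses mv5 dimensions; the aggregates are rolled up from the stored metrics.
SELECT l_shipdate, l_suppkey,
       sum(sum_total) AS sum_total,
       max(max_total) AS max_total,
       min(min_total) AS min_total,
       sum(count_all) AS count_all,
       bitmap_union_count(bitmap_union_basic) AS bitmap_union_basic
FROM mv5
WHERE o_orderdate = '2023-10-18' AND l_partkey = 3
GROUP BY
    l_shipdate,
    l_suppkey;
```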

Use case 3

Transparent rewriting of multi-dimensional aggregation is supported. If the materialized view contains no GROUPING SETS, CUBE, or ROLLUP while the query uses multi-dimensional aggregation, and the GROUP BY fields of the materialized view include all fields used in the multi-dimensional aggregation of the query, the query can also be transparently rewritten.

Materialized view definition:

  CREATE MATERIALIZED VIEW mv5_1
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  select o_orderstatus, o_orderdate, o_orderpriority,
         sum(o_totalprice) as sum_total,
         max(o_totalprice) as max_total,
         min(o_totalprice) as min_total,
         count(*) as count_all
  from orders
  group by
      o_orderstatus, o_orderdate, o_orderpriority;

Query statement:

  select o_orderstatus, o_orderdate, o_orderpriority,
         sum(o_totalprice),
         max(o_totalprice),
         min(o_totalprice),
         count(*)
  from orders
  group by
      GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderpriority), (o_orderstatus), ());
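
As a rough illustration of this case, the GROUPING SETS query can be expressed over mv5_1 by applying the same grouping sets to the materialized view and rolling up its stored metrics. This is a hand-written sketch under that assumption, not the optimizer's actual output.

```sql
-- Conceptual equivalent over mv5_1 (illustration, not optimizer output).
-- count(*) is rolled up with sum(count_all).
select o_orderstatus, o_orderdate, o_orderpriority,
       sum(sum_total),
       max(max_total),
       min(min_total),
       sum(count_all)
from mv5_1
group by
    GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderpriority), (o_orderstatus), ());
```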

Use case 4

When the query contains an aggregation and the materialized view does not, the rewrite can also succeed as long as every column used by the query can be obtained from the materialized view.

Materialized view definition:

  CREATE MATERIALIZED VIEW mv5_2
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  select case when o_shippriority > 1 and o_orderkey IN (4, 5) then o_custkey else o_shippriority end,
         o_orderstatus,
         bin(o_orderkey)
  from orders;

Query statement:

  select
      count(case when o_shippriority > 1 and o_orderkey IN (4, 5) then o_custkey else o_shippriority end),
      o_orderstatus,
      bin(o_orderkey)
  from orders
  group by
      o_orderstatus,
      bin(o_orderkey);

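The aggregate roll-up functions currently supported are listed below:

| Function in the query | Function in the materialized view | Function after roll-up |
|---|---|---|
| max | max | max |
| min | min | min |
| sum | sum | sum |
| count | count | sum |
| count(distinct) | bitmap_union | bitmap_union_count |
| bitmap_union | bitmap_union | bitmap_union |
| bitmap_union_count | bitmap_union | bitmap_union_count |
| hll_union_agg, approx_count_distinct, hll_cardinality | hll_union or hll_raw_agg | hll_union_agg |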

Query partial transparent rewriting (Coming soon)

When the materialized view uses more tables than the query, transparent rewriting is also possible if the extra tables in the materialized view satisfy the conditions for JOIN elimination. The following case could be rewritten this way; support is pending.

Use case 1

Materialized view definition:

  CREATE MATERIALIZED VIEW mv6
  BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 hour
  DISTRIBUTED BY RANDOM BUCKETS 3
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT
      l_linenumber,
      o_custkey,
      ps_availqty
  FROM lineitem
  LEFT OUTER JOIN orders ON L_ORDERKEY = O_ORDERKEY
  LEFT OUTER JOIN partsupp ON l_partkey = ps_partkey
  AND l_suppkey = ps_suppkey;

Query statement:

  SELECT
      l_linenumber,
      o_custkey
  FROM lineitem
  LEFT OUTER JOIN orders ON L_ORDERKEY = O_ORDERKEY;

Union rewriting

When the materialized view cannot provide all the data the query needs, union all can be used to combine data read from the original tables with data from the materialized view as the final result. Currently the materialized view must be a partitioned materialized view, and the missing data is filled in with union all based on filter conditions on the partition column.

Use case 1

Materialized view definition:

  CREATE MATERIALIZED VIEW mv7
  BUILD IMMEDIATE REFRESH AUTO ON MANUAL
  partition by(l_shipdate)
  DISTRIBUTED BY RANDOM BUCKETS 2
  PROPERTIES ('replication_num' = '1')
  as
  select l_shipdate, o_orderdate, l_partkey,
         l_suppkey, sum(o_totalprice) as sum_total
  from lineitem
  left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
  group by
      l_shipdate,
      o_orderdate,
      l_partkey,
      l_suppkey;

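When the base table gains a new partition 2023-10-21 and the materialized view has not yet been refreshed, the result can be returned by combining the materialized view with the original table through union all.

  insert into lineitem values
  (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-10-21', '2023-10-21', '2023-10-21', 'a', 'b', 'yyyyyyyyy');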

Run the query statement:

  select l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
  from lineitem
  left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
  group by
      l_shipdate,
      o_orderdate,
      l_partkey,
      l_suppkey;

Sketch of the rewritten result:

  SELECT *
  FROM mv7
  union all
  select t1.l_shipdate, o_orderdate, t1.l_partkey, t1.l_suppkey, sum(o_totalprice) as sum_total
  from (select * from lineitem where l_shipdate = '2023-10-21') t1
  left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate
  group by
      t1.l_shipdate,
      o_orderdate,
      t1.l_partkey,
      t1.l_suppkey;

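Note: when the materialized view definition carries a WHERE condition, union compensation is not yet possible. Taking the example above, if the materialized view is built with the filter where l_shipdate > '2023-10-19' and the query uses where l_shipdate > '2023-10-18', the missing data cannot currently be compensated through union; support is pending.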

Nested materialized view rewriting

The definition SQL of a materialized view can itself use a materialized view; such a view is called a nested materialized view. In theory there is no limit on the nesting depth. A nested materialized view can be queried directly and can also take part in transparent rewriting.

Use case 1

First create the inner materialized view mv8_0_inner_mv

  CREATE MATERIALIZED VIEW mv8_0_inner_mv
  BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
  DISTRIBUTED BY RANDOM BUCKETS 2
  PROPERTIES ('replication_num' = '1')
  AS
  select
      l_linenumber,
      o_custkey,
      o_orderkey,
      o_orderstatus,
      l_partkey,
      l_suppkey,
      l_orderkey
  from lineitem
  inner join orders on lineitem.l_orderkey = orders.o_orderkey;

Create the outer materialized view mv8_0

  CREATE MATERIALIZED VIEW mv8_0
  BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
  DISTRIBUTED BY RANDOM BUCKETS 2
  PROPERTIES ('replication_num' = '1')
  AS
  select
      l_linenumber,
      o_custkey,
      o_orderkey,
      o_orderstatus,
      l_partkey,
      l_suppkey,
      l_orderkey,
      ps_availqty
  from mv8_0_inner_mv
  inner join partsupp on l_partkey = ps_partkey AND l_suppkey = ps_suppkey;

For the following query, both mv8_0_inner_mv and mv8_0 are rewritten successfully, and the cost model finally chooses mv8_0

  select lineitem.l_linenumber
  from lineitem
  inner join orders on l_orderkey = o_orderkey
  inner join partsupp on l_partkey = ps_partkey AND l_suppkey = ps_suppkey
  where o_orderstatus = 'o';

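Note:

  1. The more levels of nesting there are, the longer transparent rewriting takes. It is recommended to keep nested materialized views to no more than 3 levels.
  2. Transparent rewriting of nested materialized views is off by default; see the switches below for how to turn it on.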

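Auxiliary features

Data consistency after transparent rewriting

grace_period is measured in seconds and is the length of time for which the data of the materialized view is allowed to be inconsistent with the base tables it uses.

For example, if grace_period is set to 0, the materialized view must be consistent with the base tables before it can be used for transparent rewriting. If it is set to 10, the materialized view may lag the base tables by up to 10 seconds: as long as the delay between the materialized view and the base tables is within 10s, the materialized view can be used for transparent rewriting.

For external tables, data changes cannot be detected, so a materialized view that uses external tables can be used for transparent rewriting whether or not the external table data is up to date. If the external table is configured with an HMS metadata source, data changes can be detected; configuring the data source and detecting data changes will be supported in later iterations.

For internal tables in a materialized view, set the grace_period property to control the maximum data delay allowed for a materialized view used in transparent rewriting. See CREATE-ASYNC-MATERIALIZED-VIEW.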
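
As a sketch of how this might look, the statement below creates a materialized view that tolerates up to 10 seconds of delay. It assumes that grace_period is passed as a key in PROPERTIES, as described in CREATE-ASYNC-MATERIALIZED-VIEW; the view name mv_grace_example and its definition are made up for illustration.

```sql
-- Hypothetical example: mv_grace_example is not part of the original document.
-- Assumes grace_period (in seconds) is supplied through PROPERTIES.
CREATE MATERIALIZED VIEW mv_grace_example
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
    'replication_num' = '1',
    'grace_period' = '10'
)
AS
SELECT o_orderdate,
       sum(o_totalprice) AS sum_total
FROM orders
GROUP BY o_orderdate;
```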

Viewing and debugging transparent rewrite hits

The following statement shows whether materialized views were hit by transparent rewriting, along with a brief summary of the rewriting process.

explain <query_sql> returns information like the following; only the part related to materialized views is shown

  | MaterializedView |
  | MaterializedViewRewriteSuccessAndChose: |
  | Names: mv5 |
  | MaterializedViewRewriteSuccessButNotChose: |
  | |
  | MaterializedViewRewriteFail: |
  | Name: mv4 |
  | FailSummary: Match mode is invalid, View struct info is invalid |
  | Name: mv3 |
  | FailSummary: Match mode is invalid, Rewrite compensate predicate by view fail, View struct info is invalid |
  | Name: mv1 |
  | FailSummary: The columns used by query are not in view, View struct info is invalid |
  | Name: mv2 |
  | FailSummary: The columns used by query are not in view, View struct info is invalid

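MaterializedViewRewriteSuccessAndChose: the names of the materialized views that were rewritten successfully and chosen by the CBO.

MaterializedViewRewriteSuccessButNotChose: the names of the materialized views that were rewritten successfully but not chosen by the CBO in the end.

MaterializedViewRewriteFail: the materialized views for which rewriting failed, with a summary of the reasons.

To see detailed information about the candidate materialized views, the rewriting, and the final selection, run the following statement, which shows the details of the transparent rewriting process.

explain memo plan <query_sql>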

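Related environment variables

| Switch | Description |
|---|---|
| SET enable_nereids_planner = true; | Asynchronous materialized views are supported only under the new optimizer, so the new optimizer must be enabled |
| SET enable_materialized_view_rewrite = true; | Turns transparent query rewriting on or off; on by default |
| SET materialized_view_rewrite_enable_contain_external_table = true; | Whether materialized views that take part in transparent rewriting may contain external tables; not allowed by default |
| SET materialized_view_rewrite_success_candidate_num = 3; | The maximum number of successfully rewritten results allowed to become CBO candidates; the default is 3 |
| SET enable_materialized_view_union_rewrite = true; | Whether the base table and the materialized view may be combined with union all to answer the query when a partitioned materialized view cannot provide all the data; allowed by default |
| SET enable_materialized_view_nest_rewrite = true; | Whether nested rewriting is allowed; not allowed by default |
| SET materialized_view_relation_mapping_max_count = 8; | The maximum number of relation mappings allowed during transparent rewriting; any beyond the limit are truncated. Relation mappings usually arise from table self-joins and their number is generally a Cartesian product; for example, 3 tables may produce 8 combinations. The default is 8 |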
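
Putting the switches together, a session that wants to try the nested materialized view example from earlier in this document might be prepared as follows. This is a minimal sketch using only the switches listed above and the query from the nested rewriting use case.

```sql
-- Enable the new optimizer and transparent rewriting for the current session.
SET enable_nereids_planner = true;
SET enable_materialized_view_rewrite = true;
-- Nested rewriting is off by default, so turn it on explicitly.
SET enable_materialized_view_nest_rewrite = true;

-- Inspect which materialized views were chosen for the query.
explain
select lineitem.l_linenumber
from lineitem
inner join orders on l_orderkey = o_orderkey
inner join partsupp on l_partkey = ps_partkey AND l_suppkey = ps_suppkey
where o_orderstatus = 'o';
```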

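Limitations

  • The materialized view definition may contain only SELECT, FROM, WHERE, JOIN, and GROUP BY clauses. The inputs of a JOIN may contain a simple GROUP BY (single-table aggregation). The supported JOIN types are INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, LEFT SEMI JOIN, RIGHT SEMI JOIN, LEFT ANTI JOIN, and RIGHT ANTI JOIN.
  • Materialized views based on External Tables do not guarantee strongly consistent query results.
  • Non-deterministic functions cannot be used to build a materialized view, including rand, now, current_time, current_date, random, uuid, and so on.
  • Transparent rewriting of window functions is not supported.
  • Transparent rewriting is not yet supported when the materialized view contains LIMIT.
  • Transparent rewriting is not supported when the query or the materialized view has no data.
  • WHERE condition compensation currently supports only range compensation on columns of numeric and date types. For example, if the materialized view is defined with a > 5 and the query uses a > 10, transparent rewriting is supported.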
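
The range compensation in the last item can be illustrated with the lineitem table from this document. In the sketch below, the view name mv_range_example and its definition are hypothetical; the final statement is a hand-written equivalent of the rewritten query, not optimizer output.

```sql
-- Hypothetical materialized view that keeps rows with l_linenumber > 5.
CREATE MATERIALIZED VIEW mv_range_example
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT l_orderkey, l_linenumber, l_shipdate
FROM lineitem
WHERE l_linenumber > 5;

-- Query with a narrower numeric range on the same column.
SELECT l_orderkey, l_linenumber
FROM lineitem
WHERE l_linenumber > 10;

-- Conceptually, the query can be answered from the view by compensating
-- the narrower range on top of it.
SELECT l_orderkey, l_linenumber
FROM mv_range_example
WHERE l_linenumber > 10;
```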

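FAQ

1. Why is the materialized view not hit?

To determine whether a materialized view is hit, run the following SQL: explain your_query_sql;

a. Transparent rewriting only happens when the corresponding switch is on, so check that it has not been turned off. For the switch values, see the asynchronous materialized view switches.

b. The materialized view may be unavailable, which prevents transparent rewriting from hitting it. To check the build status of the materialized view, see question 2.

c. If the materialized view is still not hit after the two checks above, the definition SQL of the materialized view and the query SQL may be outside the current rewriting capabilities; see the transparent rewriting capabilities of materialized views.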

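2. How do I check whether the materialized view status is normal?

2.1 Confirm the build status of the materialized view

A materialized view can take part in transparent rewriting only when its status is Success. First run

  select * from mv_infos('database'='db_name') where Name = 'mv_name' \G

to find the JobName of the materialized view. Then use the JobName to view the task status of the materialized view by running the following statement

  select * from tasks("type"="mv") where JobName = 'job_name';

and check whether the Status of the most recently executed task is Success.

2.2 Confirm that the materialized view is usable with respect to data consistency

A materialized view may have been built successfully and still be unavailable because of data changes combined with the grace_period setting. To check the data consistency of a materialized view:

  • For a fully built materialized view, run the following SQL and check whether the SyncWithBaseTables field is 1
    select * from mv_infos('database'='db_name') where Name = 'mv_name' \G
  • For a partitioned materialized view, run the following SQL and check whether the partitions used by the query are valid
    show partitions from mv_name;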

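3. An error is reported when building a materialized view

Error message: ERROR 1105 (HY000): errCode = 2, detailMessage = Syntax error in line 1: BUILD IMMEDIATE REFRESH AUTO ON MANUAL

  1. Statements for asynchronous materialized views are supported only under the new optimizer. Make sure the new optimizer is in use: SET global enable_nereids_planner = true;
  2. A keyword in the build statement may be misspelled, or the definition SQL of the materialized view may have a syntax problem. Check that the definition SQL and the create statement are correct.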

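4. Building a partitioned materialized view reports Unable to find a suitable base table for partitioning

This error usually means that the SQL definition of the materialized view, together with the chosen partition column, does not allow incremental refresh by partition, so creating the partitioned materialized view fails. Certain requirements must be met for a materialized view to be refreshed incrementally by partition; for details, see CREATE ASYNC MATERIALIZED VIEW.

An example that satisfies the requirements for building a partitioned materialized view: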

  CREATE TABLE IF NOT EXISTS lineitem (
      l_orderkey INTEGER NOT NULL,
      l_partkey INTEGER NOT NULL,
      l_suppkey INTEGER NOT NULL,
      l_linenumber INTEGER NOT NULL,
      l_quantity DECIMALV3(15,2) NOT NULL,
      l_extendedprice DECIMALV3(15,2) NOT NULL,
      l_discount DECIMALV3(15,2) NOT NULL,
      l_tax DECIMALV3(15,2) NOT NULL,
      l_returnflag CHAR(1) NOT NULL,
      l_linestatus CHAR(1) NOT NULL,
      l_shipdate DATE NOT NULL,
      l_commitdate DATE NOT NULL,
      l_receiptdate DATE NOT NULL,
      l_shipinstruct CHAR(25) NOT NULL,
      l_shipmode CHAR(10) NOT NULL,
      l_comment VARCHAR(44) NOT NULL
  )
  DUPLICATE KEY(l_orderkey, l_partkey, l_suppkey, l_linenumber)
  PARTITION BY RANGE(l_shipdate) (
      PARTITION `day_1` VALUES LESS THAN ('2023-12-9'),
      PARTITION `day_2` VALUES LESS THAN ("2023-12-11"),
      PARTITION `day_3` VALUES LESS THAN ("2023-12-30"))
  DISTRIBUTED BY HASH(l_orderkey) BUCKETS 3
  PROPERTIES (
      "replication_num" = "1"
  );

  CREATE TABLE IF NOT EXISTS orders (
      o_orderkey INTEGER NOT NULL,
      o_custkey INTEGER NOT NULL,
      o_orderstatus CHAR(1) NOT NULL,
      o_totalprice DECIMALV3(15,2) NOT NULL,
      o_orderdate DATE NOT NULL,
      o_orderpriority CHAR(15) NOT NULL,
      o_clerk CHAR(15) NOT NULL,
      o_shippriority INTEGER NOT NULL,
      O_COMMENT VARCHAR(79) NOT NULL
  )
  DUPLICATE KEY(o_orderkey, o_custkey)
  PARTITION BY RANGE(o_orderdate) (
      PARTITION `day_2` VALUES LESS THAN ('2023-12-9'),
      PARTITION `day_3` VALUES LESS THAN ("2023-12-11"),
      PARTITION `day_4` VALUES LESS THAN ("2023-12-30")
  )
  DISTRIBUTED BY HASH(o_orderkey) BUCKETS 3
  PROPERTIES (
      "replication_num" = "1"
  );

The materialized view is defined as follows. If l_shipdate is the partition column of the base table lineitem, the following materialized view can be refreshed incrementally by partition

  CREATE MATERIALIZED VIEW mv9
  BUILD IMMEDIATE REFRESH AUTO ON MANUAL
  partition by(l_shipdate)
  DISTRIBUTED BY HASH(l_orderkey) BUCKETS 10
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT l_shipdate, l_orderkey, O_ORDERDATE,
         count(O_ORDERDATE) over (partition by l_shipdate order by l_orderkey) as window_count
  FROM lineitem
  LEFT OUTER JOIN orders on l_orderkey = o_orderkey
  GROUP BY l_shipdate, l_orderkey, O_ORDERDATE;

The following materialized view cannot be refreshed incrementally by partition, because l_shipdate comes from the right side of the LEFT OUTER JOIN, which is the null-producing side.

  CREATE MATERIALIZED VIEW mv10
  BUILD IMMEDIATE REFRESH AUTO ON MANUAL
  partition by(l_shipdate)
  DISTRIBUTED BY HASH(l_orderkey) BUCKETS 10
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT l_shipdate, l_orderkey, O_ORDERDATE,
         count(O_ORDERDATE) over (partition by l_shipdate order by l_orderkey) as window_count
  FROM orders
  LEFT OUTER JOIN lineitem on l_orderkey = o_orderkey
  GROUP BY l_shipdate, l_orderkey, O_ORDERDATE;

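5. Why does querying a materialized view directly return no data?

The materialized view may still be building, or the build may have failed. Use the following statements to check the build status

  -- View the metadata of the materialized view; db_name is the current database and mv_name is the materialized view name
  select * from mv_infos('database'='db_name') where Name = 'mv_name' \G

  -- View the job metadata
  select * from jobs("type"="mv") order by CreateTime limit 5;

  -- View the task execution information; it shows the task status and, on failure, the reason
  select * from tasks("type"="mv") where JobName = 'job_name';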

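6. The base table data has changed but the materialized view has not been refreshed yet. How does transparent rewriting behave?

The data of an asynchronous materialized view lags the base tables by some amount of time. For internal tables and for external tables whose data changes can be detected (such as hive), whether the materialized view can be used for transparent rewriting after the base table data changes is decided by the grace_period threshold. grace_period is the length of time for which the materialized view is allowed to be inconsistent with the base tables it uses.

For example, if grace_period is set to 0, the materialized view must be consistent with the base tables before it can be used for transparent rewriting. For external tables (other than hive), data changes cannot be detected, so a materialized view that uses such external tables can be used for transparent rewriting whether or not the external table data is up to date (in this case the data may be inconsistent).

If it is set to 10, the materialized view may lag the base tables by up to 10 seconds: as long as the delay between the materialized view and the base tables is within 10s, the materialized view can be used for transparent rewriting.

If the materialized view is partitioned and some of its partitions are invalid, there are two cases

  1. The query does not use data from the invalid partitions: the materialized view can still be used for transparent rewriting.
  2. The query uses data from the invalid partitions and the data delay is within grace_period: the materialized view can still be used. If the data delay is outside grace_period, the query can be answered by combining the original table and the materialized view with union all.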

7. How do I confirm whether a materialized view is hit, and how do I find the reason when it is not?

Use explain query_sql to see whether the materialized view is hit, along with a summary of why it was not. Take the following materialized view as an example

  CREATE MATERIALIZED VIEW mv11
  BUILD IMMEDIATE REFRESH AUTO ON MANUAL
  partition by(l_shipdate)
  DISTRIBUTED BY HASH(l_orderkey) BUCKETS 10
  PROPERTIES ('replication_num' = '1')
  AS
  SELECT l_shipdate, l_orderkey, O_ORDERDATE, count(*)
  FROM lineitem
  LEFT OUTER JOIN orders on l_orderkey = o_orderkey
  GROUP BY l_shipdate, l_orderkey, O_ORDERDATE;

The query is as follows

  explain
  SELECT l_shipdate, l_linestatus, O_ORDERDATE, count(*)
  FROM orders
  LEFT OUTER JOIN lineitem on l_orderkey = o_orderkey
  GROUP BY l_shipdate, l_linestatus, O_ORDERDATE;

In the Explain output, MaterializedViewRewriteFail contains the failure summary. The graph logic between query and view is not consistent means that the join logic of the query and of the materialized view differ. The error is reported here because the tables in the query and in the materialized view are joined in a different order.

  | MaterializedView |
  | MaterializedViewRewriteSuccessAndChose: |
  | |
  | MaterializedViewRewriteSuccessButNotChose: |
  | |
  | MaterializedViewRewriteFail: |
  | Name: internal#doc_test#mv11 |
  | FailSummary: View struct info is invalid, The graph logic between query and view is not consistent

Consider another query

  explain
  SELECT l_shipdate, l_linestatus, O_ORDERDATE, count(*)
  FROM lineitem
  LEFT OUTER JOIN orders on l_orderkey = o_orderkey
  GROUP BY l_shipdate, l_linestatus, O_ORDERDATE;

The Explain output is as follows

  | MaterializedView |
  | MaterializedViewRewriteSuccessAndChose: |
  | |
  | MaterializedViewRewriteSuccessButNotChose: |
  | |
  | MaterializedViewRewriteFail: |
  | Name: internal#doc_test#mv11 |
  | FailSummary: View struct info is invalid, View dimensions doesn't not cover the query dimensions

The failure summary View dimensions doesn't not cover the query dimensions means that the GROUP BY fields of the query cannot be obtained from the GROUP BY of the materialized view.