查询改写 - 基于代价的查询改写 - 《OceanBase v1.4 官方教程》

Oceanbase目前只实现了一个基于代价的查询改写（or-expansion）。其实数据库中很多高级的改写规则（比如complex view merge和窗口函数改写）都需要基于代价进行改写，Oceanbase后续会实现这些复杂的改写规则。

或展开（Or-Expansion）

Or-expansion把一个查询改写成若干个用union组成的子查询，这个改写可能会给每个子查询提供更优的优化空间，但是也会导致多个子查询的执行，所以这个改写需要基于代价去判断。通常来说，Or-expansion的改写主要有如下三个作用:

允许每个分支使用不同的索引来加速查询

如下图所示，Q1会被改写成Q2的形式，其中Q2中的谓词lnnvl(t1.a = 1)保证了这两个子查询不会生成重复的结果。如果不进行改写，Q1一般来说会选择主表作为访问路径，对于Q2来说，如果t1上存在索引（a）和索引（b），那么该改写可能会让Q2中的每一个子查询选择索引作为访问路径。

Q1: select * from t1 where t1.a = 1 or t1.b = 1;
Q2: select * from t1 where t1.a = 1 union all select * from t1.b = 1 and lnnvl(t1.a = 1);

OceanBase (root@test)> create table t1(a int, b int, c int, d int, e int, index idx_a(a), index idx_b(b));
--- 如果不进行or-expansion的改写，该查询只能使用主表访问路径
OceanBase (root@test)> explain select/*+NO_REWRITE()*/ * from t1 where t1.a = 1 or t1.b = 1;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                                                                                                                                                                                                                                                                  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |4        |649 |
===================================
Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([t1.a = 1 OR t1.b = 1]),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
--- 改写之后，每个子查询能使用不同的索引访问路径
OceanBase (root@test)> explain select * from t1 where t1.a = 1 or t1.b = 1;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| =========================================
|ID|OPERATOR   |NAME     |EST. ROWS|COST|
-----------------------------------------
|0 |UNION ALL  |         |3        |190 |
|1 | TABLE SCAN|t1(idx_a)|2        |94  |
|2 | TABLE SCAN|t1(idx_b)|1        |95  |
=========================================
Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t1.c, t1.c)], [UNION(t1.d, t1.d)], [UNION(t1.e, t1.e)]), filter(nil)
  1 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter(nil),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p0)
  2 - output([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), filter([lnnvl(t1.a = 1)]),
      access([t1.a], [t1.b], [t1.c], [t1.d], [t1.e]), partitions(p02

允许每个分支使用不同的连接算法来加速查询，避免使用笛卡尔连接

如下图所示，Q1会被改写成Q2的形式。对于Q1来说，它的连接方式只能是nested loop join(笛卡尔乘积)，但是被改写之后，每个子查询都可以选择nested loop join，hash join或者merge join，这样会有更多的优化空间。

Q1: select * from t1, t2 where t1.a = t2.a or t1.b = t2.b;
Q2: select * from t1, t2 where t1.a = t2.a union all
    select * from t1, t2 where t1.b = t2.b and lnnvl(t1.a = t2.a)

OceanBase (root@test)> create table t1(a int, b int);
Query OK, 0 rows affected (0.17 sec)
OceanBase (root@test)> create table t2(a int, b int);
Query OK, 0 rows affected (0.13 sec)
---如果不进行改写，只能使用nested loop join
OceanBase (root@test)> explain select/*+NO_REWRITE()*/ * from t1, t2 where t1.a = t2.a or t1.b = t2.b;
| ===========================================
|ID|OPERATOR        |NAME|EST. ROWS|COST  |
-------------------------------------------
|0 |NESTED-LOOP JOIN|    |3957     |585457|
|1 | TABLE SCAN     |t1  |1000     |499   |
|2 | TABLE SCAN     |t2  |4        |583   |
===========================================
Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      conds(nil), nl_params_([t1.a], [t1.b])
  1 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  2 - output([t2.a], [t2.b]), filter([? = t2.a OR ? = t2.b]),
      access([t2.a], [t2.b]), partitions(p0)
---被改写之后，每个子查询都使用了hash join
OceanBase (root@test)> explain select * from t1, t2 where t1.a = t2.a or t1.b = t2.b;
|ID|OPERATOR    |NAME|EST. ROWS|COST|
-------------------------------------
|0 |UNION ALL   |    |2970     |9105|
|1 | HASH JOIN  |    |1980     |3997|
|2 |  TABLE SCAN|t1  |1000     |499 |
|3 |  TABLE SCAN|t2  |1000     |499 |
|4 | HASH JOIN  |    |990      |3659|
|5 |  TABLE SCAN|t1  |1000     |499 |
|6 |  TABLE SCAN|t2  |1000     |499 |
=====================================
Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)], [UNION(t2.a, t2.a)], [UNION(t2.b, t2.b)]), filter(nil)
  1 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      equal_conds([t1.a = t2.a]), other_conds(nil)
  2 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  3 - output([t2.a], [t2.b]), filter(nil),
      access([t2.a], [t2.b]), partitions(p0)
  4 - output([t1.a], [t1.b], [t2.a], [t2.b]), filter(nil),
      equal_conds([t1.b = t2.b]), other_conds([lnnvl(t1.a = t2.a)])
  5 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
  6 - output([t2.a], [t2.b]), filter(nil),
      access([t2.a], [t2.b]), partitions(p0)

允许每个分支分别消除排序，更加快速的获取top-k结果

如下图所示，Q1会被改写成Q2。对于Q1来说，执行方式是只能把满足条件的行数找出来，然后进行排序，最终取top-10结果。对于Q2来说，如果存在索引(a,b), 那么Q2中的两个子查询都可以使用索引把排序消除，每个子查询取top-10结果，然后最终对这20行数据排序一下取出最终的top-10行。

Q1: select * from t1 where t1.a = 1 or t1.a = 2 order by b limit 10;
Q2: select * from 
    (select * from t1 where t1.a = 1 order by b limit 10 union all
     select * from t1 where t1.a = 2 order by b limit 10) as temp
    order by temp.b limit 10;

OceanBase (root@test)> create table t1(a int, b int, index idx_a(a, b));
Query OK, 0 rows affected (0.20 sec)
---不改写的话，需要排序最终获取top-k结果
OceanBase (root@test)> explain select/*+NO_REWRITE()*/ * from t1 where t1.a = 1 or t1.a = 2 order by b limit 10;
| ==========================================
|ID|OPERATOR    |NAME     |EST. ROWS|COST|
------------------------------------------
|0 |LIMIT       |         |4        |77  |
|1 | TOP-N SORT |         |4        |76  |
|2 |  TABLE SCAN|t1(idx_a)|4        |73  |
==========================================
Outputs & filters:
-------------------------------------
  0 - output([t1.a], [t1.b]), filter(nil), limit(10), offset(nil)
  1 - output([t1.a], [t1.b]), filter(nil), sort_keys([t1.b, ASC]), topn(10)
  2 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0)
--- 进行改写的话，排序算子可以被消除，最终获取top-k结果
OceanBase (root@test)> explain select * from t1 where t1.a = 1 or t1.a = 2 order by b limit 10;
| ===========================================
|ID|OPERATOR     |NAME     |EST. ROWS|COST|
-------------------------------------------
|0 |LIMIT        |         |3        |76  |
|1 | TOP-N SORT  |         |3        |76  |
|2 |  UNION ALL  |         |3        |74  |
|3 |   TABLE SCAN|t1(idx_a)|2        |37  |
|4 |   TABLE SCAN|t1(idx_a)|1        |37  |
===========================================
Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), limit(10), offset(nil)
  1 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil), sort_keys([UNION(t1.b, t1.b), ASC]), topn(10)
  2 - output([UNION(t1.a, t1.a)], [UNION(t1.b, t1.b)]), filter(nil)
  3 - output([t1.a], [t1.b]), filter(nil),
      access([t1.a], [t1.b]), partitions(p0),
      limit(10), offset(nil)
  4 - output([t1.a], [t1.b]), filter([lnnvl(t1.a = 1)]),
      access([t1.a], [t1.b]), partitions(p0),
      limit(10), offset(nil)