分区 - 基于分区的查询优化 - 《OceanBase v1.4 官方教程》

分区裁剪

当用户访问分区表时，往往只需要访问其中部分的分区。优化器根据SQL中所带的条件，避免访问无关分区的优化过程我们称之为“分区裁剪”(Partition Pruning)。分区裁剪是分区表提供的重要优化手段，通过分区的裁剪，SQL的执行效率可以得到大幅度的提升。用户可以利用分区裁剪的特性，在访问中加入定位分区的条件，避免访问无关数据，优化查询性能。

分区裁剪本身是一个比较复杂的过程。优化器需要根据用户表的分区信息和SQL中给定的条件，抽取出相关的分区信息，由于SQL中的条件往往比较复杂，整个抽取逻辑的复杂性也随之增加，这一过程由OceanBase中的Query Range子模块完成。

举个例子，当用户使用如下SQL访问分区表时：

create table t1 (c1 int, c2 int) partition by hash(c1) partitions 5;
select * from t1 where c1 = 1;

由于c1为1的数据全部处于第1号分区（p1)，实际上我们只需要访问该分区即可（避免访问第0、2、3、4号分区）。

通过explain可以看到分区裁剪的结果：

create table t1 (c1 int, c2 int) partition by hash(c1) partitions 5;
explain select * from t1 where c1 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |1303|
===================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([t1.c1 = 1]),
      access([t1.c1], [t1.c2]), partitions(p1)

裁剪的结果（p1)显示在了Table Scan操作符的partitions属性中。

分区裁剪基本原理

一级分区的裁剪

hash/list分区分区裁剪就是根据where子句里面的条件并且计算得到分区列的值，通过结果判断需要访问哪些分区。如果分区函数为表达式，且该表达式作为一个整体出现在等值条件里，也可以做分区裁剪。例如：

create table t1 (c1 int, c2 int) partition by hash(c1 + c2) partitions 5;
explain select * from t1 where c1 + c2 = 1 \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |5        |1303|
===================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([t1.c1 + t1.c2 = 1]),
      access([t1.c1], [t1.c2]), partitions(p1)

range分区通过where子句的分区键的范围跟表定义的分区范围的交集来确定需要访问的分区。

说明

注意：对于range分区，因为考虑到函数的单调性，如果分区表达式是一个函数并且查询条件是一个范围，则不支持分区裁剪。

例如，下面的例子中，分区条件为表达式，而查询条件为非等值条件（c1 < 150 and c1 > 100），则无法进行分区裁剪：

create table t1 (c1 int, c2 int) partition by range(c1 + 1)
(partition p0 values less than (100), partition p1 values less than(200));
explain select * from t1 where c1 < 150 and c1 > 110 \G
*************************** 1. row ***************************
Query Plan: ============================================
|ID|OPERATOR           |NAME|EST. ROWS|COST|
--------------------------------------------
|0 |EXCHANGE IN DISTR  |    |19       |1410|
|1 | EXCHANGE OUT DISTR|    |19       |1303|
|2 |  TABLE SCAN       |t1  |19       |1303|
============================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil)
  2 - output([t1.c1], [t1.c2]), filter([t1.c1 < 150], [t1.c1 > 110]),
      access([t1.c1], [t1.c2]), partitions(p[0-1])

如果查询条件是等值条件，则可以进行分区裁剪：

explain select * from t1 where c1 = 150  \G
*************************** 1. row ***************************
Query Plan: ===================================
|ID|OPERATOR  |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1  |1        |1303|
===================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter([t1.c1 = 150]),
      access([t1.c1], [t1.c2]), partitions(p1)

二级分区的裁剪

对于二级分区，先按照一级分区键确定一级需要访问的分区，然后在通过二级分区键确定二级分区需要访问的分区。然后做一个乘积确定二级分区访问的所有物理分区。

例如：

create table t1 (c1 int , c2 int) partition by hash(c1) subpartition by range(c2) subpartition template (subpartition sp0 values less than(100), subpartition sp1 values less than(200)) partitions 5
select * from t1 where (c1 = 1 or c1 = 2) and (c2 > 101 and c2 < 150)

经过计算得到一级分区裁剪结果是p1、p2，而二级分区裁剪的结果是sp1，所以访问的物理分区为p1sp1、p2sp1。

explain select * from t1 where (c1 = 1 or c1 = 2) and (c2 > 101 and c2 < 150) \G
*************************** 1. row ***************************
Query Plan: ============================================
|ID|OPERATOR           |NAME|EST. ROWS|COST|
--------------------------------------------
|0 |EXCHANGE IN DISTR  |    |1        |1403|
|1 | EXCHANGE OUT DISTR|    |1        |1303|
|2 |  TABLE SCAN       |t1  |1        |1303|
============================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil)
  2 - output([t1.c1], [t1.c2]), filter([t1.c1 = 1 OR t1.c1 = 2], [t1.c2 > 101], [t1.c2 < 150]),
      access([t1.c1], [t1.c2]), partitions(p1sp1, p2sp1)

需要注意的是，上面的规则其实在某些情况下是错误的，不能得到精确的分区。例如，针对如下SQL：

select * from t1 where (c1,c2) in ((1,10), (2,150));

按照刚才的规则。一级分区要访问p1, p2，二级分区要访问sp0, sp1，那么最后访问的物理分区为p1sp0, p1sp1, p2sp0, p2sp1。但是实际上我们只需要访问p1sp0, p2sp1即可。因此，我们对向量条件做了特殊处理，能精确的做分区裁剪。

explain select * from t1 where (c1,c2) in ((1,10), (2,150)) \G
*************************** 1. row ***************************
Query Plan: ============================================
|ID|OPERATOR           |NAME|EST. ROWS|COST|
--------------------------------------------
|0 |EXCHANGE IN DISTR  |    |1        |1403|
|1 | EXCHANGE OUT DISTR|    |1        |1303|
|2 |  TABLE SCAN       |t1  |1        |1303|
============================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil)
  2 - output([t1.c1], [t1.c2]), filter([(t1.c1, t1.c2) IN ((1, 10), (2, 150))]),
      access([t1.c1], [t1.c2]), partitions(p1sp0, p2sp1)

说明

注意：某些场景下，分区裁剪可能会存在一定程度的放大，但优化器可以确保裁剪的结果是所需访问数据的超集，不会存在丢失数据的情况。

指定分区的查询

除了根据SQL的查询条件进行分区裁剪以外，OceanBase也支持用户通过SQL语法指定需要访问的分区：

select * from t1 partition (p0);

partition(p0)指定了只访问p0分区。

说明

注意：如果SQL中指定了分区，系统会将查询的范围限定在所指定的分区集合内，同时根据SQL的查询条件进行分区裁剪。最终访问的分区为指定分区和分区裁剪二者的交集。

分区名字规则

由于需要指定分区查询，所以用户需要知道分区的命名规则。

对于list和range分区，因为在创建表的过程中就指定了分区的名字。所以名字就是用户给定的名字。对于hash/key分区，用户没有给定分区的名字，分区的命名由系统根据命名规则完成，具体是：如果hash/key出现在一级分区里面，那么每个分区分别命名为p0, p1, …, pn。如果出现在二级分区里面，那么就是sp0, sp1, …, spn。

二级分区的名字由“一级分区+二级分区”方式构成。例如p0sp0，其中p0是一级分区的名字，sp0是二级分区的名字。

获取二级分区的各级分区id

对于二级分区，我们经常在日志里面看到的分区id很大。

举个例子：

create table t1 (c1 int , c2 int) partition by hash(c1) 
subpartition by range(c2) 
subpartition template 
(subpartition sp0 values less than(100), 
subpartition sp1 values less than(200)) partitions 5

对于t1，实际的分区id如下：

select partition_id from __all_meta_table join __all_table using(table_id) 
where table_name = 't1';
+---------------------+
| partition_id        |
+---------------------+
| 1152921504875282432 |
| 1152921504875282433 |
| 1152921509170249728 |
| 1152921509170249729 |
| 1152921513465217024 |
| 1152921513465217025 |
| 1152921517760184320 |
| 1152921517760184321 |
| 1152921522055151616 |
| 1152921522055151617 |
+---------------------+

Partition-Wise Join

Partition-Wise Joins：当需要连接的表是按照连接条件进行分区的时候，连接只需对连接表对应分区进行连接操作，能极大提高连接的性能。举个例子：

create table t1 (c1 int, c2 int) partition by hash(c1) partitions 5;
create table t2 (c1 int, c2 int) partition by hash(c1) partitions 5;
explain select * from t1 join t2 using (c1) \G
*************************** 1. row ***************************
Query Plan: =============================================
|ID|OPERATOR           |NAME|EST. ROWS|COST |
---------------------------------------------
|0 |EXCHANGE IN DISTR  |    |49500    |52169|
|1 | EXCHANGE OUT DISTR|    |49500    |42798|
|2 |  HASH JOIN        |    |49500    |42798|
|3 |   TABLE SCAN      |t1  |5000     |499  |
|4 |   TABLE SCAN      |t2  |5000     |499  |
=============================================
Outputs & filters:
-------------------------------------
  0 - output([t1.c1], [t1.c2], [t2.c2]), filter(nil)
  1 - output([t1.c1], [t1.c2], [t2.c2]), filter(nil)
  2 - output([t1.c1], [t1.c2], [t2.c2]), filter(nil),
      equal_conds([t1.c1 = t2.c1]), other_conds(nil)
  3 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p[0-4])
  4 - output([t2.c1], [t2.c2]), filter(nil),
      access([t2.c1], [t2.c2]), partitions(p[0-4])

说明

注意：能使用Partition-Wise Join需要满足以下条件：连接条件包含所有的分区键分区方式需要完全一致对应分区必须分布在同一台机器上