在数据中,子查询可以分成有依赖关系的子查询和没有依赖关系的子查询。有依赖关系的子查询是指该子查询的执行依赖了外部查询的’变量’,所以这种子查询通常会被计算多次。没有依赖关系的子查询是指该子查询的执行不依赖外部查询的’变量’, 这种子查询一般只需要计算一次。下图分别展示了一个没有依赖关系的子查询和一个有依赖关系的子查询。
OceanBase (root@test)> create table t1(a int primary key, b int, c int);
Query OK, 0 rows affected (0.70 sec)
OceanBase (root@test)> create table t2(a int primary key, b int, c int);
Query OK, 0 rows affected (0.92 sec)
-- 没有依赖关系的子查询
OceanBase (root@test)> select * from t1 where t1.a in (select t2.a from t2);
Empty set (0.22 sec)
-- 有依赖关系的子查询,子查询中用到了外层查询变量t1.b
OceanBase (root@test)> select * from t1 where t1.a in (select t2.a from t2 where t2.b = t1.b);
Empty set (0.05 sec)
在Oceanbase中,子查询是通过subplan filter算子执行的。Subplan filter算子会遍历外层查询中的每一行,对于遍历的每一行都会去检查相应的子查询来判断是否满足条件,这有点类似于Nested Loop Join。为了高效地执行子查询,满足特定条件的子查询会被改写成连接语句, 这样优化器在选择连接算法的时候就可以有更多的选择(除了Nested Loop Join,还可以选择Hash Join和Merge Join)。在Oceanbase中,没有依赖关系的子查询都会被改写成连接语句。对于有依赖关系的子查询,只有满足特定条件才会被改写成连接语句。下图分别展示了一个被改写成连接语句的子查询和一个没有被改写成连接语句的子查询。
OceanBase (root@test)> create table t1(a int primary key, b int, c int);
Query OK, 0 rows affected (0.70 sec)
OceanBase (root@test)> create table t2(a int primary key, b int, c int);
Query OK, 0 rows affected (0.92 sec)
-- 有依赖关系的子查询被改写成了semi-join,并且使用了hash semi-join来实现
OceanBase (root@test)> explain select * from t1 where t1.a in (select t2.c from t2 where t2.b = t1.b);
| =======================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------
|0 |HASH SEMI JOIN| |1 |2924|
|1 | TABLE SCAN |t1 |1000 |455 |
|2 | TABLE SCAN |t2 |1000 |455 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
equal_conds([t1.a = t2.c], [t2.b = t1.b]), other_conds(nil)
1 - output([t1.b], [t1.a], [t1.c]), filter(nil),
access([t1.b], [t1.a], [t1.c]), partitions(p0)
2 - output([t2.b], [t2.c]), filter(nil),
access([t2.b], [t2.c]), partitions(p0)
-- 有依赖关系的子查询不能被改写成semi-join,使用了subplan filter来实现
OceanBase (root@test)> explain select * from t1 where t1.a > (select sum(t2.c) from t2 where t2.b = t1.b);
|ID|OPERATOR |NAME|EST. ROWS|COST |
-------------------------------------------
|0 |SUBPLAN FILTER | |334 |207683|
|1 | TABLE SCAN |t1 |334 |176 |
|2 | SCALAR GROUP BY| |1 |623 |
|3 | TABLE SCAN |t2 |2 |622 |
===========================================
Outputs & filters:
-------------------------------------
0 - output([t1.a], [t1.b], [t1.c]), filter([t1.a > subquery(1)]),
exec_params_([t1.b]), onetime_exprs_(nil), init_plan_idxs_(nil)
1 - output([t1.b], [t1.a], [t1.c]), filter(nil),
access([t1.b], [t1.a], [t1.c]), partitions(p0)
2 - output([T_FUN_SUM(t2.c)]), filter(nil),
group(nil), agg_func([T_FUN_SUM(t2.c)])
3 - output([t2.c]), filter([t2.b = ?]),
access([t2.b], [t2.c]), partitions(p0)