PolarDB子查询改写系列(三)子查询折叠

Author: 陈江(恬泰)

背景

一个逻辑运算符析取合取式(Item_cond)下如果有多个子查询,可以尝试对这多个子查询进行折叠,折叠成1个或者改写为true、false。 显而易见可以大大加速sql执行。

注:该功能在PolarDB 8.02.2.22 版本中, 默认在ro上开启,在rw上开启需设置如下参数

  1. set subquery_coalesce_mode=on; // 开启折叠,默认只执行启发式折叠
  2. set subquery_coalesce=on; // 开启基于代价折叠,折叠会增加新算子,大部分场景是个better plan,但并不总是
  3. // 如果开启了cbqt,会忽略这个开关,基于代价进行枚举。

规则生效场景:

  • 折叠的对象可以出现在where, having, join on条件的任何位置上。做到应消尽消。
  • 子查询可以是Exists, IN子查询, ALL/ALL子查询
  • 算子可以是正值, 负值运算,如in /not in, exists/not exists。
  • 父亲逻辑算子可以是and, 可以是or

改写规则

同类型消除

  1. SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE c2 = 0) --子查询1
  2. AND EXISTS (SELECT 1 FROM t2) --子查询2
  3. ==》
  4. SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE c2 = 0)

*注:

子查询1是子查询2的子集,子查询2直接消除掉

or条件下:

  1. SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2 WHERE c2 = 0) --子查询1
  2. or EXIST (SELECT 1 FROM t2) --子查询2
  3. ==》
  4. SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2)

*注: or条件下改写为 false or EXISTs (SELECT 1 FROM t2), 保留大集合

同类型子查询合并

  1. select * from t1 where exists (select t1.a as f from t1 where a >10 and b < 10)
  2. or exists (select a from t1 where a > 10 and c <3);
  3. =>
  4. select * from t1 where exists (select t1.a as f from t1 where a >10 and (b < 10 or c <3);

也可以是and_cond表达式

  1. select * from t1 where not exists (select t1.a as f from t1 where a >10 and b < 10)
  2. and not exists (select a from t1 where a > 10 and c <3);
  3. =>
  4. select * from t1 where not exists (select t1.a as f from t1 where a >10 and (b < 10 or c <3);

互斥类型子查询消除

exists冲突 //消除子查询,改写为false

场景:

  • exists VS not exists
  • in VS not in
  1. SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2 WHERE c1 = 0) --子查询1
  2. AND NOT EXISTs (SELECT 1 FROM t2) --子查询2
  3. ==>
  4. SELECT * FROM t1 WHERE false

any/all 冲突 //SQL2

场景:

  • >any vs <all、<=all
  • <any vs >all, >=all

    1. SELECT * FROM t1 WHERE t1.c1 > any (SELECT c1 FROM t2 where c1 > 100) and
    2. t1.c1 < all (select c1 from t2 where c1 > 10)
    3. ==>
    4. SELECT * FROM t1 WHERE false; //any 是all 子集

    Item_or下的消除

    1. SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2 ) --子查询1
    2. or NOT EXISTs (SELECT 1 FROM t2 WHERE c1 = 0) --子查询2
    3. ==>
    4. SELECT * FROM t1 WHERE true

    互斥exists子查询合并 //SQL3

    1. SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2) --子查询1
    2. AND NOT EXIST (SELECT 1 FROM t2 WHERE c2 = 0) --子查询2
    3. =>
    4. SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2 having sum (case when extra_cond 1 else 0) ==0)

    注:

  • 增加 为having sum (case when extra_cond 1 else 0) ==0

  • 并不总是最优的,基于代价选择是否进行折叠,如确认改写较优,开启subquery_coalesce参数。
  • tpch Q21 热数据测试对比如下
sql/耗时(64dop)折叠(增加sum算子)不折叠(semijoin执行)
Q213 min 50 sec2 min 49.45 sec

any/all合并 //SQL4

场景:只支持in/not in子查询

  • in vs not in 并且 (not in集合更小,是左侧子集)
  • \= any vs != all 并且 (all集合更小,是左侧子集)

    1. SELECT * FROM t1 WHERE t1.c1 = any (SELECT c1 FROM t2 where c1 > 10) and
    2. t1.c1 != all (select c1 from t2 where c1 > 100)
    3. ==>
    4. SELECT * FROM t1 WHERE t1.c1 =
    5. any (SELECT c1 FROM t2 where c1 > 10 and LNNVL(c1 >100));

    注: 该plan也是基于代价进行改写的,通常都是改写更优。

or 条件下合并

  1. SELECT * FROM t1 WHERE t1.c1 = any (SELECT c1 FROM t2 ) or
  2. t1.c1 != all (select c1 from t2 where c1 > 100)
  3. ==>
  4. SELECT * FROM t1 WHERE 1;

结语

PolarDB子查询折叠基本上是能转尽转,可以折叠在where条件, join on条件, having条件内部的任何冗余子查询, 能大大加速SQL执行。

原文:http://mysql.taobao.org/monthly/2024/01/01/