分析函数(某些数据库也叫做窗口函数)与聚集函数类似,计算总是基于一组行的集合,不同的是,聚集函数一组只能返回一行,而窗口函数每组可以返回多行,组内每一行都是基于窗口的逻辑计算的结果。分析函数可以显著优化需要 SELF-JOIN 的查询。
“窗口”也称为 FRAME,OceanBase 数据库同时支持 ROWS 与 RANGE 两种 FRAME 语义,前者是基于物理行偏移的窗口,后者则是基于逻辑值偏移的窗口。
目前 OceanBase 数据库支持以下分析函数:
SUM
MIN
MAX
COUNT
AVG
GROUP_CONCAT**
LISTAGG*
ROW_NUMBER
RANK
DENSE_RANK
PERCENT_RANK
CUME_DIST
FIRST_VALUE
LAST_VALUE
NTH_VALUE
NTILE
LEAD
LAG
注意
函数名称后带 * 的表示仅 Oracle 模式下包含,带 ** 的表示仅 MySQL 模式下包含,没有标记的表示两种模式都有。
执行计划
OceanBase 数据库分析函数的相应算子是: WINDOW FUNCTION,该算子依赖下层算子的输出按分区 + 排序键有序,所以在必要时会分配排序算子。
mysql> explain select row_number() over (partition by table_id order by partition_id) from __all_root_table;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |WINDOW FUNCTION| |1000 |2240|
|1 | SORT | |1000 |2049|
|2 | TABLE SCAN |__all_root_table|1000 |499 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
1 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_table.partition_id, ASC])
2 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
access([__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)
排序消除
如果优化阶段已经知道下层输出是有序的,则不会分配排序算子((tenant_id,table_id)为__all_root_table
主键前缀)
explain select row_number() over (partition by tenant_id, table_id order by partition_id) from __all_root_table;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |WINDOW FUNCTION| |1000 |690 |
|1 | TABLE SCAN |__all_root_table|1000 |499 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.tenant_id], [__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
1 - output([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
access([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)
|
计算合并
当同时存在多个分析函数时,如果存在可以共享一个序的子集,子集内所有函数会在同一个算子里计算而不是拆到多个算子,这样可以有效减少内存开销和边界判断。
如下例所示,row_number() 与 rank() 两个分析函数的计算可以共用同一个排序的输出,因此,只会分配一个 SORT 节点:
mysql> explain select row_number() over (partition by table_id order by partition_id), rank() over (partition by table_id, partition_id order by svr_ip) from __all_root_table;
| ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |WINDOW FUNCTION| |1000 |3017|
|1 | SORT | |1000 |2826|
|2 | TABLE SCAN |__all_root_table|1000 |499 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_WIN_FUN_ROW_NUMBER()], [T_WIN_FUN_RANK()]), filter(nil),
win_expr(T_WIN_FUN_RANK()), partition_by([__all_root_table.table_id], [__all_root_table.partition_id]), order_by([__all_root_table.svr_ip, ASC]), window_type(RANGE), uppe r(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING ), lower(UNBOUNDED FOLLOWING)
1 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_tabl e.partition_id, ASC], [__all_root_table.svr_ip, ASC])
2 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil),
access([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), partitions(p0)
落盘
分析函数的计算也支持执行期的落盘操作,其逻辑与排序操作符的落盘逻辑一致。
两种兼容模式下的差异
虽然 MySQL 模式和 Oracle 模式下支持的分析函数基本一样,但即使对于同一个分析函数,也存在着语法、约束检查和实现上的诸多差异。
Oracle 模式下不支持命名窗口功能。
窗口函数如 NTH_VALUE、LEAD、LAG 的某些参数在 MySQL 模式下仅接受整型,但在 Oracle 模式下可以接受能转换成整型的其它类型。
NTH_VALUE(measure_expr, n) 当 n 的取值超过窗口范围时 Oracle 模式下会返回 NULL,而 MySQL 模式下会返回边界值。
MySQL 模式下 GROUP_CONCAT 的默认 seperator 是“
,
”,显式指定分隔符后结果末尾会包含分隔符“;
”Oracle 模式下 LISTAGG 的默认分隔符是''
, 且结果末尾不输出 seperator。Oracle 模式下窗口函数会多出合法性检查;如 NTILE、LISTAGG 要求窗口函数参数来自 partition by 项,再如 FIRST_VALUE、LAST_VALUE 指定了窗口时必须同时进行排序。