简介
分析函数(某些数据库下也叫做窗口函数)与聚集函数类似,计算总是基于一组行的集合,不同的是,聚集函数一组只能返回一行,而窗口函数每组可以返回多行,组内每一行都是基于窗口的逻辑计算的结果。分析函数可以显著优化需要self-join的查询。
功能支持
目前OceanBase支持以下分析函数:
SUM
MIN
MAX
COUNT
AVG
GROUP_CONCAT
ROW_NUMBER
RANK
DENSE_RANK
PERCENT_RANK
CUME_DIST
FIRST_VALUE
LAST_VALUE
NTH_VALUE
NTILE
LEAD
LAG
“窗口”也称为FRAME,OceanBase同时支持ROWS与RANGE两种FRAME语义,前者是基于物理行偏移的窗口,后者则是基于逻辑值偏移的窗口。
执行计划
OceanBase分析函数的相应算子是: WINDOW FUNCTION,该算子依赖下层算子的输出按分区 + 排序键有序,所以在必要时会分配排序算子。注:
mysql> explain select row_number() over (partition by table_id order by partition_id) from __all_root_table;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |WINDOW FUNCTION| |1000 |2240|
|1 | SORT | |1000 |2049|
|2 | TABLE SCAN |__all_root_table|1000 |499 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
1 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_table.partition_id, ASC])
2 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
access([__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)
排序消除
如果优化阶段已经知道下层输出是有序的,则不会分配排序算子((tenant_id,table_id)为__all_root_table主键前缀)。
explain select row_number() over (partition by tenant_id, table_id order by partition_id) from __all_root_table;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |WINDOW FUNCTION| |1000 |690 |
|1 | TABLE SCAN |__all_root_table|1000 |499 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.tenant_id], [__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
1 - output([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
access([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)
|
计算合并
当同时存在多个分析函数时,如果存在可以共享一个序的子集,子集内所有函数会在同一个算子里计算而不是拆到多个算子,这样可以有效减少内存开销和边界判断。
如下面的例子所示,row_number()与rank()两个分析函数的计算可以共用同一个排序的输出,因此,只会分配一个sort节点:
mysql> explain select row_number() over (partition by table_id order by partition_id), rank() over (partition by table_id, partition_id order by svr_ip) from __all_root_table;
| ====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------
|0 |WINDOW FUNCTION| |1000 |3017|
|1 | SORT | |1000 |2826|
|2 | TABLE SCAN |__all_root_table|1000 |499 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([T_WIN_FUN_ROW_NUMBER()], [T_WIN_FUN_RANK()]), filter(nil),
win_expr(T_WIN_FUN_RANK()), partition_by([__all_root_table.table_id], [__all_root_table.partition_id]), order_by([__all_root_table.svr_ip, ASC]), window_type(RANGE), uppe r(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING ), lower(UNBOUNDED FOLLOWING)
1 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_tabl e.partition_id, ASC], [__all_root_table.svr_ip, ASC])
2 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil),
access([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), partitions(p0)
落盘
分析函数的计算也支持执行期的落盘操作,其逻辑与排序操作符的落盘逻辑一致。