简介

数据库中的集合操作可以把多个查询的结果组合成一个结果集。集合操作主要包含:

  • UNION

  • INTERSECT

  • EXCEPT/MINUS

注意:

OceanBase 同时支持 EXCEPT 与 MINUS,这两者的语义和语法都是相同的。

对于 UNION 来说用户可以指定 UNION 的属性为 ALL(集合可重复) 和 DISTINCT/UNIQUE(集合不可重复)。所有的集合操作默认的属性是 DISTINCT(OceanBase 的 DISTINCT 和 UNIQUE 的语义语法是相同的),而其他的几种集合操作是不能指定属性的。

执行计划

所有的集合操作在逻辑计划层面的展示都是 [UNION|INTERSECT|EXCEPT] [ALL|DISTINCT] 的形式。带有 DISTINCT 关键字时,需要从输出的结果中去掉重复的行,目前 OceanBase 只支持基于排序的集合 DISTINCT 去重,在计划层面,排序操作会下压到各分支内部进行。

  1. OceanBase_114 (root@test)> explain select * from t1 union select * from t1\G
  2. *************************** 1. row ***************************
  3. Query Plan: =======================================
  4. |ID|OPERATOR |NAME|EST. ROWS|COST|
  5. ---------------------------------------
  6. |0 |UNION DISTINCT| |2000 |4861|
  7. |1 | SORT | |1000 |2048|
  8. |2 | TABLE SCAN |t1 |1000 |498 |
  9. |3 | SORT | |1000 |2048|
  10. |4 | TABLE SCAN |t1 |1000 |498 |
  11. =======================================
  12. Outputs & filters:
  13. -------------------------------------
  14. 0 - output([UNION(t1.c1, t1.c1)], [UNION(t1.c2, t1.c2)]), filter(nil)
  15. 1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  16. 2 - output([t1.c1], [t1.c2]), filter(nil),
  17. access([t1.c1], [t1.c2]), partitions(p0)
  18. 3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  19. 4 - output([t1.c1], [t1.c2]), filter(nil),
  20. access([t1.c1], [t1.c2]), partitions(p0)
  21. 1 row in set (0.02 sec)
  22. OceanBase_114 (root@test)> explain select * from t1 intersect select * from t1\G
  23. *************************** 1. row ***************************
  24. Query Plan: ===========================================
  25. |ID|OPERATOR |NAME|EST. ROWS|COST|
  26. -------------------------------------------
  27. |0 |INTERSECT DISTINCT| |1000 |4723|
  28. |1 | SORT | |1000 |2048|
  29. |2 | TABLE SCAN |t1 |1000 |498 |
  30. |3 | SORT | |1000 |2048|
  31. |4 | TABLE SCAN |t1 |1000 |498 |
  32. ===========================================
  33. Outputs & filters:
  34. -------------------------------------
  35. 0 - output([INTERSECT(t1.c1, t1.c1)], [INTERSECT(t1.c2, t1.c2)]), filter(nil)
  36. 1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  37. 2 - output([t1.c1], [t1.c2]), filter(nil),
  38. access([t1.c1], [t1.c2]), partitions(p0)
  39. 3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  40. 4 - output([t1.c1], [t1.c2]), filter(nil),
  41. access([t1.c1], [t1.c2]), partitions(p0)
  42. 1 row in set (0.02 sec)
  43. OceanBase_114 (root@test)> explain select * from t1 minus select * from t1\G
  44. *************************** 1. row ***************************
  45. Query Plan: ========================================
  46. |ID|OPERATOR |NAME|EST. ROWS|COST|
  47. ----------------------------------------
  48. |0 |EXCEPT DISTINCT| |1000 |4723|
  49. |1 | SORT | |1000 |2048|
  50. |2 | TABLE SCAN |t1 |1000 |498 |
  51. |3 | SORT | |1000 |2048|
  52. |4 | TABLE SCAN |t1 |1000 |498 |
  53. ========================================
  54. Outputs & filters:
  55. -------------------------------------
  56. 0 - output([EXCEPT(t1.c1, t1.c1)], [EXCEPT(t1.c2, t1.c2)]), filter(nil)
  57. 1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  58. 2 - output([t1.c1], [t1.c2]), filter(nil),
  59. access([t1.c1], [t1.c2]), partitions(p0)
  60. 3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  61. 4 - output([t1.c1], [t1.c2]), filter(nil),
  62. access([t1.c1], [t1.c2]), partitions(p0)
  63. 1 row in set (0.01 sec)

如下例所示,带有 ALL 属性的集合操作不要求下层有序。

  1. OceanBase_114 (root@test)> explain select * from t1 union all select * from t1\G
  2. *************************** 1. row ***************************
  3. Query Plan: ====================================
  4. |ID|OPERATOR |NAME|EST. ROWS|COST|
  5. ------------------------------------
  6. |0 |UNION ALL | |2000 |1761|
  7. |1 | TABLE SCAN|t1 |1000 |498 |
  8. |2 | TABLE SCAN|t1 |1000 |498 |
  9. ====================================
  10. Outputs & filters:
  11. -------------------------------------
  12. 0 - output([UNION(t1.c1, t1.c1)], [UNION(t1.c2, t1.c2)]), filter(nil)
  13. 1 - output([t1.c1], [t1.c2]), filter(nil),
  14. access([t1.c1], [t1.c2]), partitions(p0)
  15. 2 - output([t1.c1], [t1.c2]), filter(nil),
  16. access([t1.c1], [t1.c2]), partitions(p0)
  17. 1 row in set (0.02 sec)

当集合操作两边出现类型不一致的时候,SQL 层会为 UNION 左右两边的表达式分配 CAST 转换,这个操作在 EXPLAIN 中是可见的。

  1. OceanBase_114 (root@test)> create table t1 (c1 int, c2 varchar(5));
  2. Query OK, 0 rows affected (0.12 sec)
  3. OceanBase_114 (root@test)> explain select c1 from t1 union select c2 from t1\G
  4. *************************** 1. row ***************************
  5. Query Plan: ========================================
  6. |ID|OPERATOR |NAME|EST. ROWS|COST |
  7. ----------------------------------------
  8. |0 |UNION DISTINCT| |2000 |11986|
  9. |1 | SORT | |1000 |5664 |
  10. |2 | TABLE SCAN |t1 |1000 |476 |
  11. |3 | SORT | |1000 |5664 |
  12. |4 | TABLE SCAN |t1 |1000 |476 |
  13. ========================================
  14. Outputs & filters:
  15. -------------------------------------
  16. 0 - output([UNION(cast(t1.c1, VARCHAR(11)), cast(t1.c2, VARCHAR(11)))]), filter(nil)
  17. 1 - output([cast(t1.c1, VARCHAR(11))]), filter(nil), sort_keys([cast(t1.c1, VARCHAR(11)), ASC])
  18. 2 - output([cast(t1.c1, VARCHAR(11))]), filter(nil),
  19. access([t1.c1]), partitions(p0)
  20. 3 - output([cast(t1.c2, VARCHAR(11))]), filter(nil), sort_keys([cast(t1.c2, VARCHAR(11)), ASC])
  21. 4 - output([cast(t1.c2, VARCHAR(11))]), filter(nil),
  22. access([t1.c2]), partitions(p0)
  23. 1 row in set (0.01 sec)