子查询

本篇文档向你介绍 MatrixOne 的子查询功能。

概述

子查询是嵌套在另一个查询中的 SQL 表达式,借助子查询,可以在一个查询当中使用另外一个查询的查询结果。

通常情况下,从 SQL 语句结构上,子查询语句一般有以下几种形式:

  • 标量子查询(Scalar Subquery),如 SELECT (SELECT s1 FROM t2) FROM t1
  • 派生表(Derived Tables),如 SELECT t1.s1 FROM (SELECT s1 FROM t2) t1
  • 存在性测试(Existential Test),如 WHERE NOT EXISTS(SELECT ... FROM t2)WHERE t1.a IN (SELECT ... FROM t2)
  • 集合比较(Quantified Comparison),如 WHERE t1.a = ANY(SELECT ... FROM t2)
  • 作为比较运算符操作数的子查询,如 WHERE t1.a > (SELECT ... FROM t2)

关于子查询 SQL 语句,参见 SUBQUERY

另外,从 SQL 语句执行情况上,子查询语句一般有以下两种形式:

  • 关联子查询(Correlated Subquery):数据库嵌套查询中内层查询和外层查询不相互独立,内层查询也依赖于外层查询。执行顺序为:

    • 先从外层查询中查询中一条记录。

    • 再将查询到的记录放到内层查询中符合条件的记录,再放到外层中查询。

    • 重复以上步骤

    例如:select * from tableA where tableA.cloumn < (select column from tableB where tableA.id = tableB.id))

  • 无关联子查询 (Self-contained Subquery):数据库嵌套查询中内层查询是完全独立于外层查询的。执行顺序为:

    • 先执行内层查询。
    • 得到内层查询的结果后带入外层,再执行外层查询。

    例如:select * from tableA where tableA.column = (select tableB.column from tableB )

子查询的作用

  • 子查询允许结构化的查询,这样就可以把一个查询语句的每个部分隔开。
  • 子查询提供了另一种方法来执行有些需要复杂的 JOINUNION 来实现的操作。

我们将举一个简单的例子帮助你理解关联子查询无关联子查询

示例

开始前准备

你需要确认在开始之前,已经完成了以下任务:

数据准备

  1. 下载数据集:

    1. https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/tpch/tpch-1g.zip
  2. 创建数据库和数据表:

    1. create database d1;
    2. use d1;
    3. CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
    4. N_NAME CHAR(25) NOT NULL,
    5. N_REGIONKEY INTEGER NOT NULL,
    6. N_COMMENT VARCHAR(152),
    7. PRIMARY KEY (N_NATIONKEY));
    8. CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
    9. R_NAME CHAR(25) NOT NULL,
    10. R_COMMENT VARCHAR(152),
    11. PRIMARY KEY (R_REGIONKEY));
    12. CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
    13. P_NAME VARCHAR(55) NOT NULL,
    14. P_MFGR CHAR(25) NOT NULL,
    15. P_BRAND CHAR(10) NOT NULL,
    16. P_TYPE VARCHAR(25) NOT NULL,
    17. P_SIZE INTEGER NOT NULL,
    18. P_CONTAINER CHAR(10) NOT NULL,
    19. P_RETAILPRICE DECIMAL(15,2) NOT NULL,
    20. P_COMMENT VARCHAR(23) NOT NULL,
    21. PRIMARY KEY (P_PARTKEY));
    22. CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
    23. S_NAME CHAR(25) NOT NULL,
    24. S_ADDRESS VARCHAR(40) NOT NULL,
    25. S_NATIONKEY INTEGER NOT NULL,
    26. S_PHONE CHAR(15) NOT NULL,
    27. S_ACCTBAL DECIMAL(15,2) NOT NULL,
    28. S_COMMENT VARCHAR(101) NOT NULL,
    29. PRIMARY KEY (S_SUPPKEY));
    30. CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
    31. PS_SUPPKEY INTEGER NOT NULL,
    32. PS_AVAILQTY INTEGER NOT NULL,
    33. PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
    34. PS_COMMENT VARCHAR(199) NOT NULL,
    35. PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));
    36. CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
    37. C_NAME VARCHAR(25) NOT NULL,
    38. C_ADDRESS VARCHAR(40) NOT NULL,
    39. C_NATIONKEY INTEGER NOT NULL,
    40. C_PHONE CHAR(15) NOT NULL,
    41. C_ACCTBAL DECIMAL(15,2) NOT NULL,
    42. C_MKTSEGMENT CHAR(10) NOT NULL,
    43. C_COMMENT VARCHAR(117) NOT NULL,
    44. PRIMARY KEY (C_CUSTKEY));
    45. CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL,
    46. O_CUSTKEY INTEGER NOT NULL,
    47. O_ORDERSTATUS CHAR(1) NOT NULL,
    48. O_TOTALPRICE DECIMAL(15,2) NOT NULL,
    49. O_ORDERDATE DATE NOT NULL,
    50. O_ORDERPRIORITY CHAR(15) NOT NULL,
    51. O_CLERK CHAR(15) NOT NULL,
    52. O_SHIPPRIORITY INTEGER NOT NULL,
    53. O_COMMENT VARCHAR(79) NOT NULL,
    54. PRIMARY KEY (O_ORDERKEY));
    55. CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL,
    56. L_PARTKEY INTEGER NOT NULL,
    57. L_SUPPKEY INTEGER NOT NULL,
    58. L_LINENUMBER INTEGER NOT NULL,
    59. L_QUANTITY DECIMAL(15,2) NOT NULL,
    60. L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    61. L_DISCOUNT DECIMAL(15,2) NOT NULL,
    62. L_TAX DECIMAL(15,2) NOT NULL,
    63. L_RETURNFLAG CHAR(1) NOT NULL,
    64. L_LINESTATUS CHAR(1) NOT NULL,
    65. L_SHIPDATE DATE NOT NULL,
    66. L_COMMITDATE DATE NOT NULL,
    67. L_RECEIPTDATE DATE NOT NULL,
    68. L_SHIPINSTRUCT CHAR(25) NOT NULL,
    69. L_SHIPMODE CHAR(10) NOT NULL,
    70. L_COMMENT VARCHAR(44) NOT NULL,
    71. PRIMARY KEY (L_ORDERKEY, L_LINENUMBER));
  3. 把数据导入到数据表中:

    1. load data infile '/YOUR_TPCH_DATA_PATH/nation.tbl' into table NATION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    2. load data infile '/YOUR_TPCH_DATA_PATH/region.tbl' into table REGION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    3. load data infile '/YOUR_TPCH_DATA_PATH/part.tbl' into table PART FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    4. load data infile '/YOUR_TPCH_DATA_PATH/supplier.tbl' into table SUPPLIER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    5. load data infile '/YOUR_TPCH_DATA_PATH/partsupp.tbl' into table PARTSUPP FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    6. load data infile '/YOUR_TPCH_DATA_PATH/orders.tbl' into table ORDERS FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    7. load data infile '/YOUR_TPCH_DATA_PATH/customer.tbl' into table CUSTOMER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    8. load data infile '/YOUR_TPCH_DATA_PATH/lineitem.tbl' into table LINEITEM FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

现在你可以使用这些数据进行查询。

无关联子查询

对于将子查询作为比较运算符 (>/ >=/ < / <= / = / !=) 操作数的这类无关联子查询而言,内层子查询只需要进行一次查询,MatrixOne 在生成执行计划阶段会将内层子查询改写为常量。

  1. mysql> select p.p_name from (select * from part where p_brand='Brand#21' and p_retailprice between 1100 and 1200) p, partsupp ps where p.p_partkey=ps.ps_partkey and p.p_name like '%pink%' limit 10;

在 MatrixOne 执行上述查询的时候会先执行一次内层子查询:

  1. mysql> select * from part where p_brand='Brand#21' and p_retailprice between 1100 and 1200

运行结果为:

  1. +-----------------------------------+
  2. | p_name |
  3. +-----------------------------------+
  4. | olive chartreuse smoke pink tan |
  5. | olive chartreuse smoke pink tan |
  6. | olive chartreuse smoke pink tan |
  7. | olive chartreuse smoke pink tan |
  8. | pink sienna dark bisque turquoise |
  9. | pink sienna dark bisque turquoise |
  10. | pink sienna dark bisque turquoise |
  11. | pink sienna dark bisque turquoise |
  12. | honeydew orchid cyan magenta pink |
  13. | honeydew orchid cyan magenta pink |
  14. +-----------------------------------+
  15. 10 rows in set (0.06 sec)

对于存在性测试和集合比较两种情况下的无关联列子查询,MatrixOne 会将其进行改写和等价替换以获得更好的执行性能。

关联子查询

对于关联子查询而言,由于内层的子查询引用外层查询的列,子查询需要对外层查询得到的每一行都执行一遍,也就是说假设外层查询得到一千万的结果,那么子查询也会被执行一千万次,这会导致查询需要消耗更多的时间和资源。

因此在处理过程中,MatrixOne 会尝试对关联子查询去关联,以从执行计划层面上提高查询效率。

  1. mysql> select p_name from part where P_PARTKEY in (select PS_PARTKEY from PARTSUPP where PS_SUPPLYCOST>=500) and p_name like '%pink%' limit 10;

MatrixOne 在处理该 SQL 语句是会将其改写为等价的 JOIN 查询:

  1. select p_name from part join partsupp on P_PARTKEY=PS_PARTKEY where PS_SUPPLYCOST>=500 and p_name like '%pink%' limit 10;

运行结果为:

  1. +------------------------------------+
  2. | p_name |
  3. +------------------------------------+
  4. | papaya red almond hot pink |
  5. | turquoise hot smoke green pink |
  6. | purple cornsilk red pink floral |
  7. | pink cyan purple white burnished |
  8. | sandy dark pink indian cream |
  9. | powder cornsilk chiffon slate pink |
  10. | rosy light black pink orange |
  11. | pink white goldenrod ivory steel |
  12. | cornsilk dim pink tan sienna |
  13. | lavender navajo steel sandy pink |
  14. +------------------------------------+
  15. 10 rows in set (0.23 sec)

作为最佳实践,在实际开发当中,为提高计算效率,尽量选择等价计算方法进行查询,避免使用关联子查询的方式进行查询。