用 EXPLAIN 查看分区查询的执行计划

使用 EXPLAIN 语句可以查看 TiDB 在执行查询时需要访问的分区。由于存在分区裁剪,所显示的分区通常只是所有分区的一个子集。本文档介绍了常见分区表的一些优化方式,以及如何解读 EXPLAIN 语句返回的执行计划信息。

本文档所使用的示例数据如下:

  1. CREATE TABLE t1 (
  2. id BIGINT NOT NULL auto_increment,
  3. d date NOT NULL,
  4. pad1 BLOB,
  5. pad2 BLOB,
  6. pad3 BLOB,
  7. PRIMARY KEY (id,d)
  8. ) PARTITION BY RANGE (YEAR(d)) (
  9. PARTITION p2016 VALUES LESS THAN (2017),
  10. PARTITION p2017 VALUES LESS THAN (2018),
  11. PARTITION p2018 VALUES LESS THAN (2019),
  12. PARTITION p2019 VALUES LESS THAN (2020),
  13. PARTITION pmax VALUES LESS THAN MAXVALUE
  14. );
  15. INSERT INTO t1 (d, pad1, pad2, pad3) VALUES
  16. ('2016-01-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  17. ('2016-06-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  18. ('2016-09-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  19. ('2017-01-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  20. ('2017-06-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  21. ('2017-09-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  22. ('2018-01-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  23. ('2018-06-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  24. ('2018-09-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  25. ('2019-01-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  26. ('2019-06-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  27. ('2019-09-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  28. ('2020-01-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  29. ('2020-06-01', RANDOM_BYTES(102), RANDOM_BYTES(1024), RANDOM_BYTES(1024)),
  30. ('2020-09-01', RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024));
  31. INSERT INTO t1 SELECT NULL, a.d, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
  32. INSERT INTO t1 SELECT NULL, a.d, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
  33. INSERT INTO t1 SELECT NULL, a.d, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
  34. INSERT INTO t1 SELECT NULL, a.d, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
  35. SELECT SLEEP(1);
  36. ANALYZE TABLE t1;

以下示例解释了基于新建分区表 t1 的一条语句:

  1. EXPLAIN SELECT COUNT(*) FROM t1 WHERE d = '2017-06-01';
  1. +------------------------------+---------+-----------+---------------------------+-------------------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +------------------------------+---------+-----------+---------------------------+-------------------------------------------+
  4. | StreamAgg_21 | 1.00 | root | | funcs:count(Column#8)->Column#6 |
  5. | └─TableReader_22 | 1.00 | root | | data:StreamAgg_10 |
  6. | └─StreamAgg_10 | 1.00 | cop[tikv] | | funcs:count(1)->Column#8 |
  7. | └─Selection_20 | 8.87 | cop[tikv] | | eq(test.t1.d, 2017-06-01 00:00:00.000000) |
  8. | └─TableFullScan_19 | 8870.00 | cop[tikv] | table:t1, partition:p2017 | keep order:false |
  9. +------------------------------+---------+-----------+---------------------------+-------------------------------------------+
  10. 5 rows in set (0.01 sec)

由上述 EXPLAIN 结果可知,从最末尾的 —TableFullScan_19 算子开始,再返回到根部的 StreamAgg_21 算子的执行过程如下:

  • TiDB 成功地识别出只需要访问一个分区 (p2017),并将该信息在 access object 列中注明。
  • └─TableFullScan_19 算子先对整个分区进行扫描,然后执行 └─Selection_20 算子筛选起始日期为 2017-06-01 00:00:00.000000 的行。
  • 之后,└─Selection_20 算子匹配的行在 Coprocessor 中进行流式聚合,Coprocessor 本身就可以理解聚合函数 count
  • 每个 Coprocessor 请求会发送一行数据给 TiDB 的 └─TableReader_22 算子,然后将数据在 StreamAgg_21 算子下进行流式聚合,再将一行数据返回给客户端。

以下示例中,分区裁剪不会消除任何分区:

  1. EXPLAIN SELECT COUNT(*) FROM t1 WHERE YEAR(d) = 2017;
  1. +------------------------------------+----------+-----------+---------------------------+----------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +------------------------------------+----------+-----------+---------------------------+----------------------------------+
  4. | HashAgg_20 | 1.00 | root | | funcs:count(Column#7)->Column#6 |
  5. | └─PartitionUnion_21 | 5.00 | root | | |
  6. | ├─StreamAgg_36 | 1.00 | root | | funcs:count(Column#9)->Column#7 |
  7. | └─TableReader_37 | 1.00 | root | | data:StreamAgg_25 |
  8. | └─StreamAgg_25 | 1.00 | cop[tikv] | | funcs:count(1)->Column#9 |
  9. | └─Selection_35 | 6000.00 | cop[tikv] | | eq(year(test.t1.d), 2017) |
  10. | └─TableFullScan_34 | 7500.00 | cop[tikv] | table:t1, partition:p2016 | keep order:false |
  11. | ├─StreamAgg_55 | 1.00 | root | | funcs:count(Column#11)->Column#7 |
  12. | └─TableReader_56 | 1.00 | root | | data:StreamAgg_44 |
  13. | └─StreamAgg_44 | 1.00 | cop[tikv] | | funcs:count(1)->Column#11 |
  14. | └─Selection_54 | 14192.00 | cop[tikv] | | eq(year(test.t1.d), 2017) |
  15. | └─TableFullScan_53 | 17740.00 | cop[tikv] | table:t1, partition:p2017 | keep order:false |
  16. | ├─StreamAgg_74 | 1.00 | root | | funcs:count(Column#13)->Column#7 |
  17. | └─TableReader_75 | 1.00 | root | | data:StreamAgg_63 |
  18. | └─StreamAgg_63 | 1.00 | cop[tikv] | | funcs:count(1)->Column#13 |
  19. | └─Selection_73 | 3977.60 | cop[tikv] | | eq(year(test.t1.d), 2017) |
  20. | └─TableFullScan_72 | 4972.00 | cop[tikv] | table:t1, partition:p2018 | keep order:false |
  21. | ├─StreamAgg_93 | 1.00 | root | | funcs:count(Column#15)->Column#7 |
  22. | └─TableReader_94 | 1.00 | root | | data:StreamAgg_82 |
  23. | └─StreamAgg_82 | 1.00 | cop[tikv] | | funcs:count(1)->Column#15 |
  24. | └─Selection_92 | 20361.60 | cop[tikv] | | eq(year(test.t1.d), 2017) |
  25. | └─TableFullScan_91 | 25452.00 | cop[tikv] | table:t1, partition:p2019 | keep order:false |
  26. | └─StreamAgg_112 | 1.00 | root | | funcs:count(Column#17)->Column#7 |
  27. | └─TableReader_113 | 1.00 | root | | data:StreamAgg_101 |
  28. | └─StreamAgg_101 | 1.00 | cop[tikv] | | funcs:count(1)->Column#17 |
  29. | └─Selection_111 | 8892.80 | cop[tikv] | | eq(year(test.t1.d), 2017) |
  30. | └─TableFullScan_110 | 11116.00 | cop[tikv] | table:t1, partition:pmax | keep order:false |
  31. +------------------------------------+----------+-----------+---------------------------+----------------------------------+
  32. 27 rows in set (0.00 sec)

由上述 EXPLAIN 结果可知:

  • TiDB 认为需要访问所有分区 (p2016..pMax)。这是因为 TiDB 将谓词 YEAR(d)= 2017 视为 non-sargable。这个问题并非是 TiDB 特有的。
  • 在扫描每个分区时,Selection 算子将筛选出年份不为 2017 的行。
  • 在每个分区上会执行流式聚合,以计算匹配的行数。
  • └─PartitionUnion_21 算子会合并访问每个分区后的结果。

其他类型查询的执行计划