命令的输出

    1. MariaDB> analyze select * from tbl1 where key1 between 10 and 200 and col1 like 'foo%'\G
    2. *************************** 1\. row ***************************
    3. id: 1
    4. select_type: SIMPLE
    5. table: tbl1
    6. type: range
    7. possible_keys: key1
    8. key: key1
    9. key_len: 5
    10. ref: NULL
    11. rows: 181
    12. r_rows: 181
    13. filtered: 100.00
    14. r_filtered: 10.50
    15. Extra: Using index condition; Using where

    我们可以看到 ANALYZE 命令多了r_rows和r_filterd两行,我们来比较一下 EXPLAIN 计算的 rows/filtered 和 ANALYZE 计算的 r_rows/r_filtered 两列的区别。

    r_rows 是基于实际观察的 rows 列,它表示实际从表中读取了多少行数据。

    r_filtered 是基于实际观察的 filtered 列,它表示经过应用WHERE条件之后还有百分之多少的数据剩余。

    输出结果解析

    让我们来看一个更复杂的SQL。

    1. analyze select *
    2. from orders, customer
    3. where
    4. customer.c_custkey=orders.o_custkey and
    5. customer.c_acctbal < 0 and
    6. orders.o_totalprice > 200*1000
    1. +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+
    2. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
    3. +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+
    4. | 1 | SIMPLE | customer | ALL | PRIMARY,... | NULL | NULL | NULL | 149095 | 150000 | 18.08 | 9.13 | Using where |
    5. | 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | 10 | 100.00 | 30.03 | Using where |
    6. +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+

    从上面的结果,我们可以获得如下信息:

    对于 customer 表,customer.rows=149095, customer.r_rows=150000. 从这两个值来看,优化器对 customer 表的访问估算还是很准确的。

    customer.filtered=18.08, customer.r_filtered=9.13. 优化器有点高估了customer 表所匹配的记录的条数。(一般来说,当你有个全表扫描,并且 r_filtered 少于15%的时候,你得考虑为表增加相应的索引了)

    orders.filtered=100, orders.r_filtered=30.03. 优化器无法预估经过条件(orders.o_totalprice > 200*1000)检查后还剩多少比例的记录。因此,优化器显示了100%。事实上,这个值是30%,通常来说30%的过滤性并不值得去建一个索引。但是对于多表Join,采集和使用列统计信息也许对查询有帮助,也可能帮助优化器选择更好的执行计划。(因为在关联中,关联条件和普通过滤条件组合以后,可能过滤性会非常好,并且有助于优化器判断哪张表做驱动表比较好)

    然后我们再把前面的例子稍微修改一下

    1. analyze select *
    2. from orders, customer
    3. where
    4. customer.c_custkey=orders.o_custkey and
    5. customer.c_acctbal < -0 and customer.c_comment like '%foo%'
    6. orders.o_totalprice > 200*1000
    7. +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+
    8. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
    9. +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+
    10. | 1 | SIMPLE | customer | ALL | PRIMARY,... | NULL | NULL | NULL | 149095 | 150000 | 18.08 | 0.00 | Using where |
    11. | 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | NULL | 100.00 | NULL | Using where |
    12. +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+--------+----------+------------+-------------+

    这里我们可以看到 orders.r_rows=NULL,以及 orders.r_filtered=NULL。这意味着 orders 表连一次都没有被扫描到。