背景

性能数据是云上数据库服务的一个重要组成部分,对于性能数据,当前云厂商的一般做法是:由单独的采集系统进行性能数据的收集和处理,然后通过用户控制台进行性能数据的展示。借助控制台的将性能数据以图表的形式进行展示,比较直观,但是用户很难与自己的监控平台进行集成。特别是对于企业级用户,这些用户在上云之前往往有比较成熟的自建性能监控平台,虽然部分云厂商开始提供OpenAPI等方式对外开放性能数据,但是与自建平台的整合依然有诸多限制。

AliSQL解决方案

基于上述背景,AliSQL提出了一种全内聚的性能数据解决方案,直接通过系统表的方式对外提供性能数据。用户可以像查询普通数据一样,直接查询INFORMATION_SCHEMA库下的PERF_STATISTICS表得到性能数据。

设计实现

对于MySQL,用户关心的性能数据主要可以分为以下三种:

  1. Host层性能数据,包括主机的CPU占用、内存使用情况、IO调用等;
  2. Server层性能数据,包括各类连接信息、QPS信息、网络流量等;
  3. Engine层性能数据,包括数据读写情况、事务提交情况等;

MySQL内核内部,除了没有统计Host层的性能数据外,Server层和Engine层的性能数据都有统计并且提供了获取方法,例如:

  1. MySQL [information_schema]> show status like "Com_select";
  2. +---------------+-------+
  3. | Variable_name | Value |
  4. +---------------+-------+
  5. | Com_select | 3 |
  6. +---------------+-------+
  7. 1 row in set (0.00 sec)
  8. MySQL [information_schema]> show status like "Innodb_data_read";
  9. +------------------+-----------+
  10. | Variable_name | Value |
  11. +------------------+-----------+
  12. | Innodb_data_read | 470798848 |
  13. +------------------+-----------+
  14. 1 row in set (0.00 sec)

AliSQL Performance Agent需要解决的问题就是:1)整合MySQL内核统计的Server层和Engine层性能指标;2)增加Host层的性能统计;3)提供便捷的外部访问方式。具体实现上:

  1. 新增Performance Agent Plugin,在Plugin内部启动一个性能采集线程,按照指定的采样周期,采集Host层、Server层和Engine层的性能数据;
  2. Host层性能数据的获取方式:根据PID信息,直接读取/proc以及/proc/PID目录下的系统文件;
  3. Server层性能数据获取方式:Plugin内调用Server层统计接口,获取Server层性能数据;
  4. Engine层性能数据获取方式:以InnoDB为例,Plugin内调用InnoDB对外接口,获取InnoDB内部性能数据;
  5. 性能数据的汇总计算:不同的性能指标,计算单个采样周期内的差值或者实时值;
  6. 数据保存方式:以CSV文件格式本地保存,同时在INFORMATION_SCHEMA库下新增一张PERF_STATISTICS表,保存最近1小时的性能数据;
  1. /** 获取Server层性能数据 **/
  2. typedef struct system_status_var STATUS_VAR;
  3. struct system_status_var {
  4. ...
  5. ulonglong created_tmp_disk_tables;
  6. ulonglong created_tmp_tables;
  7. ...
  8. ulong com_stat[(uint) SQLCOM_END];
  9. ...
  10. }
  11. void calc_sum_of_all_status(STATUS_VAR *to)
  12. {
  13. DBUG_ENTER("calc_sum_of_all_status");
  14. mysql_mutex_assert_owner(&LOCK_status);
  15. /* Get global values as base. */
  16. *to= global_status_var;
  17. Add_status add_status(to);
  18. Global_THD_manager::get_instance()->do_for_all_thd_copy(&add_status);
  19. DBUG_VOID_RETURN;
  20. }
  21. /** 获取InnoDB层性能数据 **/
  22. /** Status variables to be passed to MySQL */
  23. extern struct export_var_t export_vars;
  24. struct export_var_t {
  25. ...
  26. ulint innodb_data_read; /*!< Data bytes read */
  27. ulint innodb_data_writes; /*!< I/O write requests */
  28. ulint innodb_data_written; /*!< Data bytes written */
  29. ...
  30. }
  31. /* Function to pass InnoDB status variables to MySQL */
  32. void srv_export_innodb_status(void)
  33. {
  34. ...
  35. mutex_enter(&srv_innodb_monitor_mutex);
  36. ...
  37. export_vars.innodb_data_read = srv_stats.data_read;
  38. export_vars.innodb_data_writes = os_n_file_writes;
  39. export_vars.innodb_data_written = srv_stats.data_written;
  40. ...
  41. }

性能测试

AliSQL Performance Agent启动一个独立的线程用于性能数据的采集和处理,不干扰用户线程的处理。Sysbench下oltp_read_write场景的性能测试结果显示,开启Performance Agent带来的性能损失在1%以内,对性能的影响可以忽略。

并发数关闭Performance Agent开启Performance AgentOverhead
141214101-0.49%
83083430740-0.30%
165802757774-0.44%
326497264321-1.00%
645703556945-0.16%
1285034349990-0.70%
2564836048307-0.11%
51245347454000.12%
10244364943272-0.86%

使用说明

相比通过外部系统获取MySQL的性能数据,直接读取INFORMATION_SCHEMA库下的PERF_STATISTICS表不仅更加方便,而且数据的实时性也更好。

参数说明

  1. MySQL [information_schema]> show variables like "%performance_agent%";
  2. +----------------------------------------+-----------------+
  3. | Variable_name | Value |
  4. +----------------------------------------+-----------------+
  5. | performance_agent_enabled | ON |
  6. | performance_agent_file_size | 100 |
  7. | performance_agent_interval | 1 |
  8. | performance_agent_perfstat_volume_size | 3600 |
  9. +----------------------------------------+-----------------+
  10. 4 rows in set (0.00 sec)

其中:

  1. performance_agent_enabled: plugin启动开关,支持动态开启/关闭;
  2. performance_agent_file_size: 本地CSV文件大小,单位MB;
  3. performance_agent_interval: 采样周期,单位Second;
  4. performance_agent_perfstat_volume_size: PERF_STATISTICS表大小;

表结构说明

INFORMATION_SCHEMA库下的PERF_STATISTICS表结构如下:

  1. CREATE TEMPORARY TABLE `PERF_STATISTICS` (
  2. `TIME` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  3. `PROCS_MEM_USAGE` double NOT NULL DEFAULT '0',
  4. `PROCS_CPU_RATIO` double NOT NULL DEFAULT '0',
  5. `PROCS_IOPS` double NOT NULL DEFAULT '0',
  6. `PROCS_IO_READ_BYTES` bigint(21) NOT NULL DEFAULT '0',
  7. `PROCS_IO_WRITE_BYTES` bigint(21) NOT NULL DEFAULT '0',
  8. `MYSQL_CONN_ABORT` int(11) NOT NULL DEFAULT '0',
  9. `MYSQL_CONN_CREATED` int(11) NOT NULL DEFAULT '0',
  10. `MYSQL_USER_CONN_COUNT` int(11) NOT NULL DEFAULT '0',
  11. `MYSQL_CONN_COUNT` int(11) NOT NULL DEFAULT '0',
  12. `MYSQL_CONN_RUNNING` int(11) NOT NULL DEFAULT '0',
  13. `MYSQL_LOCK_IMMEDIATE` int(11) NOT NULL DEFAULT '0',
  14. `MYSQL_LOCK_WAITED` int(11) NOT NULL DEFAULT '0',
  15. `MYSQL_COM_INSERT` int(11) NOT NULL DEFAULT '0',
  16. `MYSQL_COM_UPDATE` int(11) NOT NULL DEFAULT '0',
  17. `MYSQL_COM_DELETE` int(11) NOT NULL DEFAULT '0',
  18. `MYSQL_COM_SELECT` int(11) NOT NULL DEFAULT '0',
  19. `MYSQL_COM_COMMIT` int(11) NOT NULL DEFAULT '0',
  20. `MYSQL_COM_ROLLBACK` int(11) NOT NULL DEFAULT '0',
  21. `MYSQL_COM_PREPARE` int(11) NOT NULL DEFAULT '0',
  22. `MYSQL_LONG_QUERY` int(11) NOT NULL DEFAULT '0',
  23. `MYSQL_TCACHE_GET` bigint(21) NOT NULL DEFAULT '0',
  24. `MYSQL_TCACHE_MISS` bigint(21) NOT NULL DEFAULT '0',
  25. `MYSQL_TMPFILE_CREATED` int(11) NOT NULL DEFAULT '0',
  26. `MYSQL_TMP_TABLES` int(11) NOT NULL DEFAULT '0',
  27. `MYSQL_TMP_DISKTABLES` int(11) NOT NULL DEFAULT '0',
  28. `MYSQL_SORT_MERGE` int(11) NOT NULL DEFAULT '0',
  29. `MYSQL_SORT_ROWS` int(11) NOT NULL DEFAULT '0',
  30. `MYSQL_BYTES_RECEIVED` bigint(21) NOT NULL DEFAULT '0',
  31. `MYSQL_BYTES_SENT` bigint(21) NOT NULL DEFAULT '0',
  32. `MYSQL_BINLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
  33. `MYSQL_IOLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
  34. `MYSQL_RELAYLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
  35. `EXTRA` json NOT NULL DEFAULT 'null'
  36. ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

注:EXTRA字段为json类型,记录Engine层统计信息:

  1. {
  2. "INNODB_LOG_LSN":0,
  3. "INNODB_TRX_CNT":0,
  4. "INNODB_DATA_READ":0,
  5. "INNODB_IBUF_SIZE":0,
  6. "INNODB_LOG_WAITS":0,
  7. "INNODB_MAX_PURGE":0,
  8. "INNODB_N_WAITING":0,
  9. "INNODB_ROWS_READ":0,
  10. "INNODB_LOG_WRITES":0,
  11. "INNODB_IBUF_MERGES":0,
  12. "INNODB_DATA_WRITTEN":0,
  13. "INNODB_DBLWR_WRITES":0,
  14. "INNODB_IBUF_MERGEOP":0,
  15. "INNODB_IBUF_SEGSIZE":0,
  16. "INNODB_ROWS_DELETED":0,
  17. "INNODB_ROWS_UPDATED":0,
  18. "INNODB_COMMIT_TRXCNT":0,
  19. "INNODB_IBUF_FREELIST":0,
  20. "INNODB_MYSQL_TRX_CNT":0,
  21. "INNODB_ROWS_INSERTED":0,
  22. "INNODB_ACTIVE_TRX_CNT":0,
  23. "INNODB_COMMIT_TRXTIME":0,
  24. "INNODB_IBUF_DISCARDOP":0,
  25. "INNODB_OS_LOG_WRITTEN":0,
  26. "INNODB_ACTIVE_VIEW_CNT":0,
  27. "INNODB_LOG_FLUSHED_LSN":0,
  28. "INNODB_RSEG_HISTORY_LEN":0,
  29. "INNODB_AVG_COMMIT_TRXTIME":0,
  30. "INNODB_LOG_CHECKPOINT_LSN":0,
  31. "INNODB_MAX_COMMIT_TRXTIME":0,
  32. "INNODB_DBLWR_PAGES_WRITTEN":0
  33. }

系统集成

AliSQL Performance Agent通过对外提供INFORMATION_SCHEMA库下的PERF_STATISTICS表的方式,让用户可以像查询普通数据的一样直接查询性能数据。

  1. -- 查询最近30S内的内存和CPU使用情况 --
  2. MySQL [information_schema]> SELECT TIME, PROCS_MEM_USAGE, PROCS_CPU_RATIO
  3. -> FROM PERF_STATISTICS ORDER BY TIME DESC LIMIT 30;
  4. +---------------------+-----------------+-----------------+
  5. | TIME | PROCS_MEM_USAGE | PROCS_CPU_RATIO |
  6. +---------------------+-----------------+-----------------+
  7. | 2020-03-19 15:09:50 | 6070943744 | 101.11 |
  8. | 2020-03-19 15:09:49 | 6070837248 | 100.99 |
  9. | 2020-03-19 15:09:48 | 6070546432 | 101.11 |
  10. | 2020-03-19 15:09:47 | 6071123968 | 101.17 |
  11. | 2020-03-19 15:09:46 | 6070509568 | 101.23 |
  12. | 2020-03-19 15:09:45 | 6070030336 | 101.63 |
  13. | 2020-03-19 15:09:44 | 6069497856 | 100.72 |
  14. | 2020-03-19 15:09:43 | 6069764096 | 100.85 |
  15. | 2020-03-19 15:09:42 | 6069522432 | 101.23 |
  16. | 2020-03-19 15:09:41 | 6068592640 | 101.25 |
  17. | 2020-03-19 15:09:40 | 6069272576 | 100.87 |
  18. | 2020-03-19 15:09:39 | 6069297152 | 101.31 |
  19. | 2020-03-19 15:09:38 | 6069706752 | 101.04 |
  20. | 2020-03-19 15:09:37 | 6069907456 | 100.8 |
  21. | 2020-03-19 15:09:36 | 6069907456 | 103.72 |
  22. | 2020-03-19 15:09:35 | 6069235712 | 99.05 |
  23. | 2020-03-19 15:09:34 | 6068707328 | 101.32 |
  24. | 2020-03-19 15:09:33 | 6068723712 | 100.66 |
  25. | 2020-03-19 15:09:32 | 6069379072 | 101.25 |
  26. | 2020-03-19 15:09:31 | 6069243904 | 103.62 |
  27. | 2020-03-19 15:09:30 | 6069567488 | 101.17 |
  28. | 2020-03-19 15:09:29 | 6069641216 | 98.15 |
  29. | 2020-03-19 15:09:28 | 6069968896 | 101.12 |
  30. | 2020-03-19 15:09:27 | 6070087680 | 104.15 |
  31. | 2020-03-19 15:09:26 | 6069633024 | 101.3 |
  32. | 2020-03-19 15:09:25 | 6069846016 | 100.94 |
  33. | 2020-03-19 15:09:24 | 6068805632 | 101.26 |
  34. | 2020-03-19 15:09:23 | 6068228096 | 98.45 |
  35. | 2020-03-19 15:09:22 | 6067957760 | 103.89 |
  36. | 2020-03-19 15:09:21 | 6067544064 | 98.66 |
  37. +---------------------+-----------------+-----------------+
  38. 30 rows in set (0.26 sec)
  39. -- 查询最近30SInnoDB层的读取和插入行数 --
  40. MySQL [information_schema]> SELECT TIME, EXTRA->'$.INNODB_ROWS_READ' AS INNODB_ROWS_READ,
  41. -> EXTRA->'$.INNODB_ROWS_INSERTED' AS INNODB_ROWS_INSERTED
  42. -> FROM information_schema.PERF_STATISTICS ORDER BY TIME DESC LIMIT 30;
  43. +---------------------+------------------+----------------------+
  44. | TIME | INNODB_ROWS_READ | INNODB_ROWS_INSERTED |
  45. +---------------------+------------------+----------------------+
  46. | 2020-03-19 15:09:50 | 1588696 | 6309 |
  47. | 2020-03-19 15:09:49 | 1534831 | 22712 |
  48. | 2020-03-19 15:09:48 | 1445766 | 25011 |
  49. | 2020-03-19 15:09:47 | 1455092 | 25038 |
  50. | 2020-03-19 15:09:46 | 1427958 | 24966 |
  51. | 2020-03-19 15:09:45 | 1460370 | 25054 |
  52. | 2020-03-19 15:09:44 | 1441310 | 24989 |
  53. | 2020-03-19 15:09:43 | 1430437 | 25963 |
  54. | 2020-03-19 15:09:42 | 1512929 | 24179 |
  55. | 2020-03-19 15:09:41 | 1432366 | 24979 |
  56. | 2020-03-19 15:09:40 | 1471565 | 25075 |
  57. | 2020-03-19 15:09:39 | 1440499 | 24995 |
  58. | 2020-03-19 15:09:38 | 1442158 | 24996 |
  59. | 2020-03-19 15:09:37 | 1457681 | 25035 |
  60. | 2020-03-19 15:09:36 | 1401060 | 24865 |
  61. | 2020-03-19 15:09:35 | 1538809 | 25281 |
  62. | 2020-03-19 15:09:34 | 1465982 | 25073 |
  63. | 2020-03-19 15:09:33 | 1441252 | 24997 |
  64. | 2020-03-19 15:09:32 | 1478242 | 24235 |
  65. | 2020-03-19 15:09:31 | 1449499 | 22237 |
  66. | 2020-03-19 15:09:30 | 1460754 | 25021 |
  67. | 2020-03-19 15:09:29 | 1461106 | 25029 |
  68. | 2020-03-19 15:09:28 | 1471250 | 22653 |
  69. | 2020-03-19 15:09:27 | 1453101 | 21005 |
  70. | 2020-03-19 15:09:26 | 1468384 | 21649 |
  71. | 2020-03-19 15:09:25 | 1413783 | 28213 |
  72. | 2020-03-19 15:09:24 | 1510981 | 16213 |
  73. | 2020-03-19 15:09:23 | 1432580 | 27732 |
  74. | 2020-03-19 15:09:22 | 1486866 | 20387 |
  75. | 2020-03-19 15:09:21 | 1430200 | 26969 |
  76. +---------------------+------------------+----------------------+
  77. 30 rows in set (0.20 sec)

BI集成

由于INFORMATION_SCHEMA库下的PERF_STATISTICS表中保存了标准的时间信息,所以用户可以直接与BI系统进行集成,例如:Grafana。以下是利用Grafana实现的实时监控平台。

PERF_STATISTICS

参考SQL如下:

  1. -- 实时监控CPU和内存使用情况 --
  2. SELECT
  3. $__timeGroupAlias(TIME,1s),
  4. sum(PROCS_CPU_RATIO) AS "PROCS_CPU_RATIO",
  5. sum(PROCS_MEM_USAGE) AS "PROCS_MEM_USAGE"
  6. FROM PERF_STATISTICS
  7. GROUP BY 1
  8. ORDER BY $__timeGroup(TIME,1s);