背景
性能数据是云上数据库服务的一个重要组成部分,对于性能数据,当前云厂商的一般做法是:由单独的采集系统进行性能数据的收集和处理,然后通过用户控制台进行性能数据的展示。借助控制台的将性能数据以图表的形式进行展示,比较直观,但是用户很难与自己的监控平台进行集成。特别是对于企业级用户,这些用户在上云之前往往有比较成熟的自建性能监控平台,虽然部分云厂商开始提供OpenAPI等方式对外开放性能数据,但是与自建平台的整合依然有诸多限制。
AliSQL解决方案
基于上述背景,AliSQL提出了一种全内聚的性能数据解决方案,直接通过系统表的方式对外提供性能数据。用户可以像查询普通数据一样,直接查询INFORMATION_SCHEMA
库下的PERF_STATISTICS
表得到性能数据。
设计实现
对于MySQL,用户关心的性能数据主要可以分为以下三种:
- Host层性能数据,包括主机的CPU占用、内存使用情况、IO调用等;
- Server层性能数据,包括各类连接信息、QPS信息、网络流量等;
- Engine层性能数据,包括数据读写情况、事务提交情况等;
MySQL内核内部,除了没有统计Host层的性能数据外,Server层和Engine层的性能数据都有统计并且提供了获取方法,例如:
MySQL [information_schema]> show status like "Com_select";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_select | 3 |
+---------------+-------+
1 row in set (0.00 sec)
MySQL [information_schema]> show status like "Innodb_data_read";
+------------------+-----------+
| Variable_name | Value |
+------------------+-----------+
| Innodb_data_read | 470798848 |
+------------------+-----------+
1 row in set (0.00 sec)
AliSQL Performance Agent需要解决的问题就是:1)整合MySQL内核统计的Server层和Engine层性能指标;2)增加Host层的性能统计;3)提供便捷的外部访问方式。具体实现上:
- 新增Performance Agent Plugin,在Plugin内部启动一个性能采集线程,按照指定的采样周期,采集Host层、Server层和Engine层的性能数据;
- Host层性能数据的获取方式:根据PID信息,直接读取
/proc
以及/proc/PID
目录下的系统文件; - Server层性能数据获取方式:Plugin内调用Server层统计接口,获取Server层性能数据;
- Engine层性能数据获取方式:以InnoDB为例,Plugin内调用InnoDB对外接口,获取InnoDB内部性能数据;
- 性能数据的汇总计算:不同的性能指标,计算单个采样周期内的差值或者实时值;
- 数据保存方式:以CSV文件格式本地保存,同时在
INFORMATION_SCHEMA
库下新增一张PERF_STATISTICS
表,保存最近1小时的性能数据;
/** 获取Server层性能数据 **/
typedef struct system_status_var STATUS_VAR;
struct system_status_var {
...
ulonglong created_tmp_disk_tables;
ulonglong created_tmp_tables;
...
ulong com_stat[(uint) SQLCOM_END];
...
}
void calc_sum_of_all_status(STATUS_VAR *to)
{
DBUG_ENTER("calc_sum_of_all_status");
mysql_mutex_assert_owner(&LOCK_status);
/* Get global values as base. */
*to= global_status_var;
Add_status add_status(to);
Global_THD_manager::get_instance()->do_for_all_thd_copy(&add_status);
DBUG_VOID_RETURN;
}
/** 获取InnoDB层性能数据 **/
/** Status variables to be passed to MySQL */
extern struct export_var_t export_vars;
struct export_var_t {
...
ulint innodb_data_read; /*!< Data bytes read */
ulint innodb_data_writes; /*!< I/O write requests */
ulint innodb_data_written; /*!< Data bytes written */
...
}
/* Function to pass InnoDB status variables to MySQL */
void srv_export_innodb_status(void)
{
...
mutex_enter(&srv_innodb_monitor_mutex);
...
export_vars.innodb_data_read = srv_stats.data_read;
export_vars.innodb_data_writes = os_n_file_writes;
export_vars.innodb_data_written = srv_stats.data_written;
...
}
性能测试
AliSQL Performance Agent启动一个独立的线程用于性能数据的采集和处理,不干扰用户线程的处理。Sysbench下oltp_read_write场景的性能测试结果显示,开启Performance Agent带来的性能损失在1%以内,对性能的影响可以忽略。
并发数 | 关闭Performance Agent | 开启Performance Agent | Overhead |
---|---|---|---|
1 | 4121 | 4101 | -0.49% |
8 | 30834 | 30740 | -0.30% |
16 | 58027 | 57774 | -0.44% |
32 | 64972 | 64321 | -1.00% |
64 | 57035 | 56945 | -0.16% |
128 | 50343 | 49990 | -0.70% |
256 | 48360 | 48307 | -0.11% |
512 | 45347 | 45400 | 0.12% |
1024 | 43649 | 43272 | -0.86% |
使用说明
相比通过外部系统获取MySQL的性能数据,直接读取INFORMATION_SCHEMA
库下的PERF_STATISTICS
表不仅更加方便,而且数据的实时性也更好。
参数说明
MySQL [information_schema]> show variables like "%performance_agent%";
+----------------------------------------+-----------------+
| Variable_name | Value |
+----------------------------------------+-----------------+
| performance_agent_enabled | ON |
| performance_agent_file_size | 100 |
| performance_agent_interval | 1 |
| performance_agent_perfstat_volume_size | 3600 |
+----------------------------------------+-----------------+
4 rows in set (0.00 sec)
其中:
- performance_agent_enabled: plugin启动开关,支持动态开启/关闭;
- performance_agent_file_size: 本地CSV文件大小,单位MB;
- performance_agent_interval: 采样周期,单位Second;
- performance_agent_perfstat_volume_size:
PERF_STATISTICS
表大小;
表结构说明
INFORMATION_SCHEMA
库下的PERF_STATISTICS
表结构如下:
CREATE TEMPORARY TABLE `PERF_STATISTICS` (
`TIME` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`PROCS_MEM_USAGE` double NOT NULL DEFAULT '0',
`PROCS_CPU_RATIO` double NOT NULL DEFAULT '0',
`PROCS_IOPS` double NOT NULL DEFAULT '0',
`PROCS_IO_READ_BYTES` bigint(21) NOT NULL DEFAULT '0',
`PROCS_IO_WRITE_BYTES` bigint(21) NOT NULL DEFAULT '0',
`MYSQL_CONN_ABORT` int(11) NOT NULL DEFAULT '0',
`MYSQL_CONN_CREATED` int(11) NOT NULL DEFAULT '0',
`MYSQL_USER_CONN_COUNT` int(11) NOT NULL DEFAULT '0',
`MYSQL_CONN_COUNT` int(11) NOT NULL DEFAULT '0',
`MYSQL_CONN_RUNNING` int(11) NOT NULL DEFAULT '0',
`MYSQL_LOCK_IMMEDIATE` int(11) NOT NULL DEFAULT '0',
`MYSQL_LOCK_WAITED` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_INSERT` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_UPDATE` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_DELETE` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_SELECT` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_COMMIT` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_ROLLBACK` int(11) NOT NULL DEFAULT '0',
`MYSQL_COM_PREPARE` int(11) NOT NULL DEFAULT '0',
`MYSQL_LONG_QUERY` int(11) NOT NULL DEFAULT '0',
`MYSQL_TCACHE_GET` bigint(21) NOT NULL DEFAULT '0',
`MYSQL_TCACHE_MISS` bigint(21) NOT NULL DEFAULT '0',
`MYSQL_TMPFILE_CREATED` int(11) NOT NULL DEFAULT '0',
`MYSQL_TMP_TABLES` int(11) NOT NULL DEFAULT '0',
`MYSQL_TMP_DISKTABLES` int(11) NOT NULL DEFAULT '0',
`MYSQL_SORT_MERGE` int(11) NOT NULL DEFAULT '0',
`MYSQL_SORT_ROWS` int(11) NOT NULL DEFAULT '0',
`MYSQL_BYTES_RECEIVED` bigint(21) NOT NULL DEFAULT '0',
`MYSQL_BYTES_SENT` bigint(21) NOT NULL DEFAULT '0',
`MYSQL_BINLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
`MYSQL_IOLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
`MYSQL_RELAYLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
`EXTRA` json NOT NULL DEFAULT 'null'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
注:EXTRA
字段为json类型,记录Engine层统计信息:
{
"INNODB_LOG_LSN":0,
"INNODB_TRX_CNT":0,
"INNODB_DATA_READ":0,
"INNODB_IBUF_SIZE":0,
"INNODB_LOG_WAITS":0,
"INNODB_MAX_PURGE":0,
"INNODB_N_WAITING":0,
"INNODB_ROWS_READ":0,
"INNODB_LOG_WRITES":0,
"INNODB_IBUF_MERGES":0,
"INNODB_DATA_WRITTEN":0,
"INNODB_DBLWR_WRITES":0,
"INNODB_IBUF_MERGEOP":0,
"INNODB_IBUF_SEGSIZE":0,
"INNODB_ROWS_DELETED":0,
"INNODB_ROWS_UPDATED":0,
"INNODB_COMMIT_TRXCNT":0,
"INNODB_IBUF_FREELIST":0,
"INNODB_MYSQL_TRX_CNT":0,
"INNODB_ROWS_INSERTED":0,
"INNODB_ACTIVE_TRX_CNT":0,
"INNODB_COMMIT_TRXTIME":0,
"INNODB_IBUF_DISCARDOP":0,
"INNODB_OS_LOG_WRITTEN":0,
"INNODB_ACTIVE_VIEW_CNT":0,
"INNODB_LOG_FLUSHED_LSN":0,
"INNODB_RSEG_HISTORY_LEN":0,
"INNODB_AVG_COMMIT_TRXTIME":0,
"INNODB_LOG_CHECKPOINT_LSN":0,
"INNODB_MAX_COMMIT_TRXTIME":0,
"INNODB_DBLWR_PAGES_WRITTEN":0
}
系统集成
AliSQL Performance Agent通过对外提供INFORMATION_SCHEMA
库下的PERF_STATISTICS
表的方式,让用户可以像查询普通数据的一样直接查询性能数据。
-- 查询最近30S内的内存和CPU使用情况 --
MySQL [information_schema]> SELECT TIME, PROCS_MEM_USAGE, PROCS_CPU_RATIO
-> FROM PERF_STATISTICS ORDER BY TIME DESC LIMIT 30;
+---------------------+-----------------+-----------------+
| TIME | PROCS_MEM_USAGE | PROCS_CPU_RATIO |
+---------------------+-----------------+-----------------+
| 2020-03-19 15:09:50 | 6070943744 | 101.11 |
| 2020-03-19 15:09:49 | 6070837248 | 100.99 |
| 2020-03-19 15:09:48 | 6070546432 | 101.11 |
| 2020-03-19 15:09:47 | 6071123968 | 101.17 |
| 2020-03-19 15:09:46 | 6070509568 | 101.23 |
| 2020-03-19 15:09:45 | 6070030336 | 101.63 |
| 2020-03-19 15:09:44 | 6069497856 | 100.72 |
| 2020-03-19 15:09:43 | 6069764096 | 100.85 |
| 2020-03-19 15:09:42 | 6069522432 | 101.23 |
| 2020-03-19 15:09:41 | 6068592640 | 101.25 |
| 2020-03-19 15:09:40 | 6069272576 | 100.87 |
| 2020-03-19 15:09:39 | 6069297152 | 101.31 |
| 2020-03-19 15:09:38 | 6069706752 | 101.04 |
| 2020-03-19 15:09:37 | 6069907456 | 100.8 |
| 2020-03-19 15:09:36 | 6069907456 | 103.72 |
| 2020-03-19 15:09:35 | 6069235712 | 99.05 |
| 2020-03-19 15:09:34 | 6068707328 | 101.32 |
| 2020-03-19 15:09:33 | 6068723712 | 100.66 |
| 2020-03-19 15:09:32 | 6069379072 | 101.25 |
| 2020-03-19 15:09:31 | 6069243904 | 103.62 |
| 2020-03-19 15:09:30 | 6069567488 | 101.17 |
| 2020-03-19 15:09:29 | 6069641216 | 98.15 |
| 2020-03-19 15:09:28 | 6069968896 | 101.12 |
| 2020-03-19 15:09:27 | 6070087680 | 104.15 |
| 2020-03-19 15:09:26 | 6069633024 | 101.3 |
| 2020-03-19 15:09:25 | 6069846016 | 100.94 |
| 2020-03-19 15:09:24 | 6068805632 | 101.26 |
| 2020-03-19 15:09:23 | 6068228096 | 98.45 |
| 2020-03-19 15:09:22 | 6067957760 | 103.89 |
| 2020-03-19 15:09:21 | 6067544064 | 98.66 |
+---------------------+-----------------+-----------------+
30 rows in set (0.26 sec)
-- 查询最近30S内InnoDB层的读取和插入行数 --
MySQL [information_schema]> SELECT TIME, EXTRA->'$.INNODB_ROWS_READ' AS INNODB_ROWS_READ,
-> EXTRA->'$.INNODB_ROWS_INSERTED' AS INNODB_ROWS_INSERTED
-> FROM information_schema.PERF_STATISTICS ORDER BY TIME DESC LIMIT 30;
+---------------------+------------------+----------------------+
| TIME | INNODB_ROWS_READ | INNODB_ROWS_INSERTED |
+---------------------+------------------+----------------------+
| 2020-03-19 15:09:50 | 1588696 | 6309 |
| 2020-03-19 15:09:49 | 1534831 | 22712 |
| 2020-03-19 15:09:48 | 1445766 | 25011 |
| 2020-03-19 15:09:47 | 1455092 | 25038 |
| 2020-03-19 15:09:46 | 1427958 | 24966 |
| 2020-03-19 15:09:45 | 1460370 | 25054 |
| 2020-03-19 15:09:44 | 1441310 | 24989 |
| 2020-03-19 15:09:43 | 1430437 | 25963 |
| 2020-03-19 15:09:42 | 1512929 | 24179 |
| 2020-03-19 15:09:41 | 1432366 | 24979 |
| 2020-03-19 15:09:40 | 1471565 | 25075 |
| 2020-03-19 15:09:39 | 1440499 | 24995 |
| 2020-03-19 15:09:38 | 1442158 | 24996 |
| 2020-03-19 15:09:37 | 1457681 | 25035 |
| 2020-03-19 15:09:36 | 1401060 | 24865 |
| 2020-03-19 15:09:35 | 1538809 | 25281 |
| 2020-03-19 15:09:34 | 1465982 | 25073 |
| 2020-03-19 15:09:33 | 1441252 | 24997 |
| 2020-03-19 15:09:32 | 1478242 | 24235 |
| 2020-03-19 15:09:31 | 1449499 | 22237 |
| 2020-03-19 15:09:30 | 1460754 | 25021 |
| 2020-03-19 15:09:29 | 1461106 | 25029 |
| 2020-03-19 15:09:28 | 1471250 | 22653 |
| 2020-03-19 15:09:27 | 1453101 | 21005 |
| 2020-03-19 15:09:26 | 1468384 | 21649 |
| 2020-03-19 15:09:25 | 1413783 | 28213 |
| 2020-03-19 15:09:24 | 1510981 | 16213 |
| 2020-03-19 15:09:23 | 1432580 | 27732 |
| 2020-03-19 15:09:22 | 1486866 | 20387 |
| 2020-03-19 15:09:21 | 1430200 | 26969 |
+---------------------+------------------+----------------------+
30 rows in set (0.20 sec)
BI集成
由于INFORMATION_SCHEMA
库下的PERF_STATISTICS
表中保存了标准的时间信息,所以用户可以直接与BI系统进行集成,例如:Grafana。以下是利用Grafana实现的实时监控平台。
参考SQL如下:
-- 实时监控CPU和内存使用情况 --
SELECT
$__timeGroupAlias(TIME,1s),
sum(PROCS_CPU_RATIO) AS "PROCS_CPU_RATIO",
sum(PROCS_MEM_USAGE) AS "PROCS_MEM_USAGE"
FROM PERF_STATISTICS
GROUP BY 1
ORDER BY $__timeGroup(TIME,1s);