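The following table lists the common monitoring metrics in OCP.
Note
This section uses the monitoring metrics of OCP V2.4.4 as an example. For other OCP versions, see the monitoring metrics chapter of the OCP User Guide for that version.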
Metric group | Metric name | Description | Expression |
---|---|---|---|
CPU utilization | cpupercent | CPU utilization | 100 * (1 - sum(rate(node_cpu_seconds_total{mode="idle", @LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(node_cpu_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
IO throughput | read | Data read per second | avg(rate(node_disk_read_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576 |
IO throughput | write | Data written per second | avg(rate(node_disk_written_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576 |
IO time | read | Average time spent on reads per second | 1000000 * avg(rate(node_disk_read_time_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
IO time | write | Average time spent on writes per second | 1000000 * avg(rate(node_disk_write_time_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
IOPS | read | Reads per second | avg(rate(node_disk_reads_completed_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
IOPS | write | Writes per second | avg(rate(node_disk_writes_completed_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Linux system load | load1 | Average system load over the past 1 minute | avg(node_load1{@LABELS}) by (@GBLABELS) |
Linux system load | load15 | Average system load over the past 15 minutes | avg(node_load15{@LABELS}) by (@GBLABELS) |
Linux system load | load5 | Average system load over the past 5 minutes | avg(node_load5{@LABELS}) by (@GBLABELS) |
MEMStore | active | Size of the active MEMStore | sum(sysstat_value{metric_group="sysstat",stat_id="130000",@LABELS}) by (@GBLABELS) / 1048576 |
MEMStore | limit | MEMStore limit | sum(sysstat_value{metric_group="sysstat",stat_id="130004",@LABELS}) by (@GBLABELS) / 1048576 |
MEMStore | total | Total MEMStore size | sum(sysstat_value{metric_group="sysstat",stat_id="130001",@LABELS}) by (@GBLABELS) / 1048576 |
MEMStore | trigger | Threshold that triggers a merge | sum(sysstat_value{metric_group="sysstat",stat_id="130002",@LABELS}) by (@GBLABELS) / 1048576 |
QPS | all | SQL statements processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
QPS | delete | Delete statements processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
QPS | insert | Insert statements processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
QPS | replace | Replace statements processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
QPS | select | Select statements processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
QPS | update | Update statements processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
SQL response time | all | Average server-side processing time per SQL statement | (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
SQL execution plan type | distributed | Distributed execution plans processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40012",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
SQL execution plan type | local | Local execution plans processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40010",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
SQL execution plan type | remote | Remote execution plans processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40011",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
TPS | trans_count | Transactions processed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="30005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Event wait time | waittime | Average time per wait event | sum(rate(time_wait{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Event wait count | waitcount | Wait events per second | sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Transaction response time | trans_time | Average server-side processing time per transaction | sum(rate(sysstat_value{metric_group="sysstat",stat_id="30006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="30005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Transaction log count | log_count | Transaction logs committed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="30002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Transaction log time | sync_time | Average network synchronization time per transaction log | sum(rate(sysstat_value{metric_group="sysstat",stat_id="30000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="30001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Transaction log time | write_disk | Average disk write time per transaction log | sum(rate(sysstat_value{metric_group="sysstat",stat_id="80041",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="80040",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Transaction log volume | log_size | Size of transaction logs committed per second | sum(rate(sysstat_value{metric_group="sysstat",stat_id="80057",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Memory | buffers | Size of the kernel buffer cache | avg(node_memory_Buffers_bytes{@LABELS}) by (@GBLABELS) / 1073741824 |
Memory | free | Free physical memory | avg(node_memory_MemFree_bytes{@LABELS}) by (@GBLABELS) / 1073741824 |
Memory | used | Used physical memory | (avg(node_memory_MemTotal_bytes{@LABELS}) by (@GBLABELS) - avg(node_memory_MemFree_bytes{@LABELS}) by (@GBLABELS) - avg(node_memory_Cached_bytes{@LABELS}) by (@GBLABELS) - avg(node_memory_Buffers_bytes{@LABELS}) by (@GBLABELS)) / 1073741824 |
Response time | all | Average server-side processing time per SQL statement | (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
Response time | delete | Average server-side processing time per Delete statement | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Response time | insert | Average server-side processing time per Insert statement | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Response time | replace | Average server-side processing time per Replace statement | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Response time | select | Average server-side processing time per Select statement | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Response time | update | Average server-side processing time per Update statement | sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Capacity partition count | partitioncount | Number of partitions | sum(partition_count{metric_group="all_meta_table",@LABELS}) by (@GBLABELS) |
Capacity table count | table_count | Number of tables | max(table_count{metric_group="all_table",@LABELS}) by (@GBLABELS) |
Query response time | all | Average server-side processing time per SQL statement | (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
Active sessions | active_session | Current number of active sessions | sum(active_sessions{metric_group="all_virtual_processlist",@LABELS}) by (@GBLABELS) |
Wait events | wait_count | Wait events per second | sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Wait event time | wait_time | Average time per wait event | sum(rate(time_wait{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Lock wait time | wait_time | Average write-lock wait time | sum(rate(sysstat_value{metric_group="sysstat",stat_id="60023",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="60021",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="60022",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
Cache hit ratio | block_cache | Block cache hit ratio | 100 * 1 / (1 + sum(rate(sysstat_value{metric_group="sysstat",stat_id="50009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="50008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
Cache hit ratio | plan_cache | Execution plan cache hit ratio | 100 * sum(rate(hit_count{metric_group="plan_cache_stat",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(access_count{metric_group="plan_cache_stat",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Cache hit ratio | row_cache | Row cache hit ratio | 100 * 1 / (1 + sum(rate(sysstat_value{metric_group="sysstat",stat_id="50001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="50000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) |
Cache size | block_cache | Block cache size | sum(cache_size{metric_group="all_virtual_kvcache_info",cache_name="user_block_cache",@LABELS}) by (@GBLABELS) / 1048576 |
Cache size | plan_cache | Execution plan cache size | sum(mem_used{metric_group="plan_cache_stat",@LABELS}) by (@GBLABELS) / 1048576 |
Cache size | row_cache | Row cache size | sum(cache_size{metric_group="all_virtual_kvcache_info",cache_name="user_row_cache",@LABELS}) by (@GBLABELS) / 1048576 |
Network throughput | receive | Data received per second | avg(rate(node_network_receive_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576 |
Network throughput | send | Data sent per second | avg(rate(node_network_transmit_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576 |
Request wait queue | queue_count | Average number of times per second that SQL requests enter the wait queue | sum(rate(sysstat_value{metric_group="sysstat",stat_id="20001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Request wait queue time | queue_time | Time that SQL requests spend waiting in the wait queue | sum(rate(sysstat_value{metric_group="sysstat",stat_id="20002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="20001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Lock waits | fail | Number of failed write-lock waits | sum(rate(sysstat_value{metric_group="sysstat",stat_id="60022",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
Lock waits | success | Number of successful write-lock waits | sum(rate(sysstat_value{metric_group="sysstat",stat_id="60021",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) |
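
The expressions in the table are templates: `@LABELS`, `@INTERVAL`, and `@GBLABELS` are placeholders rather than literal query text. The following Python sketch shows one way to read them, assuming `@LABELS` stands for label filters, `@INTERVAL` for the rate window, and `@GBLABELS` for the group-by labels. It also evaluates the `100 * 1 / (1 + a / b)` shape used by the block cache and row cache hit ratio rows, assuming `a` is a miss rate and `b` is a hit rate, which makes the formula equal to `100 * hit / (hit + miss)`. The helper names `expand` and `hit_ratio_percent`, the `svr_ip` label, and all sample values are hypothetical and are not part of OCP.

```python
# Sketch only: OCP performs placeholder substitution internally. The helper
# names, the svr_ip label, and the sample values below are hypothetical.


def expand(template: str, labels: str, interval: str, group_by: str) -> str:
    """Fill the @LABELS, @INTERVAL and @GBLABELS placeholders of a template."""
    return (
        template.replace("@GBLABELS", group_by)
        .replace("@LABELS", labels)
        .replace("@INTERVAL", interval)
    )


def hit_ratio_percent(hit_rate: float, miss_rate: float) -> float:
    """Evaluate 100 * 1 / (1 + miss / hit), the shape of the cache hit ratio rows."""
    if hit_rate == 0:
        # The table expression would divide by zero here; this sketch
        # chooses to report 0% instead.
        return 0.0
    return 100 * 1 / (1 + miss_rate / hit_rate)


if __name__ == "__main__":
    load1 = "avg(node_load1{@LABELS}) by (@GBLABELS)"
    # Prints the load1 template with the placeholders filled in.
    print(expand(load1, 'svr_ip="10.0.0.1"', "1m", "svr_ip"))
    # 90 hits/s and 10 misses/s give a hit ratio of about 90 percent.
    print(hit_ratio_percent(hit_rate=90.0, miss_rate=10.0))
```

With the sample inputs, the first call yields `avg(node_load1{svr_ip="10.0.0.1"}) by (svr_ip)`. Returning 0 when the hit rate is zero is a choice made for this sketch; the table does not say how OCP handles that case.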