The table below lists the commonly used monitoring metrics in OCP.

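    Note

    This section uses the monitoring metrics of OCP V2.4.4 as an example. For the monitoring metrics of other OCP versions, see the monitoring metrics chapter of the *OCP User Guide* for that version.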

| Metric group | Metric name | Description | Expression |
| --- | --- | --- | --- |
| CPU usage | cpupercent | CPU utilization (%) | `100 * (1 - sum(rate(node_cpu_seconds_total{mode="idle", @LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(node_cpu_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| IO throughput | read | Data read per second (MiB/s) | `avg(rate(node_disk_read_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576` |
| IO throughput | write | Data written per second (MiB/s) | `avg(rate(node_disk_written_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576` |
| IO time | read | Average time spent on reads per second (μs) | `1000000 * avg(rate(node_disk_read_time_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| IO time | write | Average time spent on writes per second (μs) | `1000000 * avg(rate(node_disk_write_time_seconds_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| IOPS | read | Reads completed per second | `avg(rate(node_disk_reads_completed_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| IOPS | write | Writes completed per second | `avg(rate(node_disk_writes_completed_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Linux system load | load1 | Average system load over the past 1 minute | `avg(node_load1{@LABELS}) by (@GBLABELS)` |
| Linux system load | load15 | Average system load over the past 15 minutes | `avg(node_load15{@LABELS}) by (@GBLABELS)` |
| Linux system load | load5 | Average system load over the past 5 minutes | `avg(node_load5{@LABELS}) by (@GBLABELS)` |
| MEMStore | active | Active MEMStore size (MiB) | `sum(sysstat_value{metric_group="sysstat",stat_id="130000",@LABELS}) by (@GBLABELS) / 1048576` |
| MEMStore | limit | MEMStore limit (MiB) | `sum(sysstat_value{metric_group="sysstat",stat_id="130004",@LABELS}) by (@GBLABELS) / 1048576` |
| MEMStore | total | Total MEMStore size (MiB) | `sum(sysstat_value{metric_group="sysstat",stat_id="130001",@LABELS}) by (@GBLABELS) / 1048576` |
| MEMStore | trigger | MEMStore threshold that triggers a merge (MiB) | `sum(sysstat_value{metric_group="sysstat",stat_id="130002",@LABELS}) by (@GBLABELS) / 1048576` |
| QPS | all | SQL statements processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| QPS | delete | Delete statements processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| QPS | insert | Insert statements processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| QPS | replace | Replace statements processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| QPS | select | Select statements processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| QPS | update | Update statements processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| SQL response time | all | Average server-side processing time per SQL statement | `(sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| SQL execution plan type | distributed | Distributed execution plans processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40012",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| SQL execution plan type | local | Local execution plans processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40010",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| SQL execution plan type | remote | Remote execution plans processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40011",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| TPS | trans_count | Transactions processed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="30005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Event wait time | waittime | Average time per wait event | `sum(rate(time_wait{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Event wait count | waitcount | Wait events per second | `sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Transaction response time | trans_time | Average server-side processing time per transaction | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="30006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="30005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Transaction log count | log_count | Transaction logs committed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="30002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Transaction log time | sync_time | Average network synchronization time per transaction log | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="30000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="30001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Transaction log time | write_disk | Average disk write time per transaction log | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="80041",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="80040",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Transaction log volume | log_size | Size of transaction logs committed per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="80057",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Memory | buffers | Kernel buffer cache size (GiB) | `avg(node_memory_Buffers_bytes{@LABELS}) by (@GBLABELS) / 1073741824` |
| Memory | free | Free physical memory (GiB) | `avg(node_memory_MemFree_bytes{@LABELS}) by (@GBLABELS) / 1073741824` |
| Memory | used | Used physical memory, excluding page cache and buffers (GiB) | `(avg(node_memory_MemTotal_bytes{@LABELS}) by (@GBLABELS) - avg(node_memory_MemFree_bytes{@LABELS}) by (@GBLABELS) - avg(node_memory_Cached_bytes{@LABELS}) by (@GBLABELS) - avg(node_memory_Buffers_bytes{@LABELS}) by (@GBLABELS)) / 1073741824` |
| Response time | all | Average server-side processing time per SQL statement | `(sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| Response time | delete | Average server-side processing time per Delete statement | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Response time | insert | Average server-side processing time per Insert statement | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Response time | replace | Average server-side processing time per Replace statement | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Response time | select | Average server-side processing time per Select statement | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Response time | update | Average server-side processing time per Update statement | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Capacity: partitions | partitioncount | Number of partitions | `sum(partition_count{metric_group="all_meta_table",@LABELS}) by (@GBLABELS)` |
| Capacity: tables | table_count | Number of tables | `max(table_count{metric_group="all_table",@LABELS}) by (@GBLABELS)` |
| Query response time | all | Average server-side processing time per SQL statement | `(sum(rate(sysstat_value{metric_group="sysstat",stat_id="40003",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40005",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40007",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="40002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40004",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40006",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="40000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| Active sessions | active_session | Current number of active sessions | `sum(active_sessions{metric_group="all_virtual_processlist",@LABELS}) by (@GBLABELS)` |
| Wait events | wait_count | Wait events per second | `sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Wait event time | wait_time | Average time per wait event | `sum(rate(time_wait{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(total_waits{metric_group="waitevent",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Lock wait time | wait_time | Average write-lock wait time | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="60023",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / (sum(rate(sysstat_value{metric_group="sysstat",stat_id="60021",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) + sum(rate(sysstat_value{metric_group="sysstat",stat_id="60022",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| Cache hit ratio | block_cache | Block cache hit ratio (%) | `100 * 1 / (1 + sum(rate(sysstat_value{metric_group="sysstat",stat_id="50009",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="50008",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| Cache hit ratio | plan_cache | Plan cache hit ratio (%) | `100 * sum(rate(hit_count{metric_group="plan_cache_stat",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(access_count{metric_group="plan_cache_stat",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Cache hit ratio | row_cache | Row cache hit ratio (%) | `100 * 1 / (1 + sum(rate(sysstat_value{metric_group="sysstat",stat_id="50001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="50000",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS))` |
| Cache size | block_cache | Block cache size (MiB) | `sum(cache_size{metric_group="all_virtual_kvcache_info",cache_name="user_block_cache",@LABELS}) by (@GBLABELS) / 1048576` |
| Cache size | plan_cache | Plan cache size (MiB) | `sum(mem_used{metric_group="plan_cache_stat",@LABELS}) by (@GBLABELS) / 1048576` |
| Cache size | row_cache | Row cache size (MiB) | `sum(cache_size{metric_group="all_virtual_kvcache_info",cache_name="user_row_cache",@LABELS}) by (@GBLABELS) / 1048576` |
| Network throughput | receive | Data received per second (MiB/s) | `avg(rate(node_network_receive_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576` |
| Network throughput | send | Data sent per second (MiB/s) | `avg(rate(node_network_transmit_bytes_total{@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / 1048576` |
| Request queue | queue_count | Number of times per second that SQL requests enter the wait queue | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="20001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Request queue time | queue_time | Average time SQL requests spend in the wait queue | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="20002",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS) / sum(rate(sysstat_value{metric_group="sysstat",stat_id="20001",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Lock wait | fail | Failed write-lock waits per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="60022",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
| Lock wait | success | Successful write-lock waits per second | `sum(rate(sysstat_value{metric_group="sysstat",stat_id="60021",@LABELS}[@INTERVAL]) by (@GBLABELS)) by (@GBLABELS)` |
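The expressions in the table above are templates, not finished PromQL queries. OCP fills in `@LABELS`, `@INTERVAL`, and `@GBLABELS` before it runs them. The following sketch shows one plausible expansion. It assumes `@LABELS` becomes extra label matchers, `@INTERVAL` becomes the range window passed to `rate()`, and `@GBLABELS` becomes a comma-separated list of grouping labels. These meanings, the helper `expand_template`, and the label `svr_ip` are all illustrative assumptions, not documented OCP behavior.

```python
# Hypothetical expansion of the placeholders used in OCP metric expressions.
# ASSUMPTION: @LABELS -> label matchers, @INTERVAL -> range window,
# @GBLABELS -> grouping labels. OCP's real substitution logic may differ.


def expand_template(
    expr: str, labels: dict[str, str], interval: str, group_by: list[str]
) -> str:
    """Return `expr` with @LABELS, @INTERVAL and @GBLABELS substituted."""
    matchers = ",".join(f'{name}="{value}"' for name, value in labels.items())
    return (
        expr.replace("@GBLABELS", ",".join(group_by))
        .replace("@LABELS", matchers)
        .replace("@INTERVAL", interval)
    )


if __name__ == "__main__":
    # Template taken from the load1 row; the label and its value are made up.
    template = "avg(node_load1{@LABELS}) by (@GBLABELS)"
    print(expand_template(template, {"svr_ip": "10.0.0.1"}, "1m", ["svr_ip"]))
    # avg(node_load1{svr_ip="10.0.0.1"}) by (svr_ip)
```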
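The `cpupercent` row computes utilization as 100 × (1 − rate of idle CPU time / rate of all CPU time). `node_cpu_seconds_total` is a cumulative counter per CPU mode, so `rate()` turns it into CPU-seconds consumed per second. The sketch below reproduces that arithmetic from two hypothetical snapshots. `simple_rate` is a simplified two-point stand-in for Prometheus `rate()` that ignores counter resets and extrapolation. The sample values are invented.

```python
# Simplified reproduction of the cpupercent expression:
#   100 * (1 - rate(idle) / rate(all modes))
# ASSUMPTION: two snapshots of node_cpu_seconds_total, already summed over all CPUs.


def simple_rate(start: float, end: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (end - start) / seconds


def cpu_percent(start: dict[str, float], end: dict[str, float], seconds: float) -> float:
    """CPU utilization in percent from per-mode cumulative CPU seconds."""
    idle = simple_rate(start["idle"], end["idle"], seconds)
    total = sum(simple_rate(start[mode], end[mode], seconds) for mode in start)
    return 100 * (1 - idle / total)


if __name__ == "__main__":
    # 2 CPUs over 4 s = 8 CPU-seconds, of which 6 were idle.
    before = {"idle": 100.0, "user": 20.0, "system": 10.0}
    after = {"idle": 106.0, "user": 21.5, "system": 10.5}
    print(cpu_percent(before, after, 4.0))  # 25.0
```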
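The `all` rows of SQL response time, Response time, and Query response time share one formula. It divides the summed rate of five time counters (stat IDs 40001, 40003, 40005, 40007, 40009) by the summed rate of five statement counters (40000, 40002, 40004, 40006, 40008). The result is therefore an average weighted by statement count, and it can differ sharply from the plain mean of the per-type averages. The sketch below contrasts the two using invented per-second rates. The time/count pairings follow the per-statement Response time rows (for example, 40001/40000 for Select).

```python
# Contrast the table's "all" response time (ratio of sums) with a naive
# mean of per-type averages. All rates below are invented sample values.

# statement type -> (time counter rate, statement counter rate)
RATES: dict[str, tuple[float, float]] = {
    "select": (8000.0, 800.0),   # 40001 / 40000
    "insert": (2000.0, 100.0),   # 40003 / 40002
    "update": (1500.0, 50.0),    # 40007 / 40006
    "delete": (1600.0, 40.0),    # 40009 / 40008
    "replace": (500.0, 10.0),    # 40005 / 40004
}


def weighted_avg_time(rates: dict[str, tuple[float, float]]) -> float:
    """Sum of time rates divided by sum of count rates, as in the `all` rows."""
    total_time = sum(time for time, _ in rates.values())
    total_count = sum(count for _, count in rates.values())
    return total_time / total_count


def mean_of_type_averages(rates: dict[str, tuple[float, float]]) -> float:
    """Unweighted mean of per-type averages (not what the table computes)."""
    averages = [time / count for time, count in rates.values()]
    return sum(averages) / len(averages)


if __name__ == "__main__":
    print(weighted_avg_time(RATES))      # 13.6
    print(mean_of_type_averages(RATES))  # 30.0
```

Select statements dominate the count in this sample, so the weighted figure stays close to the Select average of 10. The naive mean gives the rare Replace statements the same weight as Select.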
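The `block_cache` and `row_cache` hit-ratio rows use the form 100 × 1 / (1 + a / b), while `plan_cache` uses 100 × hits / accesses. Suppose b (stat IDs 50008 and 50000) counts hits and a (50009 and 50001) counts misses; this is inferred from the formula shape, not stated in the table. The identity 1 / (1 + m/h) = h / (h + m) then shows that both forms give the ordinary hit ratio. The sketch below checks the identity with exact rational arithmetic on a few sample values.

```python
# Check that the OCP hit-ratio form 100 * 1 / (1 + misses / hits) equals
# 100 * hits / (hits + misses). Fractions avoid floating-point rounding noise.
from fractions import Fraction


def hit_ratio_ocp_form(hits: Fraction, misses: Fraction) -> Fraction:
    """Form used by the block_cache and row_cache rows (requires hits > 0)."""
    return 100 * 1 / (1 + misses / hits)


def hit_ratio_direct(hits: Fraction, misses: Fraction) -> Fraction:
    """Conventional hit ratio in percent (requires hits + misses > 0)."""
    return 100 * hits / (hits + misses)


if __name__ == "__main__":
    for hits, misses in [(900, 100), (1, 3), (250, 0)]:
        h, m = Fraction(hits), Fraction(misses)
        assert hit_ratio_ocp_form(h, m) == hit_ratio_direct(h, m)
    print(hit_ratio_ocp_form(Fraction(900), Fraction(100)))  # 90
```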
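Several rows convert raw byte values for display. They divide by 1048576 (2^20, giving MiB or MiB/s) or by 1073741824 (2^30, giving GiB). The `used` memory row also subtracts free memory, page cache (`node_memory_Cached_bytes`), and buffers from the total. The sketch below repeats that arithmetic on invented byte counts. The constant and function names are illustrative only.

```python
# Unit conversions and the "used" memory formula from the table,
# applied to invented sample values.

MIB = 1024 ** 2  # 1048576, divisor in the MiB-valued rows
GIB = 1024 ** 3  # 1073741824, divisor in the memory rows


def used_memory_gib(total: int, free: int, cached: int, buffers: int) -> float:
    """(MemTotal - MemFree - Cached - Buffers) / 2^30, as in the `used` row."""
    return (total - free - cached - buffers) / GIB


def bytes_per_sec_to_mib(rate_bytes: float) -> float:
    """Convert a byte rate (e.g. from node_network_receive_bytes_total) to MiB/s."""
    return rate_bytes / MIB


if __name__ == "__main__":
    print(used_memory_gib(64 * GIB, 8 * GIB, 20 * GIB, 4 * GIB))  # 32.0
    print(bytes_per_sec_to_mib(52_428_800))  # 50.0
```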
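Many average-time rows divide the rate of a cumulative time counter by the rate of a cumulative event counter. Examples are transaction response time (30006/30005), request queue time (20002/20001), and lock wait time (60023 / (60021 + 60022)). Both rates are taken over the same window T, so T cancels, ignoring Prometheus's extrapolation details: (Δtime / T) / (Δcount / T) = Δtime / Δcount. The result is the average time per event within the window. The sketch below demonstrates this with invented counter increases. Note that the lock-wait denominator counts both successful and failed waits.

```python
# Average time per event as a ratio of two counter rates. The window length
# cancels, so the result depends only on the counter increases.


def avg_per_event(time_delta: float, count_delta: float, seconds: float) -> float:
    """(time_delta / seconds) / (count_delta / seconds), like rate(a) / rate(b)."""
    time_rate = time_delta / seconds
    count_rate = count_delta / seconds
    return time_rate / count_rate


def avg_lock_wait(wait_time_delta: float, success_delta: float, fail_delta: float) -> float:
    """Mirror of 60023 / (60021 + 60022): time over successful plus failed waits."""
    return wait_time_delta / (success_delta + fail_delta)


if __name__ == "__main__":
    # The same counter increases evaluated with an 8 s and a 16 s window.
    print(avg_per_event(4000.0, 40.0, 8.0))   # 100.0
    print(avg_per_event(4000.0, 40.0, 16.0))  # 100.0
    print(avg_lock_wait(3000.0, 25.0, 5.0))   # 100.0
```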