检查办法
通过openGauss提供的gs_check工具可以开展openGauss健康状态检查。
注意事项
- 扩容新节点检查只能在root用户下执行,其他场景都必须在omm用户下执行。
- 必须指定-i或-e参数,-i会检查指定的单项,-e会检查对应场景配置中的多项。
- 如果-i参数中不包含root类检查项或-e场景配置列表中没有root类检查项,则不需要交互输入root权限的用户及其密码。
- 可使用–skip-root-items跳过检查项中包含的root类检查,以免需要输入root权限用户及密码。
- 检查扩容新节点与现有节点之间的一致性,在现有节点执行gs_check命令指定–hosts参数进行检查,其中hosts文件中需要写入新节点ip。
操作步骤
方式1:
- 以操作系统用户omm登录数据库主节点。
执行如下命令对openGauss数据库状态进行检查。
gs_check -i CheckClusterState
其中,-i指定检查项,注意区分大小写。格式:-i CheckClusterState、-i CheckCPU或-i CheckClusterState,CheckCPU。
取值范围为所有支持的检查项名称,详细列表请参见《openGauss 工具参考》中“服务端工具 > gs_checkos > openGauss状态检查表”,用户可以根据需求自己编写新检查项。
方式2:
- 以操作系统用户omm登录数据库主节点。
执行如下命令对openGauss数据库进行健康检查。
gs_check -e inspect
其中,-e指定场景名,注意区分大小写。格式:-e inspect或-e upgrade。
取值范围为所有支持的巡检场景名称,默认列表包括:inspect(例行巡检)、upgrade(升级前巡检)、expand(扩容前巡检)、binary_upgrade(就地升级前巡检)、health(健康检查巡检),用户可以根据需求自己编写场景。
方式3:
- 以操作系统用户omm登录数据库主节点。
- 将巡检工具gs_check及相关目录inspection拷贝分发到所有扩容新节点。
- 将扩容新节点ip写到文件_ipListFile_中,以换行符进行分隔。
执行如下命令对扩容新节点进行扩容前检查。
gs_check -e expand_new_node --hosts ipListFile
-e必须为expand_new_node,即扩容前新节点检查。
openGauss巡检的主要作用是在openGauss运行过程中,检查整个openGauss状态是否正常,或者重大操作前(升级、扩容),确保openGauss满足操作所需的环境条件和状态条件。详细的巡检项目和场景请参见《openGauss 工具参考》中“服务端工具 > gs_checkos > openGauss状态检查表”。
示例
执行单项检查结果:
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3
Checking... [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes. success: 3
Analysis the check result successfully
Success. All check items run completed. Total:1 Success:1 Failed:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz
本地执行结果:
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L
2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] 检查主机CPU占用率,如果idle 大于30%并且iowait 小于 30%.则检查项通过,否则检查项不通过
2017-12-29 17:09:29 [RST] OK
2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749) 12/29/17 _x86_64_
17:09:24 CPU %user %nice %system %iowait %steal %idle
17:09:25 all 0.25 0.00 0.25 0.00 0.00 99.50
17:09:26 all 0.25 0.00 0.13 0.00 0.00 99.62
17:09:27 all 0.25 0.00 0.25 0.13 0.00 99.37
17:09:28 all 0.38 0.00 0.25 0.00 0.13 99.25
17:09:29 all 1.00 0.00 0.88 0.00 0.00 98.12
Average: all 0.43 0.00 0.35 0.03 0.03 99.17
执行场景检查结果:
[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpSeProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:root
Please enter password for user[root]:
Please enter password for user[root] on the node[10.244.57.240]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:59 Nodes:2
Checking... [ ] 21/59
Checking... [=========================] 59/59
Start to analysis the check result
CheckClusterState...........................OK
The item run on 2 nodes. success: 2
CheckDBParams...............................OK
The item run on 1 nodes. success: 1
CheckDebugSwitch............................OK
The item run on 2 nodes. success: 2
CheckDirPermissions.........................OK
The item run on 2 nodes. success: 2
CheckReadonlyMode...........................OK
The item run on 1 nodes. success: 1
CheckEnvProfile.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
GAUSSHOME /usr1/gaussdb/app
LD_LIBRARY_PATH /usr1/gaussdb/app/lib
PATH /usr1/gaussdb/app/bin
CheckBlockdev...............................OK
The item run on 2 nodes. success: 2
CheckCurConnCount...........................OK
The item run on 1 nodes. success: 1
CheckCursorNum..............................OK
The item run on 1 nodes. success: 1
CheckPgxcgroup..............................OK
The item run on 1 nodes. success: 1
CheckDiskFormat.............................OK
The item run on 2 nodes. success: 2
CheckSpaceUsage.............................OK
The item run on 2 nodes. success: 2
CheckInodeUsage.............................OK
The item run on 2 nodes. success: 2
CheckSwapMemory.............................OK
The item run on 2 nodes. success: 2
CheckLogicalBlock...........................OK
The item run on 2 nodes. success: 2
CheckIOrequestqueue.....................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'
CheckMaxAsyIOrequests.......................OK
The item run on 2 nodes. success: 2
CheckIOConfigure............................OK
The item run on 2 nodes. success: 2
CheckMTU....................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
1500
CheckPing...................................OK
The item run on 2 nodes. success: 2
CheckRXTX...................................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
NetWork[eth0]
RX: 256
TX: 256
CheckNetWorkDrop............................OK
The item run on 2 nodes. success: 2
CheckMultiQueue.............................OK
The item run on 2 nodes. success: 2
CheckEncoding...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
LANG=en_US.UTF-8
CheckFirewall...............................OK
The item run on 2 nodes. success: 2
CheckKernelVer..............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
3.10.0-957.el7.x86_64
CheckMaxHandle..............................OK
The item run on 2 nodes. success: 2
CheckNTPD...................................OK
host240: NTPD service is running, 2020-06-02 17:00:28
host157: NTPD service is running, 2020-06-02 17:00:06
CheckOSVer..................................OK
host240: The current OS is centos 7.6 64bit.
host157: The current OS is centos 7.6 64bit.
CheckSysParams..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.
Warning reason: variable 'net.sctp.path_max_retrans' RealValue '5' ExpectedValue '10'.
Warning reason: variable 'net.sctp.max_init_retransmits' RealValue '8' ExpectedValue '10'.
CheckTHP....................................OK
The item run on 2 nodes. success: 2
CheckTimeZone...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
+0800
CheckCPU....................................OK
The item run on 2 nodes. success: 2
CheckSshdService............................OK
The item run on 2 nodes. success: 2
CheckSshdConfig.........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
Warning reason: UseDNS parameter is not set; expected: no
CheckCrondService...........................OK
The item run on 2 nodes. success: 2
CheckStack..................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
8192
CheckNoCheckSum.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
Nochecksum value is N,Check items pass.
CheckSysPortRange...........................OK
The item run on 2 nodes. success: 2
CheckMemInfo................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
totalMem: 31.260929107666016G
CheckHyperThread............................OK
The item run on 2 nodes. success: 2
CheckTableSpace.............................OK
The item run on 1 nodes. success: 1
CheckSctpService............................OK
The item run on 2 nodes. success: 2
CheckSysadminUser...........................OK
The item run on 1 nodes. success: 1
CheckGUCConsistent..........................OK
All DN instance guc value is consistent.
CheckMaxProcMemory..........................OK
The item run on 1 nodes. success: 1
CheckBootItems..............................OK
The item run on 2 nodes. success: 2
CheckHashIndex..............................OK
The item run on 1 nodes. success: 1
CheckPgxcRedistb............................OK
The item run on 1 nodes. success: 1
CheckNodeGroupName..........................OK
The item run on 1 nodes. success: 1
CheckTDDate.................................OK
The item run on 1 nodes. success: 1
CheckDilateSysTab...........................OK
The item run on 1 nodes. success: 1
CheckKeyProAdj..............................OK
The item run on 2 nodes. success: 2
CheckProStartTime.......................WARNING
host157:
STARTED COMMAND
Tue Jun 2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/gaussdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
Mon Jun 1 16:15:15 2020 /usr1/gaussdb/app/bin/gaussdb -D /usr1/gaussdb/data/dn1 -M standby
CheckFilehandle.............................OK
The item run on 2 nodes. success: 2
CheckRouting................................OK
The item run on 2 nodes. success: 2
CheckNICModel...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
version: 1.0.0
model: Red Hat, Inc. Virtio network device
CheckDropCache..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
No DropCache process is running
CheckMpprcFile..............................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
There is no mpprc file
Analysis the check result successfully
Failed. All check items run completed. Total:59 Success:52 Warning:5 NG:2
For more information please refer to /usr1/gaussdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz