Alarm

EMQ X Broker has built-in monitoring and alarm functions. It currently monitors CPU usage, system and process memory usage, the number of processes, rule engine resource status, and cluster partitioning and healing, and it can raise alarms for each of them. Both the activation and the deactivation of an alarm generate an alarm log, and the Broker publishes an MQTT message with the topic $SYS/brokers/<Node>/alarms/activate or $SYS/brokers/<Node>/alarms/deactivate. Users can subscribe to the topics $SYS/brokers/+/alarms/activate and $SYS/brokers/+/alarms/deactivate to get alarm notifications.
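As an illustration of the topic layout described above, the following Python sketch checks whether a received topic is an alarm topic and extracts the node name and event type. The function name `parse_alarm_topic` and the node name used in the usage note are assumptions made for this example; they are not part of EMQ X Broker.

```python
from typing import Optional, Tuple

# Topic filters from the text above; "+" matches exactly one topic level.
# Pass these to the subscribe call of whichever MQTT client you use.
ALARM_FILTERS = (
    "$SYS/brokers/+/alarms/activate",
    "$SYS/brokers/+/alarms/deactivate",
)


def parse_alarm_topic(topic: str) -> Optional[Tuple[str, str]]:
    """Return (node, event) for an alarm topic, or None for any other topic.

    Hypothetical helper: it only mirrors the topic layout
    $SYS/brokers/<Node>/alarms/<activate|deactivate>.
    """
    levels = topic.split("/")
    if len(levels) != 5:
        return None
    sys_level, brokers, node, alarms, event = levels
    if (sys_level, brokers, alarms) != ("$SYS", "brokers", "alarms"):
        return None
    if not node or event not in ("activate", "deactivate"):
        return None
    return node, event
```

For example, `parse_alarm_topic("$SYS/brokers/emqx@127.0.0.1/alarms/activate")` yields the node name `emqx@127.0.0.1` (an illustrative value) and the event `activate`, while any topic with a different shape yields `None`.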

The payload of the alarm notification message is in JSON format and contains the following fields:

Field	Type	Description
name	string	Alarm name
details	object	Alarm details
message	string	Human-readable description of the alarm
activate_at	integer	The time the alarm was activated, as a UNIX timestamp in microseconds
deactivate_at	integer / string	The time the alarm was deactivated, as a UNIX timestamp in microseconds. For an alarm that is still activated, the value of this field is infinity
activated	boolean	Whether the alarm is activated
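The fields in the table above could be decoded as in the following Python sketch. It assumes that a still-activated alarm carries the string "infinity" in deactivate_at, as the table suggests; the `Alarm` class and the `parse_alarm` function are names invented for this example.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class Alarm:
    name: str
    details: Dict[str, Any]
    message: str
    activate_at: datetime
    deactivate_at: Optional[datetime]  # None while the alarm is still activated
    activated: bool


def _from_micros(value: int) -> datetime:
    """Convert a UNIX timestamp in microseconds to an aware UTC datetime."""
    return _EPOCH + timedelta(microseconds=value)


def parse_alarm(payload: bytes) -> Alarm:
    """Decode an alarm notification payload (hypothetical helper)."""
    data = json.loads(payload)
    raw_end = data["deactivate_at"]
    return Alarm(
        name=data["name"],
        details=data["details"],
        message=data["message"],
        activate_at=_from_micros(data["activate_at"]),
        # Assumption: an activated alarm reports the string "infinity" here.
        deactivate_at=None if raw_end == "infinity" else _from_micros(raw_end),
        activated=data["activated"],
    )
```

Mapping "infinity" to `None` keeps the field a single type on the consumer side, so callers can test `alarm.deactivate_at is None` instead of comparing against a sentinel string.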

Taking the alarm of high system memory usage as an example, you will receive a message in the following format:

An alarm is not generated repeatedly. That is, once the high CPU usage alarm has been activated, the same alarm will not be raised again while it remains active. An alarm is deactivated automatically when the monitored item returns to normal, and users can also deactivate it manually if they do not care about it. Users can view current alarms (activated alarms) and historical alarms (deactivated alarms) on the Dashboard, and they can also use the HTTP API provided by EMQ X Broker to query and manage alarms.
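The non-repeating behavior described above, combined with separate high and low watermarks, can be pictured as a small two-state machine. The Python sketch below is only an illustration: the class name `WatermarkAlarm` is invented, and whether the Broker compares with "greater than or equal" or strictly "greater than" is an assumption.

```python
from typing import Optional


class WatermarkAlarm:
    """Two-threshold alarm: activates at `high`, deactivates at `low`."""

    def __init__(self, high: float, low: float) -> None:
        if low > high:
            raise ValueError("low watermark must not exceed high watermark")
        self.high = high
        self.low = low
        self.active = False

    def observe(self, usage: float) -> Optional[str]:
        """Feed one usage sample; return 'activate', 'deactivate', or None."""
        if not self.active and usage >= self.high:
            self.active = True
            return "activate"
        if self.active and usage <= self.low:
            self.active = False
            return "deactivate"
        # No state change: an already active alarm is not raised again.
        return None
```

With `WatermarkAlarm(high=80, low=60)`, a sequence of samples 85, 90, 70 produces a single `activate` event, and the alarm stays active at 70 because usage has not yet dropped to the low watermark.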

EMQ X Broker allows users to adjust the alarm function to a certain extent to meet actual needs. The following configuration items are currently available:

Configuration item	Type	Default value	Description
os_mon.cpu_check_interval	duration	60s	Check interval for CPU usage
os_mon.cpu_high_watermark	percent	80%	The high watermark of the CPU usage, which is the threshold to activate the alarm
os_mon.cpu_low_watermark	percent	60%	The low watermark of the CPU usage, which is the threshold to deactivate the alarm
os_mon.mem_check_interval	duration	60s	Check interval for memory usage
os_mon.sysmem_high_watermark	percent	70%	The high watermark of the system memory usage. The alarm is activated when the total memory occupied by the application reaches this value
os_mon.procmem_high_watermark	percent	5%	The high watermark of the process memory usage. The alarm is activated when the memory occupied by a single process reaches this value
vm_mon.check_interval	duration	30s	Check interval for the number of processes
vm_mon.process_high_watermark	percent	80%	The high watermark of the process occupancy rate, that is, the alarm is activated when the ratio of the number of created processes to the maximum number limit reaches this value
vm_mon.process_low_watermark	percent	60%	The low watermark of the process occupancy rate, that is, the alarm is deactivated when the ratio of the number of created processes to the maximum number limit drops to this value
alarm.actions	string	log,publish	The actions triggered when an alarm is activated. Currently only log and publish are supported, which write a log entry and publish a system message, respectively
alarm.size_limit	integer	1000	The maximum number of deactivated alarms that are saved. After the limit is reached, the oldest alarms are cleared first (FIFO)
alarm.validity_period	duration	24h	The maximum storage time of deactivated alarms. Expired alarms are cleared
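To make the interplay of alarm.size_limit and alarm.validity_period more concrete, the following Python sketch models a store of deactivated alarms with FIFO eviction and time-based expiry. It is not the Broker's implementation; the class `DeactivatedAlarmStore` and its methods are hypothetical, and the sketch assumes alarms are added in chronological order of deactivation.

```python
from collections import deque
from typing import Deque, List, Tuple


class DeactivatedAlarmStore:
    """Toy model of deactivated-alarm retention (hypothetical)."""

    def __init__(self, size_limit: int = 1000,
                 validity_period_s: float = 24 * 3600) -> None:
        self.size_limit = size_limit
        self.validity_period_s = validity_period_s
        # (deactivated_at in seconds, alarm name), oldest entry first.
        self._items: Deque[Tuple[float, str]] = deque()

    def add(self, name: str, deactivated_at: float) -> None:
        """Store a deactivated alarm, evicting the oldest if over the limit."""
        self._items.append((deactivated_at, name))
        while len(self._items) > self.size_limit:
            self._items.popleft()  # FIFO: drop the oldest alarm first

    def purge_expired(self, now: float) -> None:
        """Drop alarms that were deactivated longer ago than the validity period."""
        while self._items and now - self._items[0][0] > self.validity_period_s:
            self._items.popleft()

    def names(self) -> List[str]:
        return [name for _, name in self._items]
```

The defaults mirror the table (1000 alarms, 24h). Because entries are kept oldest first, both the size limit and the expiry check only ever need to look at the front of the queue.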