Understanding OpenShift Logging alerts
All of the logging collector alerts are listed on the Alerting UI of the OKD web console.
Viewing logging collector alerts
Alerts are shown in the OKD web console, on the Alerts tab of the Alerting UI. Alerts are in one of the following states:
Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.
Pending The alert condition is currently true, but the timeout has not been reached.
Not Firing. The alert is not currently triggered.
Procedure
To view OpenShift Logging and other OKD alerts:
In the OKD console, click Monitoring → Alerting.
Click the Alerts tab. The alerts are listed, based on the filters selected.
Additional resources
- For more information on the Alerting UI, see Managing alerts.
About logging collector alerts
The following alerts are generated by the logging collector. You can view these alerts in the OKD web console, on the Alerts page of the Alerting UI.
Alert | Message | Description | Severity |
---|---|---|---|
|
| The number of FluentD output errors is high, by default more than 10 in the previous 15 minutes. | Warning |
|
| Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance. | Critical |
|
| Fluentd is reporting that it cannot keep up with the data being indexed. | Warning |
|
| Fluentd is reporting that the queue size is increasing. | Critical |
|
| The number of FluentD output errors is very high, by default more than 25 in the previous 15 minutes. | Critical |
About Elasticsearch alerting rules
You can view these alerting rules in Prometheus.
Alert | Description | Severity |
---|---|---|
| The cluster health status has been RED for at least 2 minutes. The cluster does not accept writes, shards may be missing, or the master node hasn’t been elected yet. | Critical |
| The cluster health status has been YELLOW for at least 20 minutes. Some shard replicas are not allocated. | Warning |
| The cluster is expected to be out of disk space within the next 6 hours. | Critical |
| The cluster is predicted to be out of file descriptors within the next hour. | Warning |
| The JVM Heap usage on the specified node is high. | Alert |
| The specified node has hit the low watermark due to low free disk space. Shards can not be allocated to this node anymore. You should consider adding more disk space to the node. | Info |
| The specified node has hit the high watermark due to low free disk space. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node. | Warning |
| The specified node has hit the flood watermark due to low free disk space. Every index that has a shard allocated on this node is enforced a read-only block. The index block must be manually released when the disk use falls below the high watermark. | Critical |
| The JVM Heap usage on the specified node is too high. | Alert |
| Elasticsearch is experiencing an increase in write rejections on the specified node. This node might not be keeping up with the indexing speed. | Warning |
| The CPU used by the system on the specified node is too high. | Alert |
| The CPU used by Elasticsearch on the specified node is too high. | Alert |