Custom logging alerts
In logging 5.7 and later versions, users can configure the LokiStack deployment to produce customized alerts and recorded metrics. If you want to use customized alerting and recording rules, you must enable the LokiStack ruler component.
LokiStack log-based alerts and recorded metrics are triggered by providing LogQL expressions to the ruler component. The Loki Operator manages a ruler that is optimized for the selected LokiStack size, which can be 1x.extra-small, 1x.small, or 1x.medium.
To provide these expressions, you must create an AlertingRule custom resource (CR) containing Prometheus-compatible alerting rules, or a RecordingRule CR containing Prometheus-compatible recording rules.
Administrators can configure log-based alerts or recorded metrics for application, audit, or infrastructure tenants. Users without administrator permissions can configure log-based alerts or recorded metrics for application tenants of the applications that they have access to.
Application, audit, and infrastructure alerts are sent by default to the OKD monitoring stack Alertmanager in the openshift-monitoring namespace, unless you have disabled the local Alertmanager instance. If the Alertmanager that is used to monitor user-defined projects in the openshift-user-workload-monitoring namespace is enabled, application alerts are sent to the Alertmanager in this namespace by default.
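As a rough illustration of the RecordingRule CR mentioned above, the following sketch records an error-line rate for an application tenant. The resource name, the app-ns namespace, the group name, and the metric name app_ns:error_lines:rate1m are all hypothetical, and the label key is a placeholder for whatever label the LokiStack ruler is configured to select. Treat it as a starting point under those assumptions, not as a reference configuration.

```yaml
# Minimal sketch of a RecordingRule CR; every name below is an illustrative assumption.
apiVersion: loki.grafana.com/v1
kind: RecordingRule
metadata:
  name: app-error-rate                    # hypothetical resource name
  namespace: app-ns                       # hypothetical application namespace
  labels:
    openshift.io/<label_name>: "true"     # placeholder; must match the ruler selector
spec:
  tenantID: "application"
  groups:
    - name: AppErrorRate                  # hypothetical group name
      interval: 1m                        # how often the rules in this group are evaluated
      rules:
        - record: app_ns:error_lines:rate1m   # hypothetical name of the recorded metric
          expr: |
            sum(rate({kubernetes_namespace_name="app-ns"} |= "error" [1m])) by (job)
```

The expression filters on the same namespace in which the CR is created, mirroring the constraint that the AlertingRule examples later in this document describe for kubernetes_namespace_name.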
Configuring the ruler
When the LokiStack ruler component is enabled, users can define a group of LogQL expressions that trigger logging alerts or recorded metrics.
Administrators can enable the ruler by modifying the LokiStack custom resource (CR).
Procedure
Enable the ruler by ensuring that the LokiStack CR contains the following spec configuration:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: <name>
  namespace: <namespace>
spec:
  # ...
  rules:
    enabled: true # (1)
    selector:
      matchLabels:
        openshift.io/<label_name>: "true" # (2)
    namespaceSelector:
      matchLabels:
        openshift.io/<label_name>: "true" # (3)
1 Enable Loki alerting and recording rules in your cluster.
2 Add a custom label that AlertingRule and RecordingRule CRs must carry in their labels block to be picked up by the ruler.
3 Add a custom label that can be added to namespaces where you want to enable the use of logging alerts and metrics.
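For a namespace to satisfy the namespaceSelector shown above, it has to carry the chosen label. A minimal sketch of such a namespace follows, assuming a hypothetical namespace called app-ns and the same placeholder label key used in the LokiStack CR:

```yaml
# Sketch of a namespace labeled so that spec.rules.namespaceSelector matches it.
apiVersion: v1
kind: Namespace
metadata:
  name: app-ns                            # hypothetical namespace name
  labels:
    openshift.io/<label_name>: "true"     # placeholder; same key and value as in the LokiStack CR
```

On an existing namespace, the same label can be added in place rather than by re-creating the object.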
Authorizing Loki rules RBAC permissions
Administrators can allow users to create and manage their own alerting rules by creating a ClusterRole object and binding this role to usernames. The ClusterRole object defines the necessary role-based access control (RBAC) permissions for users.
Prerequisites
The Cluster Logging Operator is installed in the openshift-logging namespace.
You have administrator permissions.
Procedure
Create a cluster role that defines the necessary RBAC permissions.
Bind the appropriate cluster roles to the username:
Example binding command
$ oc adm policy add-role-to-user <cluster_role_name> -n <namespace> <username>
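The two steps above can be sketched as manifests. This is an assumption-laden example, not the definitive role: the names alertingrule-editor and alertingrule-editor-binding are hypothetical, the verb list is one plausible choice for users who manage their own rules, and the RoleBinding is shown only as a rough declarative counterpart of the binding command.

```yaml
# Sketch of a cluster role granting management of AlertingRule CRs (names are hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alertingrule-editor
rules:
  - apiGroups: ["loki.grafana.com"]
    resources: ["alertingrules"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
---
# Namespace-scoped binding of that cluster role to a single user.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alertingrule-editor-binding
  namespace: <namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alertingrule-editor
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: <username>
```

Binding through a RoleBinding keeps the permissions limited to one namespace, which fits the case of users managing rules only for applications they have access to. Add recordingrules to the resources list if users should also manage recording rules.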
Creating a log-based alerting rule with Loki
The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:
If an AlertingRule CR includes an invalid interval period, it is an invalid alerting rule.
If an AlertingRule CR includes an invalid for period, it is an invalid alerting rule.
If an AlertingRule CR includes an invalid LogQL expr, it is an invalid alerting rule.
If an AlertingRule CR includes two groups with the same name, it is an invalid alerting rule.
If none of the above applies, an alerting rule is considered valid.
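To make the validation conditions above concrete, the following fragment marks where each checked field sits in an AlertingRule spec. The group names, alert names, durations, and expressions are hypothetical values chosen only so that the fragment would pass all four checks.

```yaml
# Fragment of an AlertingRule spec, annotated with the fields the webhook validates.
spec:
  tenantID: "application"
  groups:
    - name: GroupA                 # group names must be unique within the CR
      interval: 1m                 # must be a valid duration
      rules:
        - alert: ExampleErrorRate  # hypothetical alert name
          expr: |                  # must be a valid LogQL expression
            sum(rate({kubernetes_namespace_name="app-ns"} |= "error" [1m])) > 0
          for: 5m                  # must be a valid duration
    - name: GroupB                 # a second group needs a different name
      rules:
        - alert: ExampleWarnRate   # hypothetical alert name
          expr: |
            sum(rate({kubernetes_namespace_name="app-ns"} |= "warn" [1m])) > 0
          for: 5m
```

Renaming GroupB to GroupA, or replacing one of the durations with a string that is not a duration, would be expected to make the webhook reject the CR.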
Tenant type | Valid namespaces for AlertingRule CRs |
---|---|
application | |
audit | |
infrastructure | openshift-*, kube-*, default |
Prerequisites
Logging subsystem for Red Hat OpenShift Operator 5.7 and later
OKD 4.13 and later
Procedure
Create an AlertingRule custom resource (CR):
Example infrastructure AlertingRule CR
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: loki-operator-alerts
  namespace: openshift-operators-redhat # (1)
  labels: # (2)
    openshift.io/<label_name>: "true"
spec:
  tenantID: "infrastructure" # (3)
  groups:
    - name: LokiOperatorHighReconciliationError
      rules:
        - alert: HighPercentageError
          expr: | # (4)
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
              /
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
              > 0.01
          for: 10s
          labels:
            severity: critical # (5)
          annotations:
            summary: High Loki Operator Reconciliation Errors # (6)
            description: High Loki Operator Reconciliation Errors # (7)
1 The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
2 The labels block must match the LokiStack spec.rules.selector definition.
3 AlertingRule CRs for infrastructure tenants are only supported in the openshift-*, kube-*, or default namespaces.
4 The value for kubernetes_namespace_name: must match the value for metadata.namespace.
5 The value of this mandatory field must be critical, warning, or info.
6 This field is mandatory.
7 This field is mandatory.
Example application AlertingRule CR
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-user-workload
  namespace: app-ns # (1)
  labels: # (2)
    openshift.io/<label_name>: "true"
spec:
  tenantID: "application"
  groups:
    - name: AppUserWorkloadHighError
      rules:
        - alert: <alert_name>
          expr: | # (3)
            sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
          for: 10s
          labels:
            severity: critical # (4)
          annotations:
            summary: <summary> # (5)
            description: <description> # (6)
1 The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
2 The labels block must match the LokiStack spec.rules.selector definition.
3 The value for kubernetes_namespace_name: must match the value for metadata.namespace.
4 The value of this mandatory field must be critical, warning, or info.
5 The value of this mandatory field is a summary of the rule.
6 The value of this mandatory field is a detailed description of the rule.
Apply the AlertingRule CR:
$ oc apply -f <filename>.yaml