Understanding the monitoring stack
OKD includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. OKD delivers monitoring best practices out of the box. A set of alerts is included by default that immediately notifies cluster administrators about issues with a cluster. Default dashboards in the OKD web console include visual representations of cluster metrics to help you quickly understand the state of your cluster.
After installing OKD 4.9, cluster administrators can optionally enable monitoring for user-defined projects. By using this feature, cluster administrators, developers, and other users can specify how services and pods are monitored in their own projects. You can then query metrics, review dashboards, and manage alerting rules and silences for your own projects in the OKD web console.
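As a rough illustration of what enabling this feature involves: in OKD 4.9 the switch is the enableUserWorkload field in the cluster-monitoring-config ConfigMap in the openshift-monitoring project. The Python sketch below only builds that manifest and prints it as JSON. The helper name build_enable_uwm_configmap is hypothetical, and nothing here contacts a cluster.

```python
import json


def build_enable_uwm_configmap() -> dict:
    """Return a ConfigMap manifest that turns on user workload monitoring.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-monitoring-config",
            "namespace": "openshift-monitoring",
        },
        # The monitoring configuration lives under the "config.yaml" key,
        # which holds a YAML document as a plain string.
        "data": {"config.yaml": "enableUserWorkload: true\n"},
    }


if __name__ == "__main__":
    print(json.dumps(build_enable_uwm_configmap(), indent=2))
```

The printed JSON could be piped to oc apply -f -. If the ConfigMap already exists with other settings, add the field to the existing config.yaml instead of replacing the whole object.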
Cluster administrators can grant developers and other users permission to monitor their own projects. Privileges are granted by assigning one of the predefined monitoring roles.
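A minimal sketch of such a role assignment, assuming the predefined role names are monitoring-rules-view, monitoring-rules-edit, and monitoring-edit (verify them on your cluster). The helper build_monitoring_rolebinding and the binding naming scheme are hypothetical. The code only builds a RoleBinding manifest that ties a user to one of these cluster roles within a single project.

```python
import json

# Assumed role names for user-defined project monitoring; check your cluster.
MONITORING_ROLES = (
    "monitoring-rules-view",
    "monitoring-rules-edit",
    "monitoring-edit",
)


def build_monitoring_rolebinding(user: str, project: str, role: str) -> dict:
    """Build a RoleBinding that grants `role` to `user` in `project` only."""
    if role not in MONITORING_ROLES:
        raise ValueError(f"unknown monitoring role: {role!r}")
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        # Hypothetical naming scheme; the user name must be valid in a
        # Kubernetes resource name for this to be accepted.
        "metadata": {"name": f"{role}-{user}", "namespace": project},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role,
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "User",
                "name": user,
            }
        ],
    }


if __name__ == "__main__":
    # "alice" and "ns1" are placeholder values.
    print(json.dumps(build_monitoring_rolebinding("alice", "ns1", "monitoring-edit"), indent=2))
```

Because the binding is namespaced, the user gains the role's permissions only inside the named project, not across the cluster.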
Understanding the monitoring stack
The OKD monitoring stack is based on the Prometheus open source project and its wider ecosystem. The monitoring stack includes the following:
Default platform monitoring components. A set of platform monitoring components is installed in the openshift-monitoring project by default during an OKD installation. This provides monitoring for core OKD components including Kubernetes services. The default monitoring stack also enables remote health monitoring for clusters. These components are illustrated in the Installed by default section in the following diagram.
Components for monitoring user-defined projects. After optionally enabling monitoring for user-defined projects, additional monitoring components are installed in the openshift-user-workload-monitoring project. This provides monitoring for user-defined projects. These components are illustrated in the User section in the following diagram.
Default monitoring components
By default, the OKD 4.9 monitoring stack includes these components:
Component | Description |
---|---|
Cluster Monitoring Operator | The Cluster Monitoring Operator (CMO) is a central component of the monitoring stack. It deploys and manages Prometheus instances, the Thanos Querier, the Telemeter Client, and metrics targets and ensures that they are up to date. The CMO is deployed by the Cluster Version Operator (CVO). |
Prometheus Operator | The Prometheus Operator (PO) in the |
Prometheus | Prometheus is the monitoring system on which the OKD monitoring stack is based. Prometheus is a time-series database and a rule evaluation engine for metrics. Prometheus sends alerts to Alertmanager for processing. |
Prometheus Adapter | The Prometheus Adapter (PA in the preceding diagram) translates Kubernetes node and pod queries for use in Prometheus. The resource metrics that are translated include CPU and memory utilization metrics. The Prometheus Adapter exposes the cluster resource metrics API for horizontal pod autoscaling. The Prometheus Adapter is also used by the |
Alertmanager | The Alertmanager service handles alerts received from Prometheus. Alertmanager is also responsible for sending the alerts to external notification systems. |
| The |
| The |
| The |
Thanos Querier | The Thanos Querier aggregates and optionally deduplicates core OKD metrics and metrics for user-defined projects under a single, multi-tenant interface. |
Grafana | The Grafana analytics platform provides dashboards for analyzing and visualizing the metrics. The Grafana instance that is provided with the monitoring stack, along with its dashboards, is read-only. |
Telemeter Client | The Telemeter Client sends a subsection of the data from platform Prometheus instances to Red Hat to facilitate Remote Health Monitoring for clusters. |
All of the components in the monitoring stack are monitored by the stack and are automatically updated when OKD is updated.
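To make the Thanos Querier's optional deduplication more concrete, here is a deliberately simplified Python sketch of the idea: series that differ only in a replica label are treated as one. This is not Thanos's actual algorithm, and the label name prometheus_replica and the sample data are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# A series is modelled as (labels, latest sample value).
Series = Tuple[Dict[str, str], float]


def deduplicate(series: List[Series], replica_label: str = "prometheus_replica") -> List[Series]:
    """Keep one series per label set once the replica label is ignored."""
    seen: Dict[tuple, Series] = {}
    for labels, value in series:
        # Identity of a series = all labels except the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        # The first replica seen wins; later duplicates are dropped.
        seen.setdefault(key, (dict(key), value))
    return list(seen.values())


if __name__ == "__main__":
    samples = [
        ({"__name__": "up", "job": "etcd", "prometheus_replica": "prometheus-k8s-0"}, 1.0),
        ({"__name__": "up", "job": "etcd", "prometheus_replica": "prometheus-k8s-1"}, 1.0),
        ({"__name__": "up", "job": "kubelet", "prometheus_replica": "prometheus-k8s-0"}, 1.0),
    ]
    for labels, value in deduplicate(samples):
        print(labels, value)
```

With the sample data above, the two etcd series from different replicas collapse into one, so two series remain.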
Default monitoring targets
In addition to the components of the stack itself, the default monitoring stack monitors:
CoreDNS
Elasticsearch (if Logging is installed)
etcd
Fluentd (if Logging is installed)
HAProxy
Image registry
Kubelets
Kubernetes API server
Kubernetes controller manager
Kubernetes scheduler
Metering (if Metering is installed)
OpenShift API server
OpenShift Controller Manager
Operator Lifecycle Manager (OLM)
Each OKD component is responsible for its monitoring configuration. For problems with the monitoring of an OKD component, open a bug in Bugzilla against that component, not against the general monitoring component.
Other OKD framework components might be exposing metrics as well. For details, see their respective documentation.
Components for monitoring user-defined projects
OKD 4.9 includes an optional enhancement to the monitoring stack that enables you to monitor services and pods in user-defined projects. This feature includes the following components:
Component | Description |
---|---|
Prometheus Operator | The Prometheus Operator (PO) in the |
Prometheus | Prometheus is the monitoring system through which monitoring is provided for user-defined projects. Prometheus sends alerts to Alertmanager for processing. |
Thanos Ruler | The Thanos Ruler is a rule evaluation engine for Prometheus that is deployed as a separate process. In OKD 4.9, Thanos Ruler provides rule and alerting evaluation for the monitoring of user-defined projects. |
The components in the preceding table are deployed after monitoring is enabled for user-defined projects.
All of the components in the monitoring stack are monitored by the stack and are automatically updated when OKD is updated.
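The rules that Thanos Ruler evaluates for user-defined projects are typically supplied as PrometheusRule objects (API group monitoring.coreos.com/v1) created in the project itself. The sketch below builds one such manifest in Python. The helper build_prometheus_rule, the rule name, and the example expression are hypothetical placeholders, not values from this documentation.

```python
import json


def build_prometheus_rule(project: str, name: str, alert: str, expr: str,
                          for_duration: str = "5m", severity: str = "warning") -> dict:
    """Build a PrometheusRule manifest containing a single alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": project},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,  # PromQL expression to evaluate
                            "for": for_duration,  # how long expr must hold before firing
                            "labels": {"severity": severity},
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder project, alert name, and expression.
    rule = build_prometheus_rule("ns1", "example-alert", "ExampleTargetDown",
                                 'up{job="example-app"} == 0')
    print(json.dumps(rule, indent=2))
```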
Monitoring targets for user-defined projects
When monitoring is enabled for user-defined projects, you can monitor:
Metrics provided through service endpoints in user-defined projects.
Pods running in user-defined projects.
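For the first kind of target, a service endpoint is usually registered for scraping with a ServiceMonitor object in the same project. The Python sketch below builds a minimal one. The helper name, the app label key, and the example values are assumptions for illustration. A PodMonitor follows the same pattern for pods, using podMetricsEndpoints instead of endpoints.

```python
import json


def build_service_monitor(project: str, name: str, app_label: str,
                          port_name: str, interval: str = "30s") -> dict:
    """Build a ServiceMonitor that scrapes services labelled app=<app_label>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": project},
        "spec": {
            # "port" refers to a named port on the Service, not a number.
            "endpoints": [{"port": port_name, "interval": interval, "scheme": "http"}],
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    # Placeholder project, monitor name, label value, and port name.
    sm = build_service_monitor("ns1", "example-monitor", "example-app", "web")
    print(json.dumps(sm, indent=2))
```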