OKD cluster monitoring, logging, and Telemetry

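OKD provides several resources for monitoring at the cluster level, including the built-in monitoring stack, the logging subsystem, Telemetry, and CLI troubleshooting commands.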

About OKD monitoring

OKD includes a preconfigured, preinstalled, and self-updating monitoring stack that monitors core platform components. OKD delivers monitoring best practices out of the box: a default set of alerts immediately notifies cluster administrators about issues with a cluster, and default dashboards in the OKD web console provide visual representations of cluster metrics to help you quickly understand the state of your cluster.

After installing OKD 4.12, cluster administrators can optionally enable monitoring for user-defined projects. By using this feature, cluster administrators, developers, and other users can specify how services and pods are monitored in their own projects. You can then query metrics, review dashboards, and manage alerting rules and silences for your own projects in the OKD web console.
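Monitoring for user-defined projects is switched on through the monitoring stack's configuration. The sketch below generates that configuration as a manifest, assuming the documented convention of an `enableUserWorkload: true` entry under the `config.yaml` key of the `cluster-monitoring-config` ConfigMap in the `openshift-monitoring` namespace. The helper function name is hypothetical, and the generated file replaces any other settings already stored in `config.yaml`, so merge them in if your cluster has existing monitoring configuration.

```python
import json


def user_workload_monitoring_configmap(enabled: bool = True) -> dict:
    """Build a ConfigMap manifest that toggles monitoring for user-defined projects.

    Hypothetical helper: assumes the cluster monitoring stack reads its settings
    from the config.yaml key of cluster-monitoring-config in openshift-monitoring.
    """
    # The embedded value is itself a small YAML document.
    config_yaml = f"enableUserWorkload: {'true' if enabled else 'false'}\n"
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-monitoring-config",
            "namespace": "openshift-monitoring",
        },
        "data": {"config.yaml": config_yaml},
    }


if __name__ == "__main__":
    # Emit JSON; `oc apply -f` accepts JSON manifests as well as YAML.
    print(json.dumps(user_workload_monitoring_configmap(), indent=2))
```

You could save the output to a file and run `oc apply -f` on it, or make the same change interactively with `oc -n openshift-monitoring edit configmap cluster-monitoring-config`.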

Cluster administrators can grant developers and other users permission to monitor their own projects. Privileges are granted by assigning one of the predefined monitoring roles.
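Assigning a predefined monitoring role to a user in one project is an ordinary Kubernetes RBAC binding. The sketch below builds such a RoleBinding, assuming the role names `monitoring-rules-view`, `monitoring-rules-edit`, and `monitoring-edit` are the project-scoped cluster roles available in your OKD version (check the documentation for your release). The helper name, the binding name pattern, and the example user and namespace are hypothetical.

```python
import json

# Assumed set of project-scoped monitoring cluster roles; verify for your OKD version.
PROJECT_MONITORING_ROLES = {"monitoring-rules-view", "monitoring-rules-edit", "monitoring-edit"}


def monitoring_role_binding(user: str, namespace: str, role: str = "monitoring-edit") -> dict:
    """Build a RoleBinding that grants one monitoring cluster role to a user in one namespace.

    Hypothetical helper for illustration; the binding name pattern is arbitrary.
    """
    if role not in PROJECT_MONITORING_ROLES:
        raise ValueError(f"unexpected monitoring role: {role}")
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role}-{user}", "namespace": namespace},
        # A RoleBinding may reference a ClusterRole; the grant is still limited to this namespace.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role,
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": user}
        ],
    }


if __name__ == "__main__":
    # "developer1" and "my-project" are placeholder names.
    print(json.dumps(monitoring_role_binding("developer1", "my-project"), indent=2))
```

The same grant can be made from the CLI with `oc policy add-role-to-user monitoring-edit developer1 -n my-project`.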

About logging subsystem components

The logging subsystem components include a collector deployed to each node in the OKD cluster that collects all node and container logs and writes them to a log store. You can use a centralized web UI to create rich visualizations and dashboards with the aggregated data.

The major components of the logging subsystem are:

  • collection - This is the component that collects logs from the cluster, formats them, and forwards them to the log store. The current implementation is Fluentd.

  • log store - This is where the logs are stored. The default implementation is Elasticsearch. You can use the default Elasticsearch log store or forward logs to external log stores. The default log store is optimized and tested for short-term storage.

  • visualization - This is the UI component you can use to view logs, graphs, charts, and so forth. The current implementation is Kibana.
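The log store bullet mentions forwarding logs to external log stores. In the logging subsystem this is typically configured with a `ClusterLogForwarder` resource; the sketch below builds one, assuming the `logging.openshift.io/v1` API, the conventional resource name `instance` in the `openshift-logging` namespace, and the reserved output name `default` for the internal log store. The output name `external-es`, the pipeline name, and the URL are hypothetical, and a real HTTPS output usually also needs a TLS secret, which is omitted here.

```python
import json


def log_forwarder(es_url: str, keep_default: bool = True) -> dict:
    """Build a ClusterLogForwarder that sends application logs to an external Elasticsearch.

    Hypothetical helper; output and pipeline names are illustrative only.
    """
    output_refs = ["external-es"]
    if keep_default:
        # "default" refers to the internal, cluster-managed log store.
        output_refs.append("default")
    return {
        "apiVersion": "logging.openshift.io/v1",
        "kind": "ClusterLogForwarder",
        "metadata": {"name": "instance", "namespace": "openshift-logging"},
        "spec": {
            # Each output describes one destination the collector can write to.
            "outputs": [
                {"name": "external-es", "type": "elasticsearch", "url": es_url}
            ],
            # Each pipeline routes one or more input log types to one or more outputs.
            "pipelines": [
                {
                    "name": "application-logs",
                    "inputRefs": ["application"],
                    "outputRefs": output_refs,
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(log_forwarder("https://es.example.com:9200"), indent=2))
```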

For more information on OpenShift Logging, see the OpenShift Logging documentation.

About Telemetry

Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. The Telemeter Client fetches the metrics values every four minutes and thirty seconds and uploads the data to Red Hat. These metrics are described in this document.
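To put the upload interval in perspective, four minutes and thirty seconds is 270 seconds, which divides a 24-hour day (86,400 seconds) exactly 320 times. The short calculation below assumes a perfectly regular interval with no missed or retried uploads; the function name is hypothetical.

```python
# Telemeter Client interval as stated above: four minutes and thirty seconds.
INTERVAL_SECONDS = 4 * 60 + 30  # 270 s
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def uploads_per_period(period_seconds: int, interval_seconds: int = INTERVAL_SECONDS) -> int:
    """Count complete upload intervals in a period, assuming a fixed interval."""
    return period_seconds // interval_seconds


if __name__ == "__main__":
    print(uploads_per_period(SECONDS_PER_DAY))  # full day
    print(uploads_per_period(3600))             # one hour (13 complete intervals)
```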

Red Hat uses this stream of data to monitor the clusters in real time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OKD upgrades to customers in a way that minimizes service impact and to continuously improve the upgrade experience.

This debugging information is available to Red Hat Support and Engineering teams with the same restrictions as accessing data reported through support cases. All connected cluster information is used by Red Hat to help make OKD better and more intuitive to use.

Information collected by Telemetry

The following information is collected by Telemetry:

  • The unique random identifier that is generated during an installation

  • Version information, including the OKD cluster version and installed update details that are used to determine update version availability

  • Update information, including the number of updates available per cluster, the channel and image repository used for an update, update progress information, and the number of errors that occur in an update

  • The name of the provider platform that OKD is deployed on and the data center location

  • Sizing information about clusters, machine types, and machines, including the number of CPU cores and the amount of RAM used for each

  • The number of running virtual machine instances in a cluster

  • The number of etcd members and the number of objects stored in the etcd cluster

  • The OKD framework components installed in a cluster and their condition and status

  • Usage information about components, features, and extensions

  • Usage details about Technology Previews and unsupported configurations

  • Information about degraded software

  • Information about nodes that are marked as NotReady

  • Events for all namespaces listed as “related objects” for a degraded Operator

  • Configuration details that help Red Hat Support to provide beneficial support for customers, including node configuration at the cloud infrastructure level, hostnames, IP addresses, Kubernetes pod names, namespaces, and services

  • Information about the validity of certificates

  • The number of application builds by build strategy type

Telemetry does not collect identifying information such as user names or passwords. Red Hat does not intend to collect personal information. If Red Hat discovers that personal information has been inadvertently received, Red Hat will delete such information. To the extent that any telemetry data constitutes personal data, please refer to the Red Hat Privacy Statement for more information about Red Hat’s privacy practices.

CLI troubleshooting and debugging commands

For a list of the oc client troubleshooting and debugging commands, see the OKD CLI tools documentation.