OKD cluster monitoring, logging, and Telemetry
OKD provides various resources for monitoring at the cluster level.
About OKD monitoring
OKD includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. OKD delivers monitoring best practices out of the box. A set of alerts are included by default that immediately notify cluster administrators about issues with a cluster. Default dashboards in the OKD web console include visual representations of cluster metrics to help you to quickly understand the state of your cluster.
After installing OKD 4.12, cluster administrators can optionally enable monitoring for user-defined projects. By using this feature, cluster administrators, developers, and other users can specify how services and pods are monitored in their own projects. You can then query metrics, review dashboards, and manage alerting rules and silences for your own projects in the OKD web console.
Cluster administrators can grant developers and other users permission to monitor their own projects. Privileges are granted by assigning one of the predefined monitoring roles. |
About logging subsystem components
The logging subsystem components include a collector deployed to each node in the OKD cluster that collects all node and container logs and writes them to a log store. You can use a centralized web UI to create rich visualizations and dashboards with the aggregated data.
The major components of the logging subsystem are:
collection - This is the component that collects logs from the cluster, formats them, and forwards them to the log store. The current implementation is Fluentd.
log store - This is where the logs are stored. The default implementation is Elasticsearch. You can use the default Elasticsearch log store or forward logs to external log stores. The default log store is optimized and tested for short-term storage.
visualization - This is the UI component you can use to view logs, graphs, charts, and so forth. The current implementation is Kibana.
For more information on OpenShift Logging, see the OpenShift Logging documentation.
About Telemetry
Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. The Telemeter Client fetches the metrics values every four minutes and thirty seconds and uploads the data to Red Hat. These metrics are described in this document.
This stream of data is used by Red Hat to monitor the clusters in real-time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OKD upgrades to customers to minimize service impact and continuously improve the upgrade experience.
This debugging information is available to Red Hat Support and Engineering teams with the same restrictions as accessing data reported through support cases. All connected cluster information is used by Red Hat to help make OKD better and more intuitive to use.
Information collected by Telemetry
The following information is collected by Telemetry:
The unique random identifier that is generated during an installation
Version information, including the OKD cluster version and installed update details that are used to determine update version availability
Update information, including the number of updates available per cluster, the channel and image repository used for an update, update progress information, and the number of errors that occur in an update
The name of the provider platform that OKD is deployed on and the data center location
Sizing information about clusters, machine types, and machines, including the number of CPU cores and the amount of RAM used for each
The number of running virtual machine instances in a cluster
The number of etcd members and the number of objects stored in the etcd cluster
The OKD framework components installed in a cluster and their condition and status
Usage information about components, features, and extensions
Usage details about Technology Previews and unsupported configurations
Information about degraded software
Information about nodes that are marked as
NotReady
Events for all namespaces listed as “related objects” for a degraded Operator
Configuration details that help Red Hat Support to provide beneficial support for customers, including node configuration at the cloud infrastructure level, hostnames, IP addresses, Kubernetes pod names, namespaces, and services
Information about the validity of certificates
Number of application builds by build strategy type
Telemetry does not collect identifying information such as user names or passwords. Red Hat does not intend to collect personal information. If Red Hat discovers that personal information has been inadvertently received, Red Hat will delete such information. To the extent that any telemetry data constitutes personal data, please refer to the Red Hat Privacy Statement for more information about Red Hat’s privacy practices.
CLI troubleshooting and debugging commands
For a list of the oc
client troubleshooting and debugging commands, see the OKD CLI tools documentation.