About Logging

About Logging

As a cluster administrator, you can deploy logging subsystem on an OKD cluster, and use it to collect and aggregate node system audit logs, application container logs, and infrastructure logs. You can forward logs to your chosen log outputs, including on-cluster, Red Hat managed log storage. You can also visualize your log data in the OKD web console, or the Kibana web console, depending on your deployed log storage solution.

The Kibana web console is now deprecated is planned to be removed in a future logging release.

OKD cluster administrators can deploy the logging subsystem by using Operators. For information, see Installing the logging subsystem for Red Hat OpenShift.

The Operators are responsible for deploying, upgrading, and maintaining the logging subsystem. After the Operators are installed, you can create a ClusterLogging custom resource (CR) to schedule logging subsystem pods and other resources necessary to support the logging subsystem. You can also create a ClusterLogForwarder CR to specify which logs are collected, how they are transformed, and where they are forwarded to.

Because the internal OKD Elasticsearch log store does not provide secure storage for audit logs, audit logs are not stored in the internal Elasticsearch instance by default. If you want to send the audit logs to the default internal Elasticsearch log store, for example to view the audit logs in Kibana, you must use the Log Forwarding API as described in Forward audit logs to the log store.

Logging architecture

The major components of the logging subsystem are:

Collector

The collector is a daemonset that deploys pods to each OKD node. It collects log data from each node, transforms the data, and forwards it to configured outputs. You can use the Vector collector or the legacy Fluentd collector.

As of logging version 5.6 Fluentd is deprecated and is planned to be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to Fluentd, you can use Vector instead.

Log store

The log store stores log data for analysis and is the default output for the log forwarder. You can use the default LokiStack log store, the legacy Elasticsearch log store, or forward logs to additional external log stores.

As of logging version 5.4.3 the OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.

Visualization

You can use a UI component to view a visual representation of your log data. The UI provides a graphical interface to search, query, and view stored logs. If you are using LokiStack as the default log storage, the OKD web console UI is provided by enabling the OKD console plugin. If you are using Elasticsearch as the default log storage, you can use Kibana.

The Kibana web console is now deprecated is planned to be removed in a future logging release.

The logging subsystem for Red Hat OpenShift collects container logs and node logs. These are categorized into types:

Application logs

Container logs generated by user applications running in the cluster, except infrastructure container applications.

Infrastructure logs

Container logs generated by infrastructure namespaces: openshift*, kube*, or default, as well as journald messages from nodes.

Audit logs

Logs generated by auditd, the node audit system, which are stored in the /var/log/audit/audit.log file, and logs from the auditd, kube-apiserver, openshift-apiserver services, as well as the ovn project if enabled.

About deploying the logging subsystem for Red Hat OpenShift

Administrators can deploy the logging subsystem by using the OKD web console or the OpenShift CLI (oc) to install the logging subsystem Operators. The Operators are responsible for deploying, upgrading, and maintaining the logging subsystem.

Administrators and application developers can view the logs of the projects for which they have view access.

Logging custom resources

You can configure your logging subsystem deployment with custom resource (CR) YAML files implemented by each Operator.

Red Hat Openshift Logging Operator:

ClusterLogging (CL) - After the Operators are installed, you create a ClusterLogging custom resource (CR) to schedule logging subsystem pods and other resources necessary to support the logging subsystem. The ClusterLogging CR deploys the collector and forwarder, which currently are both implemented by a daemonset running on each node. The Red Hat OpenShift Logging Operator watches the ClusterLogging CR and adjusts the logging deployment accordingly.
ClusterLogForwarder (CLF) - Generates collector configuration to forward logs per user configuration.

Loki Operator:

LokiStack - Controls the Loki cluster as log store and the web proxy with OpenShift Container Platform authentication integration to enforce multi-tenancy.

OpenShift Elasticsearch Operator:

These CRs are generated and managed by the Red Hat OpenShift Elasticsearch Operator. Manual changes cannot be made without being overwritten by the Operator.

ElasticSearch - Configure and deploy an Elasticsearch instance as the default log store.
Kibana - Configure and deploy Kibana instance to search, query and view logs.

About JSON OKD Logging

You can use JSON logging to configure the Log Forwarding API to parse JSON strings into a structured object. You can perform the following tasks:

Parse JSON logs
Configure JSON log data for Elasticsearch
Forward JSON logs to the Elasticsearch log store

About collecting and storing Kubernetes events

The OKD Event Router is a pod that watches Kubernetes events and logs them for collection by OKD Logging. You must manually deploy the Event Router.

For information, see About collecting and storing Kubernetes events.

About viewing the cluster dashboard

The OKD Logging dashboard contains charts that show details about your Elasticsearch instance at the cluster level. These charts help you diagnose and anticipate problems.

For information, see About viewing the cluster dashboard.

About troubleshooting OKD Logging

You can troubleshoot the logging issues by performing the following tasks:

Viewing logging status
Viewing the status of the log store
Understanding logging alerts
Collecting logging data for Red Hat Support
Troubleshooting for critical alerts

About exporting fields

The logging system exports fields. Exported fields are present in the log records and are available for searching from Elasticsearch and Kibana.

For information, see About exporting fields.

About the logging collector

The Cluster Logging Operator deploys a collector based on the ClusterLogForwarder resource specification. There are two collector options supported by this Operator: the Fluentd collector, and the Vector collector.

The Fluentd collector is now deprecated and will be removed in a future logging release.

The logging subsystem for Red Hat OpenShift collects container and node logs.

By default, the log collector uses the following sources:

journald for all system logs
/var/log/containers/*.log for all container logs

If you configure the log collector to collect audit logs, it gets them from /var/log/audit/audit.log.

The logging collector is a daemon set that deploys pods to each OKD node. System and infrastructure logs are generated by journald log messages from the operating system, the container runtime, and OKD. Application logs are generated by the CRI-O container engine. Fluentd collects the logs from these sources and forwards them internally or externally as you configure in OKD.

The container runtimes provide minimal information to identify the source of log messages: project, pod name, and container ID. This information is not sufficient to uniquely identify the source of the logs. If a pod with a given name and project is deleted before the log collector begins processing its logs, information from the API server, such as labels and annotations, might not be available. There might not be a way to distinguish the log messages from a similarly named pod and project or trace the logs to their source. This limitation means that log collection and normalization are considered best effort.

The available container runtimes provide minimal information to identify the source of log messages and do not guarantee unique individual log messages or that these messages can be traced to their source.

For information, see Configuring the logging collector.

About the log store

By default, OKD uses Elasticsearch (ES) to store log data. Optionally you can use the Log Forwarder API to forward logs to an external store. Several types of store are supported, including fluentd, rsyslog, kafka and others.

The logging subsystem Elasticsearch instance is optimized and tested for short term storage, approximately seven days. If you want to retain your logs over a longer term, it is recommended you move the data to a third-party storage system.

Elasticsearch organizes the log data from Fluentd into datastores, or indices, then subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in an Elasticsearch cluster. You can configure Elasticsearch to make copies of the shards, called replicas, which Elasticsearch also spreads across the Elasticsearch nodes. The ClusterLogging custom resource (CR) allows you to specify how the shards are replicated to provide data redundancy and resilience to failure. You can also specify how long the different types of logs are retained using a retention policy in the ClusterLogging CR.

The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.

The Red Hat OpenShift Logging Operator and companion OpenShift Elasticsearch Operator ensure that each Elasticsearch node is deployed using a unique deployment that includes its own storage volume. You can use a ClusterLogging custom resource (CR) to increase the number of Elasticsearch nodes, as needed. See the Elasticsearch documentation for considerations involved in configuring storage.

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host.

Role-based access control (RBAC) applied on the Elasticsearch indices enables the controlled access of the logs to the developers. Administrators can access all logs and developers can access only the logs in their projects.

For information, see Configuring the log store.

About logging visualization

OKD uses Kibana to display the log data collected by Fluentd and indexed by Elasticsearch.

Kibana is a browser-based console interface to query, discover, and visualize your Elasticsearch data through histograms, line graphs, pie charts, and other visualizations.

For information, see Configuring the log visualizer.

About event routing

The Event Router is a pod that watches OKD events so they can be collected by the logging subsystem for Red Hat OpenShift. The Event Router collects events from all projects and writes them to STDOUT. Fluentd collects those events and forwards them into the OKD Elasticsearch instance. Elasticsearch indexes the events to the infra index.

You must manually deploy the Event Router.

For information, see Collecting and storing Kubernetes events.

About Vector

Vector is a log collector offered as an alternative to Fluentd for the logging subsystem.

The following outputs are supported:

elasticsearch. An external Elasticsearch instance. The elasticsearch output can use a TLS connection.
kafka. A Kafka broker. The kafka output can use an unsecured or TLS connection.
loki. Loki, a horizontally scalable, highly available, multitenant log aggregation system.

Enabling Vector

Use the following steps to enable Vector on your OKD cluster.

Procedure

Edit the ClusterLogging custom resource (CR) in the openshift-logging project:
```
$ oc -n openshift-logging edit ClusterLogging instance
```
Add a logging.openshift.io/preview-vector-collector: enabled annotation to the ClusterLogging custom resource (CR).
Add vector as a collection type to the ClusterLogging custom resource (CR).

  apiVersion: "logging.openshift.io/v1"
  kind: "ClusterLogging"
  metadata:
    name: "instance"
    namespace: "openshift-logging"
    annotations:
      logging.openshift.io/preview-vector-collector: enabled
  spec:
    collection:
    logs:
      type: "vector"
      vector: {}

Additional resources

Vector Documentation

Collector features

Table 1. Log Sources
Feature	Fluentd	Vector
App container logs	✓	✓
App-specific routing	✓	✓
App-specific routing by namespace	✓	✓
Infra container logs	✓	✓
Infra journal logs	✓	✓
Kube API audit logs	✓	✓
OpenShift API audit logs	✓	✓
Open Virtual Network (OVN) audit logs	✓	✓

Table 2. Outputs
Feature	Fluentd	Vector
Elasticsearch v5-v7	✓	✓
Fluent forward	✓
Syslog RFC3164	✓	✓ (Logging 5.7+)
Syslog RFC5424	✓	✓ (Logging 5.7+)
Kafka	✓	✓
Cloudwatch	✓	✓
Loki	✓	✓
HTTP	✓	✓ (Logging 5.7+)

Table 3. Authorization and Authentication
Feature	Fluentd	Vector
Elasticsearch certificates	✓	✓
Elasticsearch username / password	✓	✓
Cloudwatch keys	✓	✓
Cloudwatch STS	✓	✓
Kafka certificates	✓	✓
Kafka username / password	✓	✓
Kafka SASL	✓	✓
Loki bearer token	✓	✓

Table 4. Normalizations and Transformations
Feature	Fluentd	Vector
Viaq data model - app	✓	✓
Viaq data model - infra	✓	✓
Viaq data model - infra(journal)	✓	✓
Viaq data model - Linux audit	✓	✓
Viaq data model - kube-apiserver audit	✓	✓
Viaq data model - OpenShift API audit	✓	✓
Viaq data model - OVN	✓	✓
Loglevel Normalization	✓	✓
JSON parsing	✓	✓
Structured Index	✓	✓
Multiline error detection	✓	✓
Multicontainer / split indices	✓	✓
Flatten labels	✓	✓
CLF static labels	✓	✓

Table 5. Tuning
Feature	Fluentd	Vector
Fluentd readlinelimit	✓
Fluentd buffer	✓
- chunklimitsize	✓
- totallimitsize	✓
- overflowaction	✓
- flushthreadcount	✓
- flushmode	✓
- flushinterval	✓
- retrywait	✓
- retrytype	✓
- retrymaxinterval	✓
- retrytimeout	✓

Table 6. Visibility
Feature	Fluentd	Vector
Metrics	✓	✓
Dashboard	✓	✓
Alerts	✓

Table 7. Miscellaneous
Feature	Fluentd	Vector
Global proxy support	✓	✓
x86 support	✓	✓
ARM support	✓	✓
IBM Power support	✓	✓
IBM Z support	✓	✓
IPv6 support	✓	✓
Log event buffering	✓
Disconnected Cluster	✓	✓