Release notes for Red Hat OpenShift Logging 5.3
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Supported Versions
OpenShift | 4.7 | 4.8 | 4.9 |
---|---|---|---|
RHOL 5.0 | X | X | |
RHOL 5.1 | X | X | |
RHOL 5.2 | X | X | X |
RHOL 5.3 | | | X |
OpenShift Logging 5.3.0
The following advisories are available for OpenShift Logging 5.3.x:
New features and enhancements
- With this update, authorization requirements for Log Forwarding have been relaxed. Outputs may now be configured with SASL, username/password, or TLS.
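For example, the following is a minimal, illustrative sketch of a TLS-secured output. It assumes the documented secret key names (tls.crt, tls.key, ca-bundle.crt); the output name, URL, and secret name are placeholders:
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    # Placeholder external Elasticsearch output secured with TLS.
    # The referenced secret holds tls.crt, tls.key, and ca-bundle.crt;
    # a secret with username and password keys works the same way.
    - name: secure-elasticsearch
      type: elasticsearch
      url: https://elasticsearch.example.com:9200
      secret:
        name: es-tls-secret
  pipelines:
    - name: application-logs
      inputRefs:
        - application
      outputRefs:
        - secure-elasticsearch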
Bug fixes
Before this update, application logs were not correctly configured to forward to the proper CloudWatch stream when multi-line error detection was enabled. (LOG-1939)
Before this update, a name change of the deployed collector in the 5.3 release caused the alert ‘fluentnodedown’ to be generated. (LOG-1918)
Before this update, a regression introduced by a configuration change in a prior release caused the collector to flush its buffered messages before shutdown, creating a delay in the termination and restart of collector Pods. With this update, fluentd no longer flushes buffers at shutdown, resolving the issue. (LOG-1735)
Before this update, a regression introduced in a prior release intentionally disabled JSON message parsing. With this update, a log entry’s “level” value is set from the “level” field of a parsed JSON message, or by applying a regex to the message field and extracting a match. (LOG-1199)
Known issues
If you forward logs to an external Elasticsearch server and then change a configured value in the pipeline secret, such as the username and password, the Fluentd forwarder loads the new secret but uses the old value to connect to an external Elasticsearch server. This issue happens because the Red Hat OpenShift Logging Operator does not currently monitor secrets for content changes. (LOG-1652)
As a workaround, if you change the secret, you can force the Fluentd pods to redeploy by entering:
$ oc delete pod -l component=collector
Deprecated and removed features
Some features available in previous releases have been deprecated or removed.
Deprecated functionality is still included in OpenShift Logging and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.
Forwarding logs using the legacy Fluentd and legacy syslog methods has been removed
In OpenShift Logging 5.3, the legacy methods of forwarding logs to syslog and Fluentd are removed. Bug fixes and support are provided through the end of the OpenShift Logging 5.2 life cycle, after which no new feature enhancements are made.
Instead, use the following non-legacy methods:
Forwarding logs using the Fluentd forward protocol
Forwarding logs using the syslog protocol
Configuration mechanisms for legacy forwarding methods have been removed
In OpenShift Logging 5.3, the legacy configuration mechanism for log forwarding is removed: you cannot forward logs using the legacy Fluentd method or the legacy syslog method. Use the standard log forwarding methods instead.
OpenShift Logging 5.2.0
The following advisories are available for OpenShift Logging 5.2.x:
New features and enhancements
With this update, you can forward log data to Amazon CloudWatch, which provides application and infrastructure monitoring. For more information, see Forwarding logs to Amazon CloudWatch. A sample output configuration covering this and the related forwarding enhancements appears after this list of enhancements. (LOG-1173)
With this update, you can forward log data to Loki, a horizontally scalable, highly available, multi-tenant log aggregation system. For more information, see Forwarding logs to Loki. (LOG-684)
With this update, if you use the Fluentd forward protocol to forward log data over a TLS-encrypted connection, now you can use a password-encrypted private key file and specify the passphrase in the Cluster Log Forwarder configuration. For more information, see Forwarding logs using the Fluentd forward protocol. (LOG-1525)
This enhancement enables you to use a username and password to authenticate a log forwarding connection to an external Elasticsearch instance. For example, if you cannot use mutual TLS (mTLS) because a third party operates the Elasticsearch instance, you can use HTTP or HTTPS and set a secret that contains the username and password. For more information, see Forwarding logs to an external Elasticsearch instance. (LOG-1022)
With this update, you can collect OVN network policy audit logs for forwarding to a logging server. For more information, see Collecting OVN network policy audit logs. (LOG-1526)
By default, the data model introduced in OKD 4.5 stored logs from different namespaces in a single, shared index. This made it harder to see which namespaces produced the most logs.
The current release adds namespace metrics to the Logging dashboard in the OKD console. With these metrics, you can see which namespaces produce logs and how many logs each namespace produces for a given timestamp.
To see these metrics, open the Administrator perspective in the OKD web console, and navigate to Observe → Dashboards → Logging/Elasticsearch. (LOG-1680)
The current release, OpenShift Logging 5.2, enables two new metrics: For a given timestamp or duration, you can see the total logs produced or logged by individual containers, and the total logs collected by the collector. These metrics are labeled by namespace, pod, and container name so that you can see how many logs each namespace and pod collects and produces. (LOG-1213)
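The CloudWatch (LOG-1173), Loki (LOG-684), and username/password Elasticsearch (LOG-1022) forwarding enhancements described in this list are all configured through ClusterLogForwarder outputs. The following is an illustrative sketch only: the region, group setting, URLs, and secret names are placeholders, and the referenced secrets are assumed to hold the documented keys (aws_access_key_id and aws_secret_access_key for CloudWatch, username and password for Elasticsearch):
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    # Placeholder CloudWatch output: streams are grouped by log type in the named region.
    - name: cw
      type: cloudwatch
      cloudwatch:
        groupBy: logType
        region: us-east-1
      secret:
        name: cw-secret
    # Placeholder Loki output.
    - name: loki-server
      type: loki
      url: https://loki.example.com:3100
    # Placeholder external Elasticsearch output that authenticates with a username and password secret.
    - name: external-es
      type: elasticsearch
      url: https://elasticsearch.example.com:9200
      secret:
        name: es-user-credentials
  pipelines:
    - name: forward-app-logs
      inputRefs:
        - application
      outputRefs:
        - cw
        - loki-server
        - external-es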
Bug fixes
Before this update, when the OpenShift Elasticsearch Operator created index management cronjobs, it added the POLICY_MAPPING environment variable twice, which caused the apiserver to report the duplication. This update fixes the issue so that the POLICY_MAPPING environment variable is set only once per cronjob, and there is no duplication for the apiserver to report. (LOG-1130)
Before this update, suspending an Elasticsearch cluster to zero nodes did not suspend the index-management cronjobs, which put these cronjobs into maximum backoff. Then, after unsuspending the Elasticsearch cluster, these cronjobs stayed halted due to maximum backoff reached. This update resolves the issue by suspending the cronjobs and the cluster. (LOG-1268)
Before this update, in the Logging dashboard in the OKD console, the list of top 10 log-producing containers was missing the “chart namespace” label and provided the incorrect metric name, fluentd_input_status_total_bytes_logged. With this update, the chart shows the namespace label and the correct metric name, log_logged_bytes_total. (LOG-1271)
Before this update, if an index management cronjob terminated with an error, it did not report the error exit code: instead, its job status was “complete.” This update resolves the issue by reporting the error exit codes of index management cronjobs that terminate with errors. (LOG-1273)
The priorityclasses.v1beta1.scheduling.k8s.io API was removed in Kubernetes 1.22 and replaced by priorityclasses.v1.scheduling.k8s.io (v1beta1 was replaced by v1). Before this update, APIRemovedInNextReleaseInUse alerts were generated for priorityclasses because v1beta1 was still present. This update resolves the issue by replacing v1beta1 with v1. The alert is no longer generated. (LOG-1385)
Previously, the OpenShift Elasticsearch Operator and Red Hat OpenShift Logging Operator did not have the annotation that was required for them to appear in the OKD web console list of operators that can run in a disconnected environment. This update adds the operators.openshift.io/infrastructure-features: '["Disconnected"]' annotation to these two operators so that they appear in the list of operators that run in disconnected environments. (LOG-1420)
Before this update, Red Hat OpenShift Logging Operator pods were scheduled on CPU cores that were reserved for customer workloads on performance-optimized single-node clusters. With this update, cluster logging operator pods are scheduled on the correct CPU cores. (LOG-1440)
Before this update, some log entries had unrecognized UTF-8 bytes, which caused Elasticsearch to reject the messages and block the entire buffered payload. With this update, rejected payloads drop the invalid log entries and resubmit the remaining entries to resolve the issue. (LOG-1499)
Before this update, the kibana-proxy pod sometimes entered the CrashLoopBackoff state and logged the following message: Invalid configuration: cookie_secret must be 16, 24, or 32 bytes to create an AES cipher when pass_access_token == true or cookie_refresh != 0, but is 29 bytes. The actual number of bytes could vary. With this update, the generation of the Kibana session secret has been corrected, and the kibana-proxy pod no longer enters a CrashLoopBackoff state due to this error. (LOG-1446)
Before this update, the AWS CloudWatch Fluentd plug-in logged its AWS API calls to the Fluentd log at all log levels, consuming additional OKD node resources. With this update, the AWS CloudWatch Fluentd plug-in logs AWS API calls only at the “debug” and “trace” log levels. This way, at the default “warn” log level, Fluentd does not consume extra node resources. (LOG-1071)
Before this update, the Elasticsearch OpenDistro security plug-in caused user index migrations to fail. This update resolves the issue by providing a newer version of the plug-in. Now, index migrations proceed without errors. (LOG-1276)
Before this update, in the Logging dashboard in the OKD console, the list of top 10 log-producing containers lacked data points. This update resolves the issue, and the dashboard displays all data points. (LOG-1353)
Before this update, if you were tuning the performance of the Fluentd log forwarder by adjusting the chunkLimitSize and totalLimitSize values, the "Setting queued_chunks_limit_size for each buffer to" message reported values that were too low. The current update fixes this issue so that this message reports the correct values. (LOG-1411) (A buffer tuning sketch appears after this list of bug fixes.)
Before this update, the Kibana OpenDistro security plug-in caused user index migrations to fail. This update resolves the issue by providing a newer version of the plug-in. Now, index migrations proceed without errors. (LOG-1558)
Before this update, using a namespace input filter prevented logs in that namespace from appearing in other inputs. With this update, logs are sent to all inputs that can accept them. (LOG-1570)
Before this update, a missing license file for the viaq/logerr dependency caused license scanners to abort without success. With this update, the viaq/logerr dependency is licensed under Apache 2.0 and the license scanners run successfully. (LOG-1590)
Before this update, an incorrect brew tag for curator5 within the elasticsearch-operator-bundle build pipeline caused the pull of an image pinned to a dummy SHA1. With this update, the build pipeline uses the logging-curator5-rhel8 reference for curator5, enabling index management cronjobs to pull the correct image from registry.redhat.io. (LOG-1624)
Before this update, an issue with the ServiceAccount permissions caused errors such as no permissions for [indices:admin/aliases/get]. With this update, a permission fix resolves the issue. (LOG-1657)
Before this update, the Custom Resource Definition (CRD) for the Red Hat OpenShift Logging Operator was missing the Loki output type, which caused the admission controller to reject the ClusterLogForwarder custom resource object. With this update, the CRD includes Loki as an output type so that administrators can configure ClusterLogForwarder to send logs to a Loki server. (LOG-1683)
Before this update, OpenShift Elasticsearch Operator reconciliation of the ServiceAccounts overwrote third-party-owned fields that contained secrets. This issue caused memory and CPU spikes due to frequent recreation of secrets. This update resolves the issue. Now, the OpenShift Elasticsearch Operator does not overwrite third-party-owned fields. (LOG-1714)
Before this update, in the ClusterLogging custom resource (CR) definition, if you specified a flush_interval value but did not set flush_mode to interval, the Red Hat OpenShift Logging Operator generated a Fluentd configuration. However, the Fluentd collector generated an error at runtime. With this update, the Red Hat OpenShift Logging Operator validates the ClusterLogging CR definition and only generates the Fluentd configuration if both fields are specified. (LOG-1723)
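The buffer and flush settings referenced in LOG-1411 and LOG-1723 are tuned through the ClusterLogging custom resource. The following is a minimal sketch, assuming the forwarder tuning fields (chunkLimitSize, totalLimitSize, flushMode, flushInterval); the values shown are placeholders to adjust for your environment:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  # collection and logStore stanzas omitted for brevity
  forwarder:
    fluentd:
      buffer:
        # Placeholder values: size limits for each buffered chunk and for the total buffer.
        chunkLimitSize: 8m
        totalLimitSize: 1G
        # Set flushMode to interval when you also set flushInterval (see LOG-1723).
        flushMode: interval
        flushInterval: 5s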
Known issues
If you forward logs to an external Elasticsearch server and then change a configured value in the pipeline secret, such as the username and password, the Fluentd forwarder loads the new secret but uses the old value to connect to an external Elasticsearch server. This issue happens because the Red Hat OpenShift Logging Operator does not currently monitor secrets for content changes. (LOG-1652)
As a workaround, if you change the secret, you can force the Fluentd pods to redeploy by entering:
$ oc delete pod -l component=collector
Deprecated and removed features
Some features available in previous releases have been deprecated or removed.
Deprecated functionality is still included in OpenShift Logging and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.
Forwarding logs using the legacy Fluentd and legacy syslog methods has been deprecated
From OKD 4.6 to the present, forwarding logs by using the following legacy methods has been deprecated and will be removed in a future release:
Forwarding logs using the legacy Fluentd method
Forwarding logs using the legacy syslog method
Instead, use the following non-legacy methods:
Forwarding logs using the Fluentd forward protocol
Forwarding logs using the syslog protocol
OpenShift Logging 5.1.0
The following advisories are available for OpenShift Logging 5.1.x:
RHBA-2021:3545 - Bug Fix Advisory. OpenShift Logging Bug Fix Release 5.1.2
RHBA-2021:2885 - Bug Fix Advisory. OpenShift Logging Bug Fix Release 5.1.1
RHBA-2021:2112 - Bug Fix Advisory. OpenShift Logging Bug Fix Release 5.1.0
New features and enhancements
OpenShift Logging 5.1 now supports OKD 4.7 and later running on:
IBM Power Systems
IBM Z and LinuxONE
This release adds improvements related to the following components and concepts.
As a cluster administrator, you can use Kubernetes pod labels to gather log data from an application and send it to a specific log store. You can gather log data by configuring the inputs[].application.selector.matchLabels element in the ClusterLogForwarder custom resource (CR) YAML file. You can also filter the gathered log data by namespace. A configuration sketch appears after this list of enhancements. (LOG-883)
This release adds the following new ElasticsearchNodeDiskWatermarkReached warnings to the OpenShift Elasticsearch Operator (EO):
Elasticsearch Node Disk Low Watermark Reached
Elasticsearch Node Disk High Watermark Reached
Elasticsearch Node Disk Flood Watermark Reached
These warnings apply when the EO predicts that an Elasticsearch node will reach the Disk Low Watermark, Disk High Watermark, or Disk Flood Stage Watermark thresholds in the next 6 hours. This warning period gives you time to respond before the node reaches the disk watermark thresholds. The warning messages also provide links to troubleshooting steps that you can follow to help mitigate the issue. The EO applies the past several hours of disk space data to a linear model to generate these warnings. (LOG-1100)
JSON logs can now be forwarded as JSON objects, rather than quoted strings, to either Red Hat’s managed Elasticsearch cluster or any of the other supported third-party systems. Additionally, you can now query individual fields from a JSON log message inside Kibana, which increases the discoverability of specific logs. (LOG-785, LOG-1148)
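The pod label selector described in LOG-883 is configured in the ClusterLogForwarder custom resource. The following is an illustrative sketch only; the input name, label, namespace, and pipeline name are placeholders:
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  inputs:
    # Placeholder input: gather application logs only from pods with the given label,
    # optionally restricted to selected namespaces.
    - name: selected-apps
      application:
        selector:
          matchLabels:
            app: my-payment-service
        namespaces:
          - my-project
  pipelines:
    - name: selected-apps-to-default
      inputRefs:
        - selected-apps
      outputRefs:
        - default
      # Forward JSON messages as JSON objects rather than quoted strings (LOG-785, LOG-1148).
      parse: json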
Deprecated and removed features
Some features available in previous releases have been deprecated or removed.
Deprecated functionality is still included in OpenShift Logging and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.
Elasticsearch Curator has been removed
With this update, the Elasticsearch Curator has been removed and is no longer supported. Elasticsearch Curator helped you curate or manage your indices on OKD 4.4 and earlier. Instead of using Elasticsearch Curator, configure the log retention time.
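For example, the following is a minimal sketch of retention settings, assuming the retentionPolicy fields of the ClusterLogging log store; the maxAge values are placeholders, and other log store fields are omitted for brevity:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  logStore:
    type: elasticsearch
    retentionPolicy:
      # Placeholder retention periods for each log type.
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d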
Forwarding logs using the legacy Fluentd and legacy syslog methods has been deprecated
From OKD version 4.6 to the present, forwarding logs by using the legacy Fluentd and legacy syslog methods has been deprecated and will be removed in a future release. Use the standard non-legacy methods instead.
Bug fixes
Before this update, the ClusterLogForwarder CR did not show the input[].selector element after it had been created. With this update, when you specify a selector in the ClusterLogForwarder CR, it remains. Fixing this bug was necessary for LOG-883, which enables using pod label selectors to forward application log data. (LOG-1338)
Before this update, an update in the cluster service version (CSV) accidentally introduced resources and limits for the OpenShift Elasticsearch Operator container. Under specific conditions, this caused an out-of-memory condition that terminated the Elasticsearch Operator pod. This update fixes the issue by removing the CSV resources and limits for the Operator container. The Operator gets scheduled without issues. (LOG-1254)
Before this update, forwarding logs to Kafka using chained certificates failed with the following error message:
state=error: certificate verify failed (unable to get local issuer certificate)
Logs could not be forwarded to a Kafka broker with a certificate signed by an intermediate CA. This happened because the Fluentd Kafka plug-in could only handle a single CA certificate supplied in the ca-bundle.crt entry of the corresponding secret. This update fixes the issue by enabling the Fluentd Kafka plug-in to handle multiple CA certificates supplied in the ca-bundle.crt entry of the corresponding secret. Now, logs can be forwarded to a Kafka broker with a certificate signed by an intermediate CA. (LOG-1218, LOG-1216)
Before this update, while under load, Elasticsearch responded to some requests with an HTTP 500 error, even though there was nothing wrong with the cluster. Retrying the request was successful. This update fixes the issue by updating the index management cron jobs to be more resilient when they encounter temporary HTTP 500 errors. The updated index management cron jobs will first retry a request multiple times before failing. (LOG-1215)
Before this update, if you did not set the .proxy value in the cluster installation configuration, and then configured a global proxy on the installed cluster, a bug prevented Fluentd from forwarding logs to Elasticsearch. To work around this issue, in the proxy or cluster configuration, set the no_proxy value to .svc.cluster.local so it skips internal traffic. This update fixes the proxy configuration issue. If you configure the global proxy after installing an OKD cluster, Fluentd forwards logs to Elasticsearch. (LOG-1187, BZ#1915448)
Before this update, the logging collector created more socket connections than necessary. With this update, the logging collector reuses the existing socket connection to send logs. (LOG-1186)
Before this update, if a cluster administrator tried to add or remove storage from an Elasticsearch cluster, the OpenShift Elasticsearch Operator (EO) incorrectly tried to upgrade the Elasticsearch cluster: it displayed scheduledUpgrade: "True" and shardAllocationEnabled: primaries in its status and tried to change the volumes. With this update, the EO does not try to upgrade the Elasticsearch cluster.
The EO status displays the following new status information to indicate when you have tried to make an unsupported change to the Elasticsearch storage that it has ignored:
StorageStructureChangeIgnored when you try to change between using ephemeral and persistent storage structures.
StorageClassNameChangeIgnored when you try to change the storage class name.
StorageSizeChangeIgnored when you try to change the storage size.
If you configure the ClusterLogging custom resource (CR) to switch from ephemeral to persistent storage, the EO creates a persistent volume claim (PVC) but does not create a persistent volume (PV). To clear the StorageStructureChangeIgnored status, you must revert the change to the ClusterLogging CR and delete the persistent volume claim (PVC). (LOG-1351)
Before this update, if you redeployed a full Elasticsearch cluster, it got stuck in an unhealthy state, with one non-data node running and all other data nodes shut down. This happened because new certificates prevented the Elasticsearch Operator from scaling down the non-data nodes of the Elasticsearch cluster. With this update, Elasticsearch Operator can scale all the data and non-data nodes down and then back up again, so they load the new certificates. The Elasticsearch Operator can reach the new nodes after they load the new certificates. (LOG-1536)