Network Observability Operator release notes
The Network Observability Operator enables administrators to observe and analyze network traffic flows for OKD clusters.
These release notes track the development of the Network Observability Operator in the OKD.
For an overview of the Network Observability Operator, see About Network Observability Operator.
Network Observability Operator 1.2.0
The following advisory is available for the Network Observability Operator 1.2.0:
Preparing for the next update
The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. Until the 1.2 release of the Network Observability Operator, the only channel available was v1.0.x
. The 1.2 release of the Network Observability Operator introduces the stable
update channel for tracking and receiving updates. You must switch your channel from v1.0.x
to stable
to receive future Operator updates. The v1.0.x
channel is deprecated and planned for removal in a following release.
New features and enhancements
Histogram in Traffic Flows view
- You can now choose to show a histogram bar chart of flows over time. The histogram enables you to visualize the history of flows without hitting the Loki query limit. For more information, see Using the histogram.
Conversation tracking
- You can now query flows by Log Type, which enables grouping network flows that are part of the same conversation. For more information, see Working with conversations.
Network Observability health alerts
- The Network Observability Operator now creates automatic alerts if the
flowlogs-pipeline
is dropping flows because of errors at the write stage or if the Loki ingestion rate limit has been reached. For more information, see Viewing health information.
Bug fixes
Previously, after changing the
namespace
value in the FlowCollector spec, eBPF Agent pods running in the previous namespace were not appropriately deleted. Now, the pods running in the previous namespace are appropriately deleted. (NETOBSERV-774)Previously, after changing the
caCert.name
value in the FlowCollector spec (such as in Loki section), FlowLogs-Pipeline pods and Console plug-in pods were not restarted, therefore they were unaware of the configuration change. Now, the pods are restarted, so they get the configuration change. (NETOBSERV-772)Previously, network flows between pods running on different nodes were sometimes not correctly identified as being duplicates because they are captured by different network interfaces. This resulted in over-estimated metrics displayed in the console plug-in. Now, flows are correctly identified as duplicates, and the console plug-in displays accurate metrics. (NETOBSERV-755)
The “reporter” option in the console plug-in is used to filter flows based on the observation point of either source node or destination node. Previously, this option mixed the flows regardless of the node observation point. This was due to network flows being incorrectly reported as Ingress or Egress at the node level. Now, the network flow direction reporting is correct. The “reporter” option filters for source observation point, or destination observation point, as expected. (NETOBSERV-696)
Previously, for agents configured to send flows directly to the processor as gRPC+protobuf requests, the submitted payload could be too large and is rejected by the processors’ GRPC server. This occurred under very-high-load scenarios and with only some configurations of the agent. The agent logged an error message, such as: grpc: received message larger than max. As a consequence, there was information loss about those flows. Now, the gRPC payload is split into several messages when the size exceeds a threshold. As a result, the server maintains connectivity. (NETOBSERV-617)
Known issue
- In the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate transition periodically affects the
flowlogs-pipeline
pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate transition. (NETOBSERV-980)
Notable technical changes
- Previously, you could install the Network Observability Operator using a custom namespace. This release introduces the
conversion webhook
which changes theClusterServiceVersion
. Because of this change, all the available namespaces are no longer listed. Additionally, to enable Operator metrics collection, namespaces that are shared with other Operators, like theopenshift-operators
namespace, cannot be used. Now, the Operator must be installed in theopenshift-netobserv-operator
namespace. You cannot automatically upgrade to the new Operator version if you previously installed the Network Observability Operator using a custom namespace. If you previously installed the Operator using a custom namespace, you must delete the instance of the Operator that was installed and re-install your operator in theopenshift-netobserv-operator
namespace. It is important to note that custom namespaces, such as the commonly usednetobserv
namespace, are still possible for theFlowCollector
, Loki, Kafka, and other plug-ins. (NETOBSERV-907)(NETOBSERV-956)
Network Observability Operator 1.1.0
The following advisory is available for the Network Observability Operator 1.1.0:
The Network Observability Operator is now stable and the release channel is upgraded to v1.1.0
.
Bug fix
- Previously, unless the Loki
authToken
configuration was set toFORWARD
mode, authentication was no longer enforced, allowing any user who could connect to the OKD console in an OKD cluster to retrieve flows without authentication. Now, regardless of the LokiauthToken
mode, only cluster administrators can retrieve flows. (BZ#2169468)