- Troubleshooting Network Observability
- Using the must-gather tool
- Configuring network traffic menu entry in the OKD console
- Flowlogs-Pipeline does not consume network flows after installing Kafka
- Failing to see network flows from both br-int and br-ex interfaces
- Network Observability controller manager pod runs out of memory
- Resource troubleshooting
- LokiStack rate limit errors
Troubleshooting Network Observability
Use the following procedures to diagnose and resolve common Network Observability issues.
Using the must-gather tool
You can use the must-gather tool to collect information about the Network Observability Operator resources and cluster-wide resources, such as pod logs, FlowCollector, and webhook configurations.
Procedure
Navigate to the directory where you want to store the must-gather data.
Run the following command to collect cluster-wide must-gather resources:
$ oc adm must-gather \
  --image-stream=openshift/must-gather \
  --image=quay.io/netobserv/must-gather
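The following is a minimal sketch of wrapping the command above so that each collection lands in its own timestamped directory. The function names and the default netobserv-must-gather prefix are hypothetical and chosen only for illustration; the --dest-dir flag is a standard option of oc adm must-gather.

```shell
# Hypothetical helper: build a timestamped directory name for the output.
# $1 is an optional prefix (an assumption of this sketch, not an Operator convention).
must_gather_dir() {
  printf '%s-%s\n' "${1:-netobserv-must-gather}" "$(date +%Y%m%d-%H%M%S)"
}

# Run must-gather with the same images as in the procedure and write the
# result into the generated directory. Prints the directory name on success.
collect_netobserv_must_gather() {
  dest="$(must_gather_dir "$1")"
  oc adm must-gather \
    --image-stream=openshift/must-gather \
    --image=quay.io/netobserv/must-gather \
    --dest-dir="$dest" && printf '%s\n' "$dest"
}
```

Calling collect_netobserv_must_gather without arguments uses the default prefix. Keeping one directory per run makes it easier to compare collections taken before and after a change.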
Configuring network traffic menu entry in the OKD console
If the network traffic menu entry is not listed in the Observe menu of the OKD console, configure it manually.
Prerequisites
- You have installed OKD version 4.10 or newer.
Procedure
Check if the spec.consolePlugin.register field is set to true by running the following command:
$ oc -n netobserv get flowcollector cluster -o yaml
Example output
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    register: false
Optional: Add the netobserv-plugin plugin by manually editing the Console Operator config:
$ oc edit console.operator.openshift.io cluster
Example output
...
spec:
  plugins:
  - netobserv-plugin
...
Optional: Set the spec.consolePlugin.register field to true by running the following command:
$ oc -n netobserv edit flowcollector cluster -o yaml
Example output
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    register: true
Ensure the status of console pods is Running by running the following command:
$ oc get pods -n openshift-console -l app=console
Restart the console pods by running the following command:
$ oc delete pods -n openshift-console -l app=console
Clear your browser cache and history.
Check the status of Network Observability plugin pods by running the following command:
$ oc get pods -n netobserv -l app=netobserv-plugin
Example output
NAME                                READY   STATUS    RESTARTS   AGE
netobserv-plugin-68c7bbb9bb-b69q6   1/1     Running   0          21s
Check the logs of the Network Observability plugin pods by running the following command:
$ oc logs -n netobserv -l app=netobserv-plugin
Example output
time="2022-12-13T12:06:49Z" level=info msg="Starting netobserv-console-plugin [build version: , build date: 2022-10-21 15:15] at log level info" module=main
time="2022-12-13T12:06:49Z" level=info msg="listening on https://:9001" module=server
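The two configuration checks in this procedure can also be scripted instead of read from the full YAML. The following is a sketch only: the function names are hypothetical, and it assumes the FlowCollector resource is named cluster, as in the examples above.

```shell
# Succeeds (exit code 0) only when spec.consolePlugin.register is "true".
plugin_registered() {
  value="$(oc -n netobserv get flowcollector cluster \
    -o jsonpath='{.spec.consolePlugin.register}')"
  [ "$value" = "true" ]
}

# Succeeds only when netobserv-plugin is listed under spec.plugins
# in the Console Operator config.
plugin_enabled_in_console() {
  oc get console.operator.openshift.io cluster \
    -o jsonpath='{.spec.plugins[*]}' | tr ' ' '\n' | grep -qx 'netobserv-plugin'
}

# Example usage (requires a logged-in oc session):
#   plugin_registered || echo "spec.consolePlugin.register is not true"
#   plugin_enabled_in_console || echo "netobserv-plugin is not enabled in the console"
```

If either check fails, the optional editing steps in the procedure are the likely fix.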
Flowlogs-Pipeline does not consume network flows after installing Kafka
If you deployed the flow collector with deploymentModel: KAFKA before deploying Kafka, the flow collector might not connect correctly to Kafka. If Flowlogs-Pipeline does not consume network flows from Kafka, manually restart the flow-pipeline pods.
Procedure
Delete the flow-pipeline pods to restart them by running the following command:
$ oc delete pods -n netobserv -l app=flowlogs-pipeline-transformer
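As a sketch, the restart can be combined with a wait so that the command returns only after the replacement pods report Ready. The function name and the 120s default timeout are assumptions made for illustration; oc wait with --for=condition=Ready is a standard command, but it can fail if the replacement pods have not been created yet when it runs.

```shell
# Delete the flow-pipeline pods, then wait for their replacements to become Ready.
# $1 is an optional timeout (hypothetical default: 120s).
restart_flowlogs_pipeline() {
  oc delete pods -n netobserv -l app=flowlogs-pipeline-transformer &&
    oc wait pods -n netobserv -l app=flowlogs-pipeline-transformer \
      --for=condition=Ready --timeout="${1:-120s}"
}

# Example usage (requires a logged-in oc session):
#   restart_flowlogs_pipeline 60s
```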
Failing to see network flows from both br-int and br-ex interfaces
br-ex and br-int are virtual bridge devices that operate at OSI layer 2. The eBPF agent works at the IP and TCP levels, layers 3 and 4 respectively. The eBPF agent captures the network traffic passing through br-ex and br-int when that traffic is processed by other interfaces, such as physical host or virtual pod interfaces. If you restrict the eBPF agent to attach only to br-ex and br-int, you do not see any network flows.
Manually remove the part of the interfaces or excludeInterfaces field that restricts the network interfaces to br-int and br-ex.
Procedure
Remove the interfaces: [ 'br-int', 'br-ex' ] field. This allows the agent to fetch information from all the interfaces. Alternatively, you can specify the Layer-3 interface, for example, eth0. Run the following command:
$ oc -n netobserv edit flowcollector cluster -o yaml
Example output
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      interfaces: [ 'br-int', 'br-ex' ] (1)
1 Specifies the network interfaces.
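Before editing, it can help to confirm that the configured list really contains only the two bridge devices. The following sketch assumes the v1alpha1 field path spec.agent.ebpf.interfaces shown in the example above; the only_bridges function name is hypothetical.

```shell
# Succeeds (exit code 0) only when the space-separated list in $1 is non-empty
# and every entry is br-int or br-ex, which is the configuration that shows no flows.
only_bridges() {
  [ -n "$1" ] || return 1
  for ifc in $1; do
    case "$ifc" in
      br-int|br-ex) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Example usage (requires a logged-in oc session):
#   current="$(oc -n netobserv get flowcollector cluster \
#     -o jsonpath='{.spec.agent.ebpf.interfaces[*]}')"
#   only_bridges "$current" && echo "interfaces is restricted to br-int and br-ex"
```

An empty list is treated as unrestricted, because the agent then attaches to all interfaces.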
Network Observability controller manager pod runs out of memory
If the Network Observability controller manager pod runs out of memory, you can increase the memory limits for the Network Observability Operator by patching the Cluster Service Version (CSV).
Procedure
Run the following command to patch the CSV:
$ oc -n netobserv patch csv network-observability-operator.v1.0.0 --type='json' -p='[{"op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "1Gi"}]'
Example output
clusterserviceversion.operators.coreos.com/network-observability-operator.v1.0.0 patched
Run the following command to view the updated CSV:
$ oc -n netobserv get csv network-observability-operator.v1.0.0 -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
1Gi
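The JSON patch is easy to mistype on one long line, so one option is to generate it from the desired limit. This is a sketch under the same assumptions as the procedure (first deployment, first container); the memory_limit_patch function name is hypothetical.

```shell
# Print a JSON patch that replaces the memory limit of the first container
# in the first deployment of the CSV. $1 is the new limit, for example 1Gi.
memory_limit_patch() {
  printf '[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "%s"}]\n' "$1"
}

# Example usage (requires a logged-in oc session). The CSV name depends on the
# installed Operator version; list the installed CSVs first:
#   oc -n netobserv get csv -o name
#   oc -n netobserv patch csv network-observability-operator.v1.0.0 \
#     --type='json' -p="$(memory_limit_patch 1Gi)"
```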
Resource troubleshooting
LokiStack rate limit errors
A rate limit placed on the Loki tenant can result in a temporary loss of data and a 429 error: Per stream rate limit exceeded (limit:xMB/sec) while attempting to ingest for stream. Consider setting an alert to notify you of this error. For more information, see “Creating Loki rate limit alerts for the NetObserv dashboard” in the Additional resources of this section.
You can update the LokiStack CRD with the perStreamRateLimit and perStreamRateLimitBurst specifications, as shown in the following procedure.
Procedure
Navigate to Operators → Installed Operators, viewing All projects from the Project dropdown.
Look for Loki Operator, and select the LokiStack tab.
Create or edit an existing LokiStack instance using the YAML view to add the perStreamRateLimit and perStreamRateLimitBurst specifications:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv
spec:
  limits:
    global:
      ingestion:
        perStreamRateLimit: 6 (1)
        perStreamRateLimitBurst: 30 (2)
  tenants:
    mode: openshift-network
  managementState: Managed
1 The default value for perStreamRateLimit is 3.
2 The default value for perStreamRateLimitBurst is 15.
Click Save.
Verification
After you update the perStreamRateLimit and perStreamRateLimitBurst specifications, the pods in your cluster restart and the 429 rate-limit error no longer occurs.
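The same change can be sketched from the command line as a merge patch, as an alternative to the YAML view in the console. The function name is hypothetical, and the example assumes the LokiStack instance is named loki in the netobserv namespace, as in the YAML above.

```shell
# Print a merge patch that sets both per-stream limits.
# $1 is perStreamRateLimit, $2 is perStreamRateLimitBurst.
loki_rate_limit_patch() {
  printf '{"spec":{"limits":{"global":{"ingestion":{"perStreamRateLimit":%d,"perStreamRateLimitBurst":%d}}}}}\n' "$1" "$2"
}

# Example usage (requires a logged-in oc session and the Loki Operator CRDs):
#   oc -n netobserv patch lokistack loki --type=merge \
#     -p "$(loki_rate_limit_patch 6 30)"
```

A merge patch changes only the two listed fields and leaves the rest of the LokiStack specification untouched.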