Installing the Network Observability Operator

Installing Loki is a recommended prerequisite for using the Network Observability Operator. You can choose to use Network Observability without Loki, but there are some considerations for doing so, which are described in "Network Observability without Loki".

The Loki Operator integrates a gateway that implements multi-tenancy and authentication with Loki for data flow storage. The LokiStack resource manages Loki, which is a scalable, highly-available, multi-tenant log aggregation system, and a web proxy with OKD authentication. The LokiStack proxy uses OKD authentication to enforce multi-tenancy and facilitate the saving and indexing of data in Loki log stores.

The Loki Operator can also be used for Logging with the LokiStack. The Network Observability Operator requires a dedicated LokiStack separate from Logging.

Network Observability without Loki

You can use Network Observability without Loki by not performing the Loki installation steps and skipping directly to "Installing the Network Observability Operator". If you only want to export flows to a Kafka consumer or IPFIX collector, or you only need dashboard metrics, then you do not need to install Loki or provide storage for Loki. Without Loki, there is no Network Traffic panel under Observe, which means there are no overview charts, flow table, or topology views. The following table compares available features with and without Loki:

Table 1. Comparison of feature availability with and without Loki

  Feature                                           With Loki    Without Loki
  Exporters                                         Supported    Supported
  Flow-based metrics and dashboards                 Supported    Supported
  Traffic Flow Overview, Table and Topology views   Supported    Unsupported
  Quick Filters                                     Supported    Unsupported
  OKD console Network Traffic tab integration       Supported    Unsupported
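As the table shows, exporters still work without Loki. As an illustrative sketch only, a FlowCollector configured to disable Loki storage and export flows to a Kafka topic might look like the following; the broker address, topic name, and apiVersion are placeholder assumptions that can vary by Operator version:

```yaml
# Hypothetical FlowCollector fragment: Loki disabled, flows exported to Kafka.
# The broker address and topic are example values; adjust for your cluster.
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  loki:
    enable: false        # no Loki storage: no Network Traffic views in the console
  exporters:
    - type: Kafka        # IPFIX is the other supported exporter type
      kafka:
        address: "kafka-cluster-kafka-bootstrap.netobserv:9092"
        topic: network-flows
```

Flow-based metrics and dashboards remain available in this configuration because they do not depend on Loki storage.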


Installing the Loki Operator

Loki Operator versions 5.7+ are the supported versions for Network Observability; these versions provide the ability to create a LokiStack instance using the openshift-network tenant configuration mode, and provide fully automatic, in-cluster authentication and authorization support for Network Observability. There are several ways you can install Loki. One way is by using the OKD web console Operator Hub.

Prerequisites

  • Supported Log Store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)

  • OKD 4.10+

  • Linux Kernel 4.18+

Procedure

  1. In the OKD web console, click Operators → OperatorHub.

  2. Choose Loki Operator from the list of available Operators, and click Install.

  3. Under Installation Mode, select All namespaces on the cluster.

Verification

  1. Verify that you installed the Loki Operator. Visit the Operators → Installed Operators page and look for Loki Operator.

  2. Verify that Loki Operator is listed with Status as Succeeded in all the projects.

To uninstall Loki, refer to the uninstallation process that corresponds with the method you used to install Loki. You might have remaining ClusterRoles and ClusterRoleBindings, data stored in the object store, and persistent volumes that must be removed.

Creating a secret for Loki storage

The Loki Operator supports a few log storage options, such as AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation. The following example shows how to create a secret for AWS S3 storage. The secret created in this example, loki-s3, is referenced in “Creating a LokiStack resource”. You can create this secret in the web console or CLI.

  1. Using the web console, navigate to the Project → All Projects dropdown and select Create Project. Name the project netobserv and click Create.

  2. Navigate to the Import icon, +, in the top right corner. Paste your YAML file into the editor.

    The following shows an example secret YAML file for S3 storage:

    apiVersion: v1
    kind: Secret
    metadata:
      name: loki-s3
      namespace: netobserv (1)
    stringData:
      access_key_id: QUtJQUlPU0ZPRE5ON0VYQU1QTEUK
      access_key_secret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQo=
      bucketnames: s3-bucket-name
      endpoint: https://s3.eu-central-1.amazonaws.com
      region: eu-central-1

    1 The installation examples in this documentation use the same namespace, netobserv, across all components. You can optionally use a different namespace for the different components.
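Alternatively, you can create the same secret from the CLI. The following is a sketch using oc create secret with placeholder credential values; the literal keys mirror the stringData keys shown in the YAML example:

```shell
# Create the loki-s3 secret from the CLI (placeholder credential values).
# Replace <your_access_key_id> and <your_access_key_secret> with real credentials.
oc create secret generic loki-s3 \
  --from-literal=bucketnames=s3-bucket-name \
  --from-literal=endpoint=https://s3.eu-central-1.amazonaws.com \
  --from-literal=region=eu-central-1 \
  --from-literal=access_key_id=<your_access_key_id> \
  --from-literal=access_key_secret=<your_access_key_secret> \
  -n netobserv
```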

Verification

  • Once you create the secret, you should see it listed under Workloads → Secrets in the web console.


Creating a LokiStack custom resource

You can deploy a LokiStack custom resource using the web console or CLI, in the namespace or new project that you created for the storage secret.

Procedure

  1. Navigate to Operators → Installed Operators, viewing All projects from the Project dropdown.

  2. Look for Loki Operator. In the details, under Provided APIs, select LokiStack.

  3. Click Create LokiStack.

  4. Ensure the following fields are specified in either Form View or YAML view:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: loki
      namespace: netobserv (1)
    spec:
      size: 1x.small
      storage:
        schemas:
        - version: v12
          effectiveDate: '2022-06-01'
        secret:
          name: loki-s3
          type: s3
      storageClassName: gp3 (2)
      tenants:
        mode: openshift-network

    1 The installation examples in this documentation use the same namespace, netobserv, across all components. You can optionally use a different namespace.
    2 Use a storage class name that is available on the cluster for ReadWriteOnce access mode. You can use oc get storageclasses to see what is available on your cluster.

    You must not reuse the same LokiStack that is used for cluster logging.

  5. Click Create.

Deployment Sizing

Sizing for Loki follows the format of 1x.<size>, where the value 1x is the number of instances and <size> specifies performance capabilities.

1x.extra-small is for demo purposes only, and is not supported.

Table 2. Loki Sizing

                             1x.extra-small   1x.small             1x.medium
  Data transfer              Demo use only    500GB/day            2TB/day
  Queries per second (QPS)   Demo use only    25-50 QPS at 200ms   25-75 QPS at 200ms
  Replication factor         None             2                    3
  Total CPU requests         5 vCPUs          36 vCPUs             54 vCPUs
  Total memory requests      7.5Gi            63Gi                 139Gi
  Total disk requests        150Gi            300Gi                450Gi

LokiStack ingestion limits and health alerts

The LokiStack instance comes with default settings according to the configured size. It is possible to override some of these settings, such as the ingestion and query limits. You might want to update them if you get Loki errors showing up in the Console plugin, or in flowlogs-pipeline logs. An automatic alert in the web console notifies you when these limits are reached.

Here is an example of configured limits:

  spec:
    limits:
      global:
        ingestion:
          ingestionBurstSize: 40
          ingestionRate: 20
          maxGlobalStreamsPerTenant: 25000
        queries:
          maxChunksPerQuery: 2000000
          maxEntriesLimitPerQuery: 10000
          maxQuerySeries: 3000

For more information about these settings, see the LokiStack API reference.

Configuring authorization and multi-tenancy

Define the ClusterRole and ClusterRoleBinding resources. The netobserv-reader ClusterRole enables multi-tenancy and allows individual user access, or group access, to the flows stored in Loki. You can create a YAML file to define these roles.

Procedure

  1. Using the web console, click the Import icon, +.

  2. Drop your YAML file into the editor and click Create:

Example ClusterRole reader yaml

  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: netobserv-reader (1)
  rules:
  - apiGroups:
    - 'loki.grafana.com'
    resources:
    - network
    resourceNames:
    - logs
    verbs:
    - 'get'

1 This role can be used for multi-tenancy.

Example ClusterRole writer yaml

  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: netobserv-writer
  rules:
  - apiGroups:
    - 'loki.grafana.com'
    resources:
    - network
    resourceNames:
    - logs
    verbs:
    - 'create'

Example ClusterRoleBinding yaml

  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: netobserv-writer-flp
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: netobserv-writer
  subjects:
  - kind: ServiceAccount
    name: flowlogs-pipeline (1)
    namespace: netobserv
  - kind: ServiceAccount
    name: flowlogs-pipeline-transformer
    namespace: netobserv

1 The flowlogs-pipeline writes to Loki. If you are using Kafka, this value is flowlogs-pipeline-transformer.

Enabling multi-tenancy in Network Observability

Multi-tenancy in the Network Observability Operator allows and restricts individual user access, or group access, to the flows stored in Loki. Access is enabled for project admins. Project admins who have limited access to some namespaces can access flows for only those namespaces.

Prerequisites

  • You have installed Loki Operator version 5.7+.

  • The FlowCollector spec.loki.authToken configuration must be set to FORWARD.

  • You must be logged in as a project administrator.

Procedure

  1. Authorize reading permission to user1 by running the following command:

    $ oc adm policy add-cluster-role-to-user netobserv-reader user1

    Now, the data is restricted to only allowed user namespaces. For example, a user that has access to a single namespace can see all the flows internal to this namespace, as well as flows going from and to this namespace. Project admins have access to the Administrator perspective in the OKD console to access the Network Flows Traffic page.
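Group-level access follows the same pattern. As a sketch, the following grants read access to a hypothetical group named network-admins; the group name is an example, not one defined by the Operator:

```shell
# Grant the netobserv-reader role to every member of a group
# ("network-admins" is a placeholder group name).
oc adm policy add-cluster-role-to-group netobserv-reader network-admins
```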

Installing the Network Observability Operator

You can install the Network Observability Operator using the OKD web console Operator Hub. When you install the Operator, it provides the FlowCollector custom resource definition (CRD). You can set specifications in the web console when you create the FlowCollector.

The actual memory consumption of the Operator depends on your cluster size and the number of resources deployed. Memory consumption might need to be adjusted accordingly. For more information, refer to "Network Observability controller manager pod runs out of memory" in the "Important Flow Collector configuration considerations" section.

Prerequisites

  • If you choose to use Loki, install the Loki Operator version 5.7+.

  • You must have cluster-admin privileges.

  • One of the following supported architectures is required: amd64, ppc64le, arm64, or s390x.

  • Any CPU supported by Red Hat Enterprise Linux (RHEL) 9.

  • Your cluster must be configured with OVN-Kubernetes or OpenShift SDN as the main network plugin, and can optionally use secondary interfaces, such as Multus and SR-IOV.

This documentation assumes that your LokiStack instance name is loki. Using a different name requires additional configuration.
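For example, if your LokiStack were instead named netobserv-loki (a hypothetical name), the Loki-related URLs in the FlowCollector would need to change accordingly, because the gateway and query-frontend service names are prefixed by the LokiStack name. A sketch, assuming the default netobserv namespace:

```yaml
# Hypothetical: FlowCollector Loki URLs for a LokiStack named "netobserv-loki"
# instead of the default "loki". Service names are derived from the LokiStack name.
spec:
  loki:
    url: 'https://netobserv-loki-gateway-http.netobserv.svc:8080/api/logs/v1/network'
    statusUrl: 'https://netobserv-loki-query-frontend-http.netobserv.svc:3100/'
```

The certificate reference names mentioned later in this procedure would change in the same way.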

Procedure

  1. In the OKD web console, click Operators → OperatorHub.

  2. Choose Network Observability Operator from the list of available Operators in the OperatorHub, and click Install.

  3. Select the checkbox Enable Operator recommended cluster monitoring on this Namespace.

  4. Navigate to Operators → Installed Operators. Under Provided APIs for Network Observability, select the Flow Collector link.

  5. Navigate to the Flow Collector tab, and click Create FlowCollector. Make the following selections in the form view:

    1. spec.agent.ebpf.Sampling: Specify a sampling size for flows. Lower sampling values have a higher impact on resource utilization. For more information, see the "FlowCollector API reference", spec.agent.ebpf.

    2. If you are using Loki, set the following specifications:

      1. spec.loki.enable: Select the check box to enable storing flows in Loki.

      2. spec.loki.url: Since authentication is specified separately, this URL needs to be updated to https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network. The first part of the URL, "loki", must match the name of your LokiStack.

      3. spec.loki.authToken: Select the FORWARD value.

      4. spec.loki.statusUrl: Set this to https://loki-query-frontend-http.netobserv.svc:3100/. The first part of the URL, "loki", must match the name of your LokiStack.

      5. spec.loki.tls.enable: Select the checkbox to enable TLS.

      6. spec.loki.statusTls: The enable value is false by default.

        In the certificate reference names loki-gateway-ca-bundle, loki-ca-bundle, and loki-query-frontend-http, the first part, "loki", must match the name of your LokiStack.

    3. Optional: If you are in a large-scale environment, consider configuring the FlowCollector with Kafka for forwarding data in a more resilient, scalable way. See “Configuring the Flow Collector resource with Kafka storage” in the “Important Flow Collector configuration considerations” section.

    4. Optional: Configure other optional settings before the next step of creating the FlowCollector. For example, if you choose not to use Loki, then you can configure exporting flows to Kafka or IPFIX. See “Export enriched network flow data to Kafka and IPFIX” and more in the “Important Flow Collector configuration considerations” section.

    5. Click Create.
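Taken together, the selections above can also be made in YAML view. The following is a sketch, not a complete FlowCollector: the field names follow the spec.loki and spec.agent.ebpf settings described in this procedure, the sampling value is an example, and the apiVersion can vary by Operator version:

```yaml
# Sketch of the FlowCollector settings described above (Loki enabled).
# sampling: 50 is an example value; lower values increase resource usage.
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      sampling: 50
  loki:
    enable: true
    url: 'https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network'
    statusUrl: 'https://loki-query-frontend-http.netobserv.svc:3100/'
    authToken: FORWARD
    tls:
      enable: true
```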

Verification

To confirm the installation was successful, navigate to Observe and verify that Network Traffic is listed in the options.

In the absence of application traffic within the OKD cluster, default filters might show that there are "No results", which results in no visual flow. Beside the filter selections, select Clear all filters to see the flow.

If you installed Loki using the Loki Operator, it is advised not to use querierUrl, as it can break the console access to Loki. If you installed Loki using another type of Loki installation, this does not apply.

Important Flow Collector configuration considerations

Once you create the FlowCollector instance, you can reconfigure it, but the pods are terminated and recreated, which can be disruptive. Therefore, consider configuring the following options when creating the FlowCollector for the first time:

Additional resources

For more general information about Flow Collector specifications and the Network Observability Operator architecture and resource use, see the following resources:

Installing Kafka (optional)

The Kafka Operator is supported for large scale environments. Kafka provides high-throughput and low-latency data feeds for forwarding network flow data in a more resilient, scalable way. You can install the Kafka Operator as Red Hat AMQ Streams from the Operator Hub, just as the Loki Operator and Network Observability Operator were installed. Refer to “Configuring the FlowCollector resource with Kafka” to configure Kafka as a storage option.

To uninstall Kafka, refer to the uninstallation process that corresponds with the method you used to install.

Additional resources

Configuring the FlowCollector resource with Kafka

Uninstalling the Network Observability Operator

You can uninstall the Network Observability Operator using the OKD web console Operator Hub, working in the Operators → Installed Operators area.

Procedure

  1. Remove the FlowCollector custom resource.

    1. Click Flow Collector, which is next to the Network Observability Operator in the Provided APIs column.

    2. Click the options menu for the cluster and select Delete FlowCollector.

  2. Uninstall the Network Observability Operator.

    1. Navigate back to the Operators → Installed Operators area.

    2. Click the options menu next to the Network Observability Operator and select Uninstall Operator.

    3. Navigate to Home → Projects and select openshift-netobserv-operator.

    4. Navigate to Actions and select Delete Project.

  3. Remove the FlowCollector custom resource definition (CRD).

    1. Navigate to Administration → CustomResourceDefinitions.

    2. Look for FlowCollector and click the options menu.

    3. Select Delete CustomResourceDefinition.

      The Loki Operator and Kafka remain if they were installed and must be removed separately. Additionally, you might have remaining data stored in an object store, and a persistent volume that must be removed.