Loki

About the LokiStack

In logging subsystem documentation, LokiStack refers to the combination of Loki and a web proxy with OKD authentication integration that the logging subsystem supports. The LokiStack proxy uses OKD authentication to enforce multi-tenancy. Loki refers to the log store, either as the individual component or as an external store.

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system currently offered as an alternative to Elasticsearch as a log store for the logging subsystem. Elasticsearch indexes incoming log records completely during ingestion. Loki only indexes a few fixed labels during ingestion, and defers more complex parsing until after the logs have been stored. This means Loki can collect logs more quickly. As with Elasticsearch, you can query Loki using JSON paths or regular expressions.

Deployment Sizing

Sizing for Loki follows the format `<N>x.<size>`, where `<N>` is the number of instances and `<size>` specifies performance capabilities.

1x.extra-small is for demo purposes only, and is not supported.

Table 1. Loki Sizing
	1x.extra-small	1x.small	1x.medium
Data transfer	Demo use only.	500GB/day	2TB/day
Queries per second (QPS)	Demo use only.	25-50 QPS at 200ms	25-75 QPS at 200ms
Replication factor	None	2	3
Total CPU requests	5 vCPUs	36 vCPUs	54 vCPUs
Total Memory requests	7.5Gi	63Gi	139Gi
Total Disk requests	150Gi	300Gi	450Gi
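The size format and Table 1 can be encoded in a small lookup, for example to pick the smallest supported size for an expected daily volume. This is a minimal Python sketch under stated assumptions: `LokiSize`, `SIZES`, `parse_size`, and `smallest_supported_size` are hypothetical names, 2TB/day is assumed to mean 2000 GB/day, and only the data-transfer row is used to choose a size (query load and replication needs are ignored).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical helpers; not part of any OKD or Loki API.


@dataclass(frozen=True)
class LokiSize:
    """One column of Table 1 (Loki Sizing)."""
    name: str
    gb_per_day: Optional[int]          # None means demo use only (not supported)
    replication_factor: Optional[int]  # None means no replication
    cpu_requests: int                  # total vCPUs
    memory_gi: float                   # total memory requests in Gi
    disk_gi: int                       # total disk requests in Gi


# Values transcribed from Table 1; 2TB/day is assumed to be 2000 GB/day.
SIZES = {
    "1x.extra-small": LokiSize("1x.extra-small", None, None, 5, 7.5, 150),
    "1x.small": LokiSize("1x.small", 500, 2, 36, 63, 300),
    "1x.medium": LokiSize("1x.medium", 2000, 3, 54, 139, 450),
}

# <N>x.<size>, for example "1x.small" or "1x.extra-small"
SIZE_FORMAT = re.compile(r"(\d+)x\.([a-z]+(?:-[a-z]+)*)")


def parse_size(value: str) -> tuple[int, str]:
    """Split a size such as '1x.small' into (instances, size)."""
    match = SIZE_FORMAT.fullmatch(value)
    if match is None:
        raise ValueError(f"expected <N>x.<size>, got {value!r}")
    return int(match.group(1)), match.group(2)


def smallest_supported_size(gb_per_day: float) -> str:
    """Return the smallest supported size whose data transfer covers gb_per_day."""
    supported = [s for s in SIZES.values() if s.gb_per_day is not None]
    for size in sorted(supported, key=lambda s: s.gb_per_day):
        if gb_per_day <= size.gb_per_day:
            return size.name
    raise ValueError(f"{gb_per_day} GB/day is above every size in Table 1")


print(parse_size("1x.medium"))       # (1, 'medium')
print(smallest_supported_size(800))  # 1x.medium
```

The extra-small size is kept in the lookup for completeness but is never returned, because it is not supported.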

Supported API Custom Resource Definitions

LokiStack development is ongoing, and not all APIs are currently supported.

CustomResourceDefinition (CRD)	ApiVersion	Support state
LokiStack	lokistack.loki.grafana.com/v1	Supported in 5.5
RulerConfig	rulerconfig.loki.grafana.com/v1beta1	Technology Preview
AlertingRule	alertingrule.loki.grafana.com/v1beta1	Technology Preview
RecordingRule	recordingrule.loki.grafana.com/v1beta1	Technology Preview

Usage of the RulerConfig, AlertingRule, and RecordingRule custom resource definitions (CRDs) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Deploying the LokiStack

You can use the OKD web console to deploy the LokiStack.

Prerequisites

- Logging subsystem for Red Hat OpenShift Operator 5.5 and later
- A supported object store: AWS S3, Google Cloud Storage, Azure, Swift, MinIO, or OpenShift Data Foundation

Procedure

Install the Loki Operator:
1. In the OKD web console, click Operators → OperatorHub.
2. Choose Loki Operator from the list of available Operators, and click Install.
3. Under Installation Mode, select All namespaces on the cluster.
4. Under Installed Namespace, select openshift-operators-redhat.
  
  You must specify the openshift-operators-redhat namespace. The openshift-operators namespace might contain Community Operators, which are untrusted and might publish a metric with the same name as an OKD metric, which would cause conflicts.
5. Select Enable operator recommended cluster monitoring on this namespace.
  
  This option sets the openshift.io/cluster-monitoring: "true" label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes the openshift-operators-redhat namespace.
6. Select an Approval Strategy.
  - The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
  - The Manual strategy requires a user with appropriate credentials to approve the Operator update.
7. Click Install.
8. Verify that you installed the Loki Operator. Visit the Operators → Installed Operators page and look for Loki Operator.
9. Ensure that Loki Operator is listed with Status as Succeeded in all the projects.

Create a Secret YAML file that uses the `access_key_id` and `access_key_secret` fields to specify your AWS credentials, and the `bucketnames`, `endpoint`, and `region` fields to define the object storage location. For example:

```
apiVersion: v1
kind: Secret
metadata:
  name: logging-loki-s3
  namespace: openshift-logging
stringData:
  access_key_id: AKIAIOSFODNN7EXAMPLE
  access_key_secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  bucketnames: s3-bucket-name
  endpoint: https://s3.eu-central-1.amazonaws.com
  region: eu-central-1
```
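Kubernetes accepts the values under `stringData` as plain strings and stores them base64-encoded under `data`. If you generate the Secret from a script instead of writing YAML by hand, the following minimal Python sketch builds the equivalent manifest as JSON, which `oc apply -f` also accepts. The function name `s3_secret_manifest` and the check for the five keys used in the example above are assumptions for illustration; the sketch does not call any Kubernetes client.

```python
from __future__ import annotations

import base64
import json

# The five keys used in the example Secret above.
S3_KEYS = ("access_key_id", "access_key_secret", "bucketnames", "endpoint", "region")


def s3_secret_manifest(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Hypothetical helper: build a Secret manifest with base64-encoded `data`.

    Submitting the same values under `stringData` results in the same
    `data` entries; this helper only encodes them up front.
    """
    missing = [key for key in S3_KEYS if not values.get(key)]
    if missing:
        raise ValueError("missing Secret keys: " + ", ".join(missing))
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    manifest = s3_secret_manifest(
        "logging-loki-s3",
        "openshift-logging",
        {
            "access_key_id": "AKIAIOSFODNN7EXAMPLE",
            "access_key_secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
            "bucketnames": "s3-bucket-name",
            "endpoint": "https://s3.eu-central-1.amazonaws.com",
            "region": "eu-central-1",
        },
    )
    # Pipe this output to `oc apply -f -` to create the Secret.
    print(json.dumps(manifest, indent=2))
```

Failing early on a missing key is a design choice of this sketch, so that an incomplete Secret is never applied.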

Create the LokiStack custom resource:

```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    schemas:
    - version: v12
      effectiveDate: "2022-06-01"
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi # (1)
  tenants:
    mode: openshift-logging
```

1	Or `gp2-csi`.

Apply the configuration:

```
oc apply -f logging-loki.yaml
```

Create or edit a ClusterLogging CR:

```
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: lokistack
    lokistack:
      name: logging-loki
  collection:
    type: vector
```

Apply the configuration:
```
oc apply -f cr-lokistack.yaml
```

Enable the Red Hat OpenShift Logging console plugin:
1. In the OKD web console, click Operators → Installed Operators.
2. Select the Red Hat OpenShift Logging Operator.
3. Under Console plugin, click Disabled.
4. Select Enable and then Save. This change restarts the `openshift-console` pods.
5. After the pods restart, you receive a notification that a web console update is available, prompting you to refresh.
6. After refreshing the web console, click Observe in the left navigation menu. A new Logs option is available.

This plugin is only available on OKD 4.10 and later.

Enabling stream-based retention with Loki

With Logging version 5.6 and higher, you can configure retention policies based on log streams. Rules for these may be set globally, per tenant, or both. If you configure both, tenant rules apply before global rules.

To enable stream-based retention, create or edit the LokiStack custom resource (CR):

```
oc create -f <file-name>.yaml
```

You can refer to the examples below to configure your LokiStack CR.

Example global stream-based retention

```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global: # (1)
      retention: # (2)
        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}' # (3)
        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: standard
  tenants:
    mode: openshift-logging
```

1	Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
2	Retention is enabled in the cluster when this block is added to the CR.
3	Contains the LogQL query used to define the log stream.

Example per-tenant stream-based retention

```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants: # (1)
      application:
        retention:
          days: 1
          streams:
            - days: 4
              selector: '{kubernetes_namespace_name=~"test.+"}' # (2)
      infrastructure:
        retention:
          days: 5
          streams:
            - days: 1
              selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: standard
  tenants:
    mode: openshift-logging
```

1	Sets retention policy by tenant. Valid tenant types are `application`, `audit`, and `infrastructure`.
2	Contains the LogQL query used to define the log stream.

Then apply your configuration:

```
oc apply -f <file-name>.yaml
```

Stream-based retention does not manage the retention of stored logs. Global retention periods for stored logs, up to a supported maximum of 30 days, are configured with your object storage.
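The precedence between tenant and global rules can be illustrated with a simplified Python model that picks a retention period for one log stream. This is a sketch under stated assumptions, not the Loki or Loki Operator implementation: it assumes the order tenant stream rules, then global stream rules, then the tenant `days`, then the global `days`; it assumes the highest `priority` wins among matching stream rules, with ties going to the first rule listed and a default priority of 0; and its selector parser (`parse_selector`, like every name here, is hypothetical) handles only comma-separated `=`, `!=`, `=~`, and `!~` label matchers whose values contain no commas. Regular-expression matchers are fully anchored, as in Prometheus-style selectors.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Simplified, hypothetical model of stream-based retention precedence.
# It is not the Loki or Loki Operator implementation.

# One label matcher: name, operator, quoted value ("=~" is tried before "=").
MATCHER = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"([^"]*)"\s*')


def parse_selector(selector: str) -> list[tuple[str, str, str]]:
    """Parse a selector such as '{log_type="infrastructure"}' into matchers."""
    text = selector.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a stream selector: {selector!r}")
    matchers = []
    for part in text[1:-1].split(","):  # assumes no commas inside values
        match = MATCHER.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported matcher: {part!r}")
        matchers.append(match.groups())
    return matchers


def selector_matches(selector: str, labels: dict[str, str]) -> bool:
    """Return True if every matcher accepts the stream's labels."""
    for name, op, value in parse_selector(selector):
        actual = labels.get(name, "")  # a missing label acts as ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        else:  # "!~"
            ok = re.fullmatch(value, actual) is None
        if not ok:
            return False
    return True


@dataclass
class StreamRule:
    days: int
    selector: str
    priority: int = 0  # assumed default when priority is omitted


@dataclass
class Retention:
    days: int
    streams: list[StreamRule] = field(default_factory=list)


def best_rule(rules: list[StreamRule], labels: dict[str, str]) -> Optional[StreamRule]:
    """Highest-priority matching rule; on a tie the first listed rule is kept."""
    best = None
    for rule in rules:
        if selector_matches(rule.selector, labels):
            if best is None or rule.priority > best.priority:
                best = rule
    return best


def retention_days(labels: dict[str, str], global_: Retention,
                   tenant: Optional[Retention] = None) -> int:
    """Assumed order: tenant streams, global streams, tenant days, global days."""
    for retention in (tenant, global_):
        if retention is not None:
            rule = best_rule(retention.streams, labels)
            if rule is not None:
                return rule.days
    return tenant.days if tenant is not None else global_.days


# Values from the per-tenant example above (application tenant).
global_retention = Retention(days=20)
application = Retention(days=1, streams=[StreamRule(4, '{kubernetes_namespace_name=~"test.+"}')])
print(retention_days({"kubernetes_namespace_name": "test-web"}, global_retention, application))  # 4
print(retention_days({"kubernetes_namespace_name": "shop"}, global_retention, application))  # 1
```

In this model, a stream in a `test-web` namespace matches the tenant stream rule and keeps logs for 4 days, while other application streams fall back to the tenant default of 1 day.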

Forwarding logs to LokiStack

To configure log forwarding to the LokiStack gateway, you must create a ClusterLogging custom resource (CR).

Prerequisites

- Logging subsystem for Red Hat OpenShift 5.5 and later
- Loki Operator

Procedure

Create or edit a YAML file that defines the ClusterLogging custom resource (CR):

```
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: lokistack
    lokistack:
      name: logging-loki
  collection:
    type: vector
```

Troubleshooting Loki “entry out of order” errors

If Fluentd forwards a large block of messages that exceeds the rate limit of a Loki logging system, Loki generates “entry out of order” errors. To fix this issue, update some values in the Loki server configuration file, `loki.yaml`.

loki.yaml is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.

Conditions

The ClusterLogForwarder custom resource is configured to forward logs to Loki.

Your system sends a block of messages that is larger than 2 MB to Loki, such as:

```
"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
.......
\"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
```

When you run `oc logs -c fluentd`, the Fluentd logs in your OpenShift Logging cluster show the following messages:

```
429 Too Many Requests Ingestion rate limit exceeded (limit: 8388608 bytes/sec) while attempting to ingest '2140' lines totaling '3285284' bytes
429 Too Many Requests Ingestion rate limit exceeded
500 Internal Server Error rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5277702 vs. 4194304)
```
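The byte counts in these messages are easier to compare in mebibytes (MiB, 2^20 bytes). The following Python sketch only does arithmetic on numbers copied from the messages above: 8388608 bytes is exactly 8 MiB, and the rejected gRPC message of 5277702 bytes is larger than the 4194304-byte (4 MiB) maximum, but would fit under the `grpc_server_max_recv_msg_size: 8388608` value used in the procedure. The helper names `to_mib` and `fits` are hypothetical.

```python
# Hypothetical helpers; this block only does arithmetic on logged values.
MIB = 2**20  # bytes in one mebibyte

# Numbers copied from the Fluentd messages above.
RATE_LIMIT_BYTES_PER_SEC = 8388608
GRPC_MESSAGE_BYTES = 5277702
GRPC_MAX_BYTES = 4194304


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / MIB


def fits(message_bytes: int, max_recv_bytes: int) -> bool:
    """True if a message of this size is within the receive limit."""
    return message_bytes <= max_recv_bytes


print(f"{to_mib(RATE_LIMIT_BYTES_PER_SEC):.2f} MiB/s")  # 8.00 MiB/s
print(f"{to_mib(GRPC_MESSAGE_BYTES):.2f} MiB")          # 5.03 MiB
print(f"{to_mib(GRPC_MAX_BYTES):.2f} MiB")              # 4.00 MiB
print(fits(GRPC_MESSAGE_BYTES, GRPC_MAX_BYTES))         # False
print(fits(GRPC_MESSAGE_BYTES, 8388608))                # True
```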

When you open the logs on the Loki server, they display entry out of order messages like these:

```
,\nentry with timestamp 2021-08-18 05:58:55.061936 +0000 UTC ignored, reason: 'entry out of order' for stream:
{fluentd_thread=\"flush_thread_0\", log_type=\"audit\"},\nentry with timestamp 2021-08-18 06:01:18.290229 +0000 UTC ignored, reason: 'entry out of order' for stream: {fluentd_thread="flush_thread_0", log_type="audit"}
```

Procedure

Update the following fields in the loki.yaml configuration file on the Loki server with the values shown here:
- grpc_server_max_recv_msg_size: 8388608
- chunk_target_size: 8388608
- ingestion_rate_mb: 8
- ingestion_burst_size_mb: 16
Apply the changes in loki.yaml to the Loki server.

Example loki.yaml file

```
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 8388608
ingester:
  wal:
    enabled: true
    dir: /tmp/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  chunk_target_size: 8388608
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks
compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: filesystem
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 12h
  ingestion_rate_mb: 8
  ingestion_burst_size_mb: 16
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
ruler:
  storage:
    type: local
    local:
      directory: /tmp/loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
```
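To confirm that a configuration carries the four values from the procedure before you apply it, you can compare the parsed file against them. This minimal Python sketch assumes `loki.yaml` has already been parsed into nested dictionaries (for example with `yaml.safe_load` from the third-party PyYAML package, which is not used here) and checks for the exact values listed in the procedure. `EXPECTED` and `check_loki_config` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical checker; the input is loki.yaml already parsed into dicts.
# (section, key) -> value from the procedure above.
EXPECTED = {
    ("server", "grpc_server_max_recv_msg_size"): 8388608,
    ("ingester", "chunk_target_size"): 8388608,
    ("limits_config", "ingestion_rate_mb"): 8,
    ("limits_config", "ingestion_burst_size_mb"): 16,
}


def check_loki_config(config: dict) -> list[str]:
    """Return a message for each expected field that is missing or different."""
    problems = []
    for (section, key), expected in EXPECTED.items():
        value = (config.get(section) or {}).get(key)  # tolerate empty sections
        if value is None:
            problems.append(f"{section}.{key} is not set")
        elif value != expected:
            problems.append(f"{section}.{key} is {value}, expected {expected}")
    return problems


# The relevant parts of the example loki.yaml above, as parsed dictionaries.
example = {
    "server": {"grpc_server_max_recv_msg_size": 8388608},
    "ingester": {"chunk_target_size": 8388608},
    "limits_config": {"ingestion_rate_mb": 8, "ingestion_burst_size_mb": 16},
}
print(check_loki_config(example))  # []
```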

Additional resources

Configuring Loki
