Loki

About the LokiStack

In the logging subsystem documentation, LokiStack refers to the combination of Loki and a web proxy with OKD authentication integration that the logging subsystem supports. The LokiStack proxy uses OKD authentication to enforce multi-tenancy. Loki refers to the log store, either as the individual component or as an external store.

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system currently offered as an alternative to Elasticsearch as a log store for the logging subsystem. Elasticsearch indexes incoming log records completely during ingestion. Loki only indexes a few fixed labels during ingestion, and defers more complex parsing until after the logs have been stored. This means Loki can collect logs more quickly. As with Elasticsearch, you can query Loki using JSON paths or regular expressions.

Deployment Sizing

Sizing for Loki follows the format <N>x.<size>, where <N> is the number of instances and <size> specifies performance capabilities.
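
As an illustration only, a size value such as 1x.small can be split into its instance count and size name with shell parameter expansion. This is a minimal sketch; the function name parse_loki_size is hypothetical and is not part of any Loki or OKD tooling.

```shell
#!/bin/sh
# Hypothetical helper: split a LokiStack size such as "1x.small"
# into the instance count and the size name.
parse_loki_size() {
  size_value="$1"
  # Everything before the first "x." is the instance count.
  count="${size_value%%x.*}"
  # Everything after the first "x." is the size name.
  name="${size_value#*x.}"
  printf 'instances=%s size=%s\n' "$count" "$name"
}

parse_loki_size "1x.extra-small"
parse_loki_size "1x.medium"
```

The size name is what selects the resource profile listed in the sizing table; the sketch only separates the two parts and does not validate them.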

Table 1. Loki Sizing

                            1x.extra-small    1x.small              1x.medium
Data transfer               Demo use only.    500GB/day             2TB/day
Queries per second (QPS)    Demo use only.    25-50 QPS at 200ms    25-75 QPS at 200ms
Replication factor          None              2                     3
Total CPU requests          5 vCPUs           36 vCPUs              54 vCPUs
Total Memory requests       7.5Gi             63Gi                  139Gi
Total Disk requests         150Gi             300Gi                 450Gi

Supported API Custom Resource Definitions

LokiStack development is ongoing. Not all APIs are currently supported.

CustomResourceDefinition (CRD)    ApiVersion                            Support state
LokiStack                         lokistack.loki.grafana.com/v1         Supported in 5.5
RulerConfig                       rulerconfig.loki.grafana/v1beta1      Technology Preview
AlertingRule                      alertingrule.loki.grafana/v1beta1     Technology Preview
RecordingRule                     recordingrule.loki.grafana/v1beta1    Technology Preview

Usage of the RulerConfig, AlertingRule, and RecordingRule custom resource definitions (CRDs) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Deploying the LokiStack

You can use the OKD web console to deploy the LokiStack.

Prerequisites

  • Logging subsystem for Red Hat OpenShift Operator 5.5 and later

  • AWS S3 bucket for log storage

Procedure

  1. Install the LokiOperator Operator:

    1. In the OKD web console, click Operators → OperatorHub.

    2. Choose LokiOperator from the list of available Operators, and click Install.

    3. Under Installation Mode, select All namespaces on the cluster.

    4. Under Installed Namespace, select openshift-operators-redhat.

      You must specify the openshift-operators-redhat namespace. The openshift-operators namespace might contain Community Operators, which are untrusted and might publish a metric with the same name as an OKD metric, which would cause conflicts.

    5. Select Enable operator recommended cluster monitoring on this namespace.

      This option sets the openshift.io/cluster-monitoring: "true" label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes the openshift-operators-redhat namespace.

    6. Select an Approval Strategy.

      • The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.

      • The Manual strategy requires a user with appropriate credentials to approve the Operator update.

    7. Click Install.

    8. Verify that you installed the LokiOperator. Visit the Operators → Installed Operators page and look for LokiOperator.

    9. Ensure that LokiOperator is listed with Status as Succeeded in all the projects.

  2. Create a Secret YAML file that uses the access_key_id and access_key_secret fields to specify your base64-encoded AWS credentials. For example:

    apiVersion: v1
    kind: Secret
    metadata:
      name: logging-loki-s3
      namespace: openshift-logging
    data:
      access_key_id: QUtJQUlPU0ZPRE5ON0VYQU1QTEUK
      access_key_secret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQo=
      bucketnames: czMtYnVja2V0LW5hbWU= # (1)
      endpoint: aHR0cHM6Ly9zMy5ldS1jZW50cmFsLTEuYW1hem9uYXdzLmNvbQ== # (2)

    (1) Base64-encoded bucket name
    (2) Base64-encoded S3 API endpoint

  3. Create the LokiStack custom resource:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      size: 1x.small
      storage:
        schemas:
        - version: v12
          effectiveDate: "2022-06-01"
        secret:
          name: logging-loki-s3
          type: s3
      storageClassName: gp2
      tenants:
        mode: openshift-logging

    1. Apply the configuration:

      oc apply -f logging-loki.yaml

  4. Create or edit a ClusterLogging CR:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: cr-lokistack
      namespace: openshift-logging
    spec:
      logStore:
        lokistack:
          name: logging-loki

    1. Apply the configuration:

      oc apply -f cr-lokistack.yaml

  5. Enable the Red Hat OpenShift Logging Console Plugin:

    1. In the OKD web console, click Operators → Installed Operators.

    2. Select the Red Hat OpenShift Logging Operator.

    3. Under Console plugin, click Disabled.

    4. Select Enable and then Save. This change will restart the ‘openshift-console’ pods.

    5. After the pods restart, you will receive a notification that a web console update is available, prompting you to refresh.

    6. After refreshing the web console, click Observe from the left main menu. A new option for Logs will be available to you.

This plugin is only available on OKD 4.10 and later.
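
Step 2 of the procedure expects every value in the Secret to be base64-encoded. As a minimal sketch, assuming a POSIX shell with the base64 utility available, the following encodes the bucket name and endpoint from the example. The function name b64 is hypothetical. It uses printf rather than echo so that no trailing newline is encoded into the value.

```shell
#!/bin/sh
# Hypothetical helper: base64-encode a value without a trailing newline
# and without line wrapping in the output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

bucket=$(b64 "s3-bucket-name")
endpoint=$(b64 "https://s3.eu-central-1.amazonaws.com")

# Print the two values in the form used in the Secret's data section.
printf 'bucketnames: %s\n' "$bucket"
printf 'endpoint: %s\n' "$endpoint"
```

The two printed values match the bucketnames and endpoint entries in the example Secret. Substitute your own bucket name, endpoint, and credentials before creating the Secret.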

Forwarding logs to LokiStack

To configure log forwarding to the LokiStack gateway, you must create a ClusterLogging custom resource (CR).

Prerequisites

  • Logging subsystem for Red Hat OpenShift: 5.5 and later

  • LokiOperator Operator

Procedure

  1. Create or edit a YAML file that defines the ClusterLogging custom resource (CR):

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      logStore:
        type: lokistack
        lokistack:
          name: lokistack-dev
      collection:
        type: vector

Troubleshooting Loki “entry out of order” errors

If Fluentd forwards a large block of messages to a Loki logging system and the block exceeds the rate limit, Loki generates “entry out of order” errors. To fix this issue, update some values in the Loki server configuration file, loki.yaml.

loki.yaml is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.

Conditions

  • The ClusterLogForwarder custom resource is configured to forward logs to Loki.

  • Your system sends a block of messages that is larger than 2 MB to Loki, such as:

    "values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
    .......
    ......
    ......
    ......
    \"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}

  • When you enter oc logs -c fluentd, the Fluentd logs in your OpenShift Logging cluster show the following messages:

    429 Too Many Requests Ingestion rate limit exceeded (limit: 8388608 bytes/sec) while attempting to ingest '2140' lines totaling '3285284' bytes
    429 Too Many Requests Ingestion rate limit exceeded' or '500 Internal Server Error rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5277702 vs. 4194304)'

  • When you open the logs on the Loki server, they display entry out of order messages like these:

    ,\nentry with timestamp 2021-08-18 05:58:55.061936 +0000 UTC ignored, reason: 'entry out of order' for stream:
    {fluentd_thread=\"flush_thread_0\", log_type=\"audit\"},\nentry with timestamp 2021-08-18 06:01:18.290229 +0000 UTC ignored, reason: 'entry out of order' for stream: {fluentd_thread="flush_thread_0", log_type="audit"}
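
One way to check for the rate-limit condition described above is to filter the collector logs for the 429 message. The following is a minimal sketch that reads log text from standard input; count_rate_limit_errors is a hypothetical function name, and the sample input reuses a message shown in the conditions.

```shell
#!/bin/sh
# Hypothetical helper: count log lines on stdin that report a Loki
# ingestion rate-limit rejection. grep -c exits non-zero when the
# count is 0, so "|| true" keeps the function from reporting failure.
count_rate_limit_errors() {
  grep -c '429 Too Many Requests' || true
}

# Sample input: one matching message plus one unrelated line.
sample="429 Too Many Requests Ingestion rate limit exceeded (limit: 8388608 bytes/sec) while attempting to ingest '2140' lines totaling '3285284' bytes
some unrelated log line"

printf '%s\n' "$sample" | count_rate_limit_errors
```

On a cluster, the input could instead be the output of oc logs -c fluentd for a collector pod. A non-zero count suggests that the limits in the procedure are worth reviewing.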

Procedure

  1. Update the following fields in the loki.yaml configuration file on the Loki server with the values shown here:

    • grpc_server_max_recv_msg_size: 8388608

    • chunk_target_size: 8388608

    • ingestion_rate_mb: 8

    • ingestion_burst_size_mb: 16

  2. Apply the changes in loki.yaml to the Loki server.
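
The byte values in the procedure are 8 MiB expressed in bytes. As a quick arithmetic check, the following sketch compares the message size from the sample gRPC error in this topic against the maximum reported in that error and against the new value. The helper name fits_limit is hypothetical.

```shell
#!/bin/sh
# 8 MiB in bytes: the value set for grpc_server_max_recv_msg_size
# and chunk_target_size in the procedure.
new_limit=$((8 * 1024 * 1024))
echo "$new_limit"

# Sizes taken from the sample error:
# "received message larger than max (5277702 vs. 4194304)"
rejected_message=5277702
reported_max=4194304

# Hypothetical helper: succeeds when a message of $1 bytes
# fits within a limit of $2 bytes.
fits_limit() {
  [ "$1" -le "$2" ]
}

if fits_limit "$rejected_message" "$reported_max"; then
  echo "fits reported max"
else
  echo "exceeds reported max"
fi

if fits_limit "$rejected_message" "$new_limit"; then
  echo "fits new limit"
else
  echo "exceeds new limit"
fi
```

The sample message exceeds the 4194304-byte maximum reported in the error but fits within 8388608 bytes, which is why the procedure raises the receive size.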

Example loki.yaml file

  auth_enabled: false
  server:
    http_listen_port: 3100
    grpc_listen_port: 9096
    grpc_server_max_recv_msg_size: 8388608
  ingester:
    wal:
      enabled: true
      dir: /tmp/wal
    lifecycler:
      address: 127.0.0.1
      ring:
        kvstore:
          store: inmemory
        replication_factor: 1
      final_sleep: 0s
    chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed
    chunk_target_size: 8388608
    max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
    chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
    max_transfer_retries: 0 # Chunk transfers disabled
  schema_config:
    configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: index_
          period: 24h
  storage_config:
    boltdb_shipper:
      active_index_directory: /tmp/loki/boltdb-shipper-active
      cache_location: /tmp/loki/boltdb-shipper-cache
      cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
      shared_store: filesystem
    filesystem:
      directory: /tmp/loki/chunks
  compactor:
    working_directory: /tmp/loki/boltdb-shipper-compactor
    shared_store: filesystem
  limits_config:
    reject_old_samples: true
    reject_old_samples_max_age: 12h
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 16
  chunk_store_config:
    max_look_back_period: 0s
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0s
  ruler:
    storage:
      type: local
      local:
        directory: /tmp/loki/rules
    rule_path: /tmp/loki/rules-temp
    alertmanager_url: http://localhost:9093
    ring:
      kvstore:
        store: inmemory
    enable_api: true


Collector features

Table 2. Log Sources
Feature    Fluentd    Vector

App container logs

App-specific routing

App-specific routing by namespace

Infra container logs

Infra journal logs

Kube API audit logs

OpenShift API audit logs

Open Virtual Network (OVN) audit logs

Table 3. Outputs
Feature    Fluentd    Vector

Elasticsearch v5-v7

Fluent forward

Syslog RFC3164

Syslog RFC5424

Kafka

CloudWatch

Loki

Table 4. Authorization and Authentication
Feature    Fluentd    Vector

Elasticsearch certificates

Elasticsearch username / password

CloudWatch keys

CloudWatch STS

Kafka certificates

Kafka username / password

Kafka SASL

Loki bearer token

Table 5. Normalizations and Transformations
Feature    Fluentd    Vector

Viaq data model - app

Viaq data model - infra

Viaq data model - infra(journal)

Viaq data model - Linux audit

Viaq data model - kube-apiserver audit

Viaq data model - OpenShift API audit

Viaq data model - OVN

Loglevel Normalization

JSON parsing

Structured Index

Multiline error detection

Multicontainer / split indices

Flatten labels

CLF static labels

Table 6. Tuning
Feature    Fluentd    Vector

Fluentd readlinelimit

Fluentd buffer

- chunklimitsize

- totallimitsize

- overflowaction

- flushthreadcount

- flushmode

- flushinterval

- retrywait

- retrytype

- retrymaxinterval

- retrytimeout

Table 7. Visibility
Feature    Fluentd    Vector

Metrics

Dashboard

Alerts

Table 8. Miscellaneous
Feature    Fluentd    Vector

Global proxy support

x86 support

ARM support

PowerPC support

IBM Z support

IPv6 support

Log event buffering

Disconnected Cluster

Additional Resources