Loki
About the LokiStack
In the logging subsystem documentation, LokiStack refers to the supported combination of Loki and a web proxy with OKD authentication integration. LokiStack's proxy uses OKD authentication to enforce multi-tenancy. Loki refers to the log store itself, either as an individual component or as an external store.
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system currently offered as an alternative to Elasticsearch as a log store for the logging subsystem. Elasticsearch indexes incoming log records completely during ingestion. Loki only indexes a few fixed labels during ingestion, and defers more complex parsing until after the logs have been stored. This means Loki can collect logs more quickly. As with Elasticsearch, you can query Loki using JSON paths or regular expressions.
Deployment Sizing
Sizing for Loki follows the format `<N>x.<size>`, where `<N>` is the number of instances and `<size>` specifies performance capabilities.
 | 1x.extra-small | 1x.small | 1x.medium |
---|---|---|---|
Data transfer | Demo use only. | 500GB/day | 2TB/day |
Queries per second (QPS) | Demo use only. | 25-50 QPS at 200ms | 25-75 QPS at 200ms |
Replication factor | None | 2 | 3 |
Total CPU requests | 5 vCPUs | 36 vCPUs | 54 vCPUs |
Total memory requests | 7.5Gi | 63Gi | 139Gi |
Total disk requests | 150Gi | 300Gi | 450Gi |
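The deployment size is selected through the `size` field of the `LokiStack` custom resource; the full resource appears in the deployment procedure below. A fragment showing only the sizing-related field:

```yaml
# Fragment of a LokiStack custom resource; only the sizing field is shown.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.medium   # one of 1x.extra-small, 1x.small, 1x.medium
```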
Supported API Custom Resource Definitions
LokiStack development is ongoing; not all APIs are currently supported.
CustomResourceDefinition (CRD) | ApiVersion | Support state |
---|---|---|
LokiStack | lokistack.loki.grafana.com/v1 | Supported in 5.5 |
RulerConfig | rulerconfig.loki.grafana.com/v1beta1 | Technology Preview |
AlertingRule | alertingrule.loki.grafana.com/v1beta1 | Technology Preview |
RecordingRule | recordingrule.loki.grafana.com/v1beta1 | Technology Preview |
For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
Deploying the LokiStack
You can use the OKD web console to deploy the LokiStack.
Prerequisites
Logging subsystem for Red Hat OpenShift Operator 5.5 and later
AWS S3 bucket for log storage
Procedure
Install the LokiOperator Operator:

- In the OKD web console, click Operators → OperatorHub.
- Choose LokiOperator from the list of available Operators, and click Install.
- Under Installation Mode, select All namespaces on the cluster.
- Under Installed Namespace, select openshift-operators-redhat.
  You must specify the `openshift-operators-redhat` namespace. The `openshift-operators` namespace might contain Community Operators, which are untrusted and might publish a metric with the same name as an OKD metric, which would cause conflicts.
- Select Enable operator recommended cluster monitoring on this namespace.
  This option sets the `openshift.io/cluster-monitoring: "true"` label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes the `openshift-operators-redhat` namespace.
- Select an Approval Strategy.
  - The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
  - The Manual strategy requires a user with appropriate credentials to approve the Operator update.
- Click Install.

Verify that you installed the LokiOperator. Visit the Operators → Installed Operators page and look for LokiOperator. Ensure that LokiOperator is listed with Status as Succeeded in all the projects.
Create a `Secret` YAML file that uses the `access_key_id` and `access_key_secret` fields to specify your base64-encoded AWS credentials. For example:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: logging-loki-s3
  namespace: openshift-logging
data:
  access_key_id: QUtJQUlPU0ZPRE5ON0VYQU1QTEUK
  access_key_secret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQo=
  bucketnames: czMtYnVja2V0LW5hbWU= (1)
  endpoint: aHR0cHM6Ly9zMy5ldS1jZW50cmFsLTEuYW1hem9uYXdzLmNvbQ== (2)
```

1. Base64-encoded bucket name
2. Base64-encoded S3 API endpoint

Create the `LokiStack` custom resource:

```yaml
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    schemas:
    - version: v12
      effectiveDate: 2022-06-01
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp2
  tenants:
    mode: openshift-logging
```

Apply the configuration:

```shell
oc apply -f logging-loki.yaml
```
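The values under `data` in the `logging-loki-s3` secret above must be base64-encoded. Assuming the standard `base64` utility is available, they can be produced like this (the bucket name is the placeholder from the example):

```shell
# Encode the placeholder bucket name used in the example Secret
printf 's3-bucket-name' | base64
# → czMtYnVja2V0LW5hbWU=

# Encode the S3 API endpoint
printf 'https://s3.eu-central-1.amazonaws.com' | base64
# → aHR0cHM6Ly9zMy5ldS1jZW50cmFsLTEuYW1hem9uYXdzLmNvbQ==
```

Note that `printf` is used instead of `echo` to avoid encoding a trailing newline into the value.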
Create or edit a `ClusterLogging` CR:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: cr-lokistack
  namespace: openshift-logging
spec:
  logStore:
    lokistack:
      name: logging-loki
```

Apply the configuration:

```shell
oc apply -f cr-lokistack.yaml
```
Enable the Red Hat OpenShift Logging Console Plugin:

- In the OKD web console, click Operators → Installed Operators.
- Select the Red Hat OpenShift Logging Operator.
- Under Console plugin, click Disabled.
- Select Enable and then Save. This change restarts the `openshift-console` pods.
- After the pods restart, you receive a notification that a web console update is available, prompting you to refresh.
- After refreshing the web console, click Observe in the left main menu. A new option for Logs is available.

This plugin is only available on OKD 4.10 and later.
Forwarding logs to LokiStack
To configure log forwarding to the LokiStack gateway, you must create a ClusterLogging custom resource (CR).
Prerequisites
Logging subsystem for Red Hat OpenShift: 5.5 and later
LokiOperator Operator
Procedure
- Create or edit a YAML file that defines the `ClusterLogging` custom resource (CR):

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  logStore:
    type: lokistack
    lokistack:
      name: lokistack-dev
  collection:
    type: vector
```
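If you forward logs to an external Loki instance with a `ClusterLogForwarder` custom resource (the resource referenced in the troubleshooting section below), a minimal sketch might look like the following. The output name, pipeline name, and URL are placeholders, not values from this document:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: loki-external          # placeholder output name
    type: loki
    url: https://loki.example.com:3100   # placeholder endpoint
  pipelines:
  - name: forward-to-loki        # placeholder pipeline name
    inputRefs:
    - application
    outputRefs:
    - loki-external
```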
Troubleshooting Loki “entry out of order” errors
If Fluentd forwards a large block of messages that exceeds the rate limit to a Loki logging system, Loki generates "entry out of order" errors. To fix this issue, you update some values in the Loki server configuration file, `loki.yaml`.
Conditions

- The `ClusterLogForwarder` custom resource is configured to forward logs to Loki.
- Your system sends a block of messages that is larger than 2 MB to Loki, such as:

```
"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
.......
......
......
......
\"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
```
When you enter `oc logs -c fluentd`, the Fluentd logs in your OpenShift Logging cluster show the following messages:

```
429 Too Many Requests Ingestion rate limit exceeded (limit: 8388608 bytes/sec) while attempting to ingest '2140' lines totaling '3285284' bytes

429 Too Many Requests Ingestion rate limit exceeded' or '500 Internal Server Error rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5277702 vs. 4194304)'
```
When you open the logs on the Loki server, they display `entry out of order` messages like these:

```
,\nentry with timestamp 2021-08-18 05:58:55.061936 +0000 UTC ignored, reason: 'entry out of order' for stream: {fluentd_thread=\"flush_thread_0\", log_type=\"audit\"},\nentry with timestamp 2021-08-18 06:01:18.290229 +0000 UTC ignored, reason: 'entry out of order' for stream: {fluentd_thread="flush_thread_0", log_type="audit"}
```
Procedure
Update the following fields in the `loki.yaml` configuration file on the Loki server with the values shown here:

```yaml
grpc_server_max_recv_msg_size: 8388608
chunk_target_size: 8388608
ingestion_rate_mb: 8
ingestion_burst_size_mb: 16
```
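These values are mutually consistent: `grpc_server_max_recv_msg_size` and `chunk_target_size` are expressed in bytes, while `ingestion_rate_mb` and `ingestion_burst_size_mb` are expressed in megabytes, so 8388608 is simply 8 MB. A quick shell check:

```shell
# grpc_server_max_recv_msg_size / chunk_target_size: 8 MB in bytes
echo $((8 * 1024 * 1024))     # prints 8388608

# ingestion_burst_size_mb of 16 MB, expressed in bytes
echo $((16 * 1024 * 1024))    # prints 16777216
```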
Apply the changes in `loki.yaml` to the Loki server.
Example `loki.yaml` file:

```yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 8388608
ingester:
  wal:
    enabled: true
    dir: /tmp/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  chunk_target_size: 8388608
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
schema_config:
  configs:
  - from: 2020-10-24
    store: boltdb-shipper
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h            # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks
compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: filesystem
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 12h
  ingestion_rate_mb: 8
  ingestion_burst_size_mb: 16
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
ruler:
  storage:
    type: local
    local:
      directory: /tmp/loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
```
Additional resources
Collector features
Feature | Fluentd | Vector |
---|---|---|
App container logs | ✓ | ✓ |
App-specific routing | ✓ | ✓ |
App-specific routing by namespace | ✓ | ✓ |
Infra container logs | ✓ | ✓ |
Infra journal logs | ✓ | ✓ |
Kube API audit logs | ✓ | ✓ |
Openshift API audit logs | ✓ | ✓ |
Open Virtual Network (OVN) audit logs | ✓ | ✓ |
Feature | Fluentd | Vector |
---|---|---|
Elasticsearch v5-v7 | ✓ | ✓ |
Fluent forward | ✓ | |
Syslog RFC3164 | ✓ | |
Syslog RFC5424 | ✓ | |
Kafka | ✓ | ✓ |
Cloudwatch | ✓ | ✓ |
Loki | ✓ | ✓ |
Feature | Fluentd | Vector |
---|---|---|
Elasticsearch certificates | ✓ | ✓ |
Elasticsearch username / password | ✓ | ✓ |
Cloudwatch keys | ✓ | ✓ |
Cloudwatch STS | ✓ | |
Kafka certificates | ✓ | ✓ |
Kafka username / password | ✓ | ✓ |
Kafka SASL | ✓ | ✓ |
Loki bearer token | ✓ | ✓ |
Feature | Fluentd | Vector |
---|---|---|
Viaq data model - app | ✓ | ✓ |
Viaq data model - infra | ✓ | ✓ |
Viaq data model - infra(journal) | ✓ | ✓ |
Viaq data model - Linux audit | ✓ | ✓ |
Viaq data model - kube-apiserver audit | ✓ | ✓ |
Viaq data model - OpenShift API audit | ✓ | ✓ |
Viaq data model - OVN | ✓ | ✓ |
Loglevel Normalization | ✓ | ✓ |
JSON parsing | ✓ | ✓ |
Structured Index | ✓ | ✓ |
Multiline error detection | ✓ | |
Multicontainer / split indices | ✓ | ✓ |
Flatten labels | ✓ | ✓ |
CLF static labels | ✓ | ✓ |
Feature | Fluentd | Vector |
---|---|---|
Fluentd readlinelimit | ✓ | |
Fluentd buffer | ✓ | |
- chunklimitsize | ✓ | |
- totallimitsize | ✓ | |
- overflowaction | ✓ | |
- flushthreadcount | ✓ | |
- flushmode | ✓ | |
- flushinterval | ✓ | |
- retrywait | ✓ | |
- retrytype | ✓ | |
- retrymaxinterval | ✓ | |
- retrytimeout | ✓ | |
Feature | Fluentd | Vector |
---|---|---|
Metrics | ✓ | ✓ |
Dashboard | ✓ | ✓ |
Alerts | ✓ | |
Feature | Fluentd | Vector |
---|---|---|
Global proxy support | ✓ | ✓ |
x86 support | ✓ | ✓ |
ARM support | ✓ | ✓ |
PowerPC support | ✓ | ✓ |
IBM Z support | ✓ | ✓ |
IPV6 support | ✓ | ✓ |
Log event buffering | ✓ | |
Disconnected Cluster | ✓ | ✓ |