Aggregate Logging Sizing Guidelines

Overview

The Elasticsearch, Fluentd, and Kibana (EFK) stack aggregates logs from nodes and applications running inside your OKD installation. Once deployed, it uses Fluentd to aggregate logs from all nodes and pods into Elasticsearch (ES). It also provides a centralized Kibana web UI where users and administrators can create rich visualizations and dashboards with the aggregated data.

Installation

The general procedure for installing an aggregate logging stack in OKD is described in Aggregating Container Logs. There are some important things to keep in mind while going through the installation guide:

In order for the logging pods to spread evenly across your cluster, an empty node selector should be used when creating the project.

  $ oc adm new-project logging --node-selector=""

In conjunction with node labeling, which is done later, this controls pod placement across the logging project.

Elasticsearch (ES) should be deployed with a cluster size of at least three for resiliency to node failures. This is specified by setting the **openshift_logging_es_cluster_size** parameter in the inventory host file.

Refer to Ansible Variables for a full list of parameters.

Kibana requires a hostname that can be resolved from wherever the browser will be used to access it. For example, you might need to add a DNS alias for Kibana to your corporate name service in order to access Kibana from the web browser running on your laptop. The logging deployment creates a Route to Kibana on one of your “infra” nodes, or wherever the OpenShift router is running, and the Kibana hostname alias should point to this machine. This hostname is specified with the openshift_logging_kibana_hostname Ansible variable.
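
As a minimal sketch, the corresponding lines in the Ansible inventory might look like the following. The hostname value is only an example and must resolve in your DNS, and openshift_logging_install_logging is shown for context because it enables the logging install:

  [OSEv3:vars]
  openshift_logging_install_logging=true
  openshift_logging_es_cluster_size=3
  openshift_logging_kibana_hostname=kibana.example.com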

Installation can take some time depending on whether the images were already retrieved from the registry or not, and on the size of your cluster.

Inside the openshift-logging project, you can check your deployment with oc get all.

  $ oc get all
  NAME                          REVISION   REPLICAS   TRIGGERED BY
  logging-curator               1          1
  logging-es-6cvk237t           1          1
  logging-es-e5x4t4ai           1          1
  logging-es-xmwvnorv           1          1
  logging-kibana                1          1

  NAME                          DESIRED    CURRENT    AGE
  logging-curator-1             1          1          3d
  logging-es-6cvk237t-1         1          1          3d
  logging-es-e5x4t4ai-1         1          1          3d
  logging-es-xmwvnorv-1         1          1          3d
  logging-kibana-1              1          1          3d

  NAME                 HOST/PORT                PATH   SERVICE              TERMINATION   LABELS
  logging-kibana       kibana.example.com              logging-kibana       reencrypt     component=support,logging-infra=support,provider=openshift
  logging-kibana-ops   kibana-ops.example.com          logging-kibana-ops   reencrypt     component=support,logging-infra=support,provider=openshift

  NAME                     CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
  logging-es               172.24.155.177   <none>        9200/TCP   3d
  logging-es-cluster       None             <none>        9300/TCP   3d
  logging-es-ops           172.27.197.57    <none>        9200/TCP   3d
  logging-es-ops-cluster   None             <none>        9300/TCP   3d
  logging-kibana           172.27.224.55    <none>        443/TCP    3d
  logging-kibana-ops       172.25.117.77    <none>        443/TCP    3d

  NAME                          READY     STATUS      RESTARTS   AGE
  logging-curator-1-6s7wy       1/1       Running     0          3d
  logging-deployer-un6ut        0/1       Completed   0          3d
  logging-es-6cvk237t-1-cnpw3   1/1       Running     0          3d
  logging-es-e5x4t4ai-1-v933h   1/1       Running     0          3d
  logging-es-xmwvnorv-1-adr5x   1/1       Running     0          3d
  logging-fluentd-156xn         1/1       Running     0          3d
  logging-fluentd-40biz         1/1       Running     0          3d
  logging-fluentd-8k847         1/1       Running     0          3d

You should end up with a setup similar to the following.

  $ oc get pods -o wide
  NAME                          READY     STATUS      RESTARTS   AGE   NODE
  logging-curator-1-6s7wy       1/1       Running     0          3d    ip-172-31-24-239.us-west-2.compute.internal
  logging-deployer-un6ut        0/1       Completed   0          3d    ip-172-31-6-152.us-west-2.compute.internal
  logging-es-6cvk237t-1-cnpw3   1/1       Running     0          3d    ip-172-31-24-238.us-west-2.compute.internal
  logging-es-e5x4t4ai-1-v933h   1/1       Running     0          3d    ip-172-31-24-235.us-west-2.compute.internal
  logging-es-xmwvnorv-1-adr5x   1/1       Running     0          3d    ip-172-31-24-233.us-west-2.compute.internal
  logging-fluentd-156xn         1/1       Running     0          3d    ip-172-31-24-241.us-west-2.compute.internal
  logging-fluentd-40biz         1/1       Running     0          3d    ip-172-31-24-236.us-west-2.compute.internal
  logging-fluentd-8k847         1/1       Running     0          3d    ip-172-31-24-237.us-west-2.compute.internal
  logging-fluentd-9a3qx         1/1       Running     0          3d    ip-172-31-24-231.us-west-2.compute.internal
  logging-fluentd-abvgj         1/1       Running     0          3d    ip-172-31-24-228.us-west-2.compute.internal
  logging-fluentd-bh74n         1/1       Running     0          3d    ip-172-31-24-238.us-west-2.compute.internal
  ...

By default, the amount of RAM allocated to each ES instance is 16GB. **openshift_logging_es_memory_limit** is the parameter used in the openshift-ansible host inventory file to change this. Keep in mind that half of this value is passed to each Elasticsearch pod's Java process as its heap size.
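
For example, to give each Elasticsearch pod a 32GB memory limit, so that roughly 16GB ends up as JVM heap, the inventory line would look like the following sketch. The value shown is an assumption; size it for your own nodes:

  openshift_logging_es_memory_limit=32Gi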

Learn more about installing EFK.

Large Clusters

At 100 nodes or more, it is recommended to first pre-pull the logging images, for example with docker pull registry.redhat.io/openshift3/logging-fluentd:v3.11. After deploying the logging infrastructure pods (Elasticsearch, Kibana, and Curator), node labeling should be done in steps of 20 nodes at a time. For example:

Using a simple loop:

  $ while read node; do oc label nodes $node logging-infra-fluentd=true; done < 20_fluentd.lst

The following also works:

  $ oc label nodes 10.10.0.{100..119} logging-infra-fluentd=true

Labeling nodes in groups paces the DaemonSets used by OpenShift logging, helping to avoid contention on shared resources such as the image registry.
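
A minimal sketch for generating and applying such batches is shown below. The batch file prefix, the 20-node batch size, and the pause between batches are assumptions to adapt to your cluster:

  $ oc get nodes -o name | cut -d/ -f2 | split -l 20 - fluentd_batch_
  $ for f in fluentd_batch_*; do
      oc label nodes $(cat $f) logging-infra-fluentd=true
      # wait for the new Fluentd pods to settle before labeling the next batch
      sleep 300
    done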

Check for the occurrence of any “CrashLoopBackOff | ImagePullFailed | Error” issues. oc logs <pod>, oc describe pod <pod>, and oc get event are helpful diagnostic commands.
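
For example, running the following in the logging project will surface most problems; the exact failure states you match on may vary:

  $ oc get pods | egrep 'CrashLoopBackOff|ImagePull|Error'
  $ oc describe pod <pod>
  $ oc logs <pod>
  $ oc get events --sort-by='.lastTimestamp'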

Systemd-journald and rsyslog

In Red Hat Enterprise Linux (RHEL) 7 the systemd-journald.socket unit creates /dev/log during the boot process, and then passes input to systemd-journald.service. Every syslog() call goes to the journal.

The default rate limiting for systemd-journald causes some system logs to be dropped before Fluentd can read them. To prevent this, add the following to the /etc/systemd/journald.conf file:

  # Disable rate limiting
  RateLimitInterval=1s
  RateLimitBurst=10000
  Storage=volatile
  Compress=no
  MaxRetentionSec=30s

Then restart the services.

  $ systemctl restart systemd-journald.service
  $ systemctl restart rsyslog.service

These settings account for the bursty nature of uploading in bulk.

After removing the rate limit, you may see increased CPU utilization on the system logging daemons as they process any messages that would previously have been throttled.
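
If you want to confirm whether rate limiting was dropping messages before the change, journald reports its own suppression. A hedged example check, noting that the exact message text can vary between systemd versions:

  $ journalctl -b -u systemd-journald | grep -i suppress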

Scaling up EFK Logging

If you did not indicate the desired scale at first deployment, the least disruptive way of adjusting your cluster is by re-running the Ansible logging playbook after updating the openshift_logging_es_cluster_size value in the inventory file. Refer to the Performing Administrative Elasticsearch Operations section for more in-depth information.

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host, and setting the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher to create replicas.
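
As a sketch, growing to a three-node, replicated Elasticsearch cluster combines both settings in the inventory with a re-run of the logging playbook. The playbook path below assumes a standard openshift-ansible checkout and can differ between releases:

  openshift_logging_es_cluster_size=3
  openshift_logging_es_number_of_replicas=1

  $ ansible-playbook -i <inventory_file> \
      openshift-ansible/playbooks/openshift-logging/config.yml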

Storage Considerations

An Elasticsearch index is a collection of shards and their corresponding replicas. This is how ES implements high availability internally, so there is little need to use hardware-based mirroring RAID variants. RAID 0 can still be used to increase overall disk performance.

A persistent volume is added to each Elasticsearch deployment configuration. On OKD this is usually achieved through Persistent Volume Claims.

The PVCs are named based on the openshift_logging_es_pvc_prefix setting. Refer to Persistent Elasticsearch Storage for more details.
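
For illustration, a hedged inventory sketch requesting dynamically provisioned 100Gi volumes might look like this. openshift_logging_es_pvc_size and openshift_logging_es_pvc_dynamic are additional openshift-ansible variables not discussed above, so confirm them against the Ansible Variables list for your release:

  openshift_logging_es_pvc_prefix=logging-es
  openshift_logging_es_pvc_size=100Gi
  openshift_logging_es_pvc_dynamic=true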

Fluentd ships any logs from the systemd journal and /var/lib/docker/containers/*.log to Elasticsearch. Learn more.

Local SSD drives are recommended in order to achieve the best performance. In Red Hat Enterprise Linux (RHEL) 7, the deadline IO scheduler is the default for all block devices except SATA disks. For SATA disks, the default IO scheduler is cfq.
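
You can check which scheduler a block device is using, and switch it at runtime, through sysfs. Replace sda with your device, run the write as root, and note that the change does not persist across reboots unless set via a tuned profile or kernel boot parameter:

  $ cat /sys/block/sda/queue/scheduler
  noop [deadline] cfq
  # echo deadline > /sys/block/sda/queue/scheduler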

Sizing storage for ES is greatly dependent on how you optimize your indices. Therefore, consider in advance how much data you need to retain, and keep in mind that you are aggregating application log data as well. Some Elasticsearch users have found that it is necessary to keep absolute storage consumption around 50% and below 70% at all times. This helps to avoid Elasticsearch becoming unresponsive during large merge operations.
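
One simple way to watch consumption is to check the data volume usage inside each Elasticsearch pod. The sketch below assumes the default /elasticsearch/persistent mount point used by the logging images and should be run in the logging project:

  $ oc exec -c elasticsearch <es_pod> -- df -h /elasticsearch/persistent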