In this guide, we recommend best practices for cluster-level logging and application logging.
Changes in Logging in Rancher v2.5
Before Rancher v2.5, logging in Rancher was a fairly static integration. There was a fixed list of aggregators to choose from (Elasticsearch, Splunk, Kafka, Fluentd, and Syslog), and only two configuration points (cluster-level and project-level).
Logging in v2.5 has been completely overhauled to provide a more flexible experience for log aggregation. With the new logging feature, administrators and users alike can deploy logging that meets fine-grained collection criteria while offering a wider array of destinations and configuration options.
Under the hood, Rancher logging uses the Banzai Cloud Logging operator. Rancher manages this operator and its resources, and ties that experience in with managing your Rancher clusters.
Cluster-level Logging
Cluster-wide Scraping
For some users, it is desirable to scrape logs from every container running in the cluster. This usually coincides with your security team’s request (or requirement) to collect all logs from all points of execution.
In this scenario, we recommend creating at least two ClusterOutput objects: one for your security team (if you have that requirement), and one for yourselves, the cluster administrators. When creating these objects, take care to choose an output endpoint that can handle the significant log traffic coming from the entire cluster. Also make sure to choose an appropriate index to receive all these logs.
Once you have created these ClusterOutput objects, create a ClusterFlow to collect all the logs. Do not define any Include or Exclude rules on this flow. This will ensure that all logs from across the cluster are collected. If you have two ClusterOutputs, make sure to send logs to both of them.
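The following is a minimal sketch of that setup, not a definitive configuration. It assumes Elasticsearch endpoints and that Rancher logging is installed in the cattle-logging-system namespace; the resource names, hosts, and index names are placeholders invented for illustration. Depending on the version of the logging operator in your cluster, the reference field on the flow may be outputRefs rather than globalOutputRefs.

```yaml
# Sketch only: names, hosts, and index names are hypothetical placeholders.
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: security-es
  namespace: cattle-logging-system   # assumed install namespace
spec:
  elasticsearch:
    host: es.security.example.com    # placeholder endpoint
    port: 9200
    scheme: https
    index_name: cluster-all-logs     # placeholder index
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: admin-es
  namespace: cattle-logging-system
spec:
  elasticsearch:
    host: es.admin.example.com       # placeholder endpoint
    port: 9200
    scheme: https
    index_name: cluster-all-logs
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-logs
  namespace: cattle-logging-system
spec:
  # No match rules: nothing is included or excluded explicitly,
  # so logs from across the cluster flow to both outputs.
  globalOutputRefs:
  - security-es
  - admin-es
```

Keeping the two outputs as separate objects means either team can change its endpoint or index without touching the other's configuration.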
Kubernetes Components
ClusterFlows have the ability to collect logs from all containers on all hosts in the Kubernetes cluster. This works well in cases where those containers are part of a Kubernetes pod; however, RKE containers exist outside of the scope of Kubernetes.
Currently (as of v2.5.1), the logs from RKE containers are collected but cannot easily be filtered. This is because those logs do not contain information about the source container (e.g. etcd or kube-apiserver).
A future release of Rancher will include the source container name which will enable filtering of these component logs. Once that change is made, you will be able to customize a ClusterFlow to retrieve only the Kubernetes component logs, and direct them to an appropriate output.
Application Logging
Best practice, not only in Kubernetes but in all container-based applications, is to direct application logs to stdout/stderr. The container runtime will then capture these logs and do something with them, typically writing them to a file. Depending on the container runtime (and its configuration), these logs can end up in any number of locations.
In the case of writing the logs to a file, Kubernetes helps by creating a /var/log/containers directory on each host. This directory symlinks the log files to their actual destination (which can differ based on configuration or container runtime).
Rancher logging will read all log entries in /var/log/containers, ensuring that all log entries from all containers (assuming a default configuration) will have the opportunity to be collected and processed.
Specific Log Files
Log collection only retrieves stdout/stderr logs from pods in Kubernetes. But what if we want to collect logs from other files that are generated by applications? Here, a log streaming sidecar (or two) may come in handy.
The goal of setting up a streaming sidecar is to take log files that are written to disk, and have their contents streamed to stdout. This way, the Banzai Logging Operator can pick up those logs and send them to your desired output.
To set this up, edit your workload resource (e.g. Deployment) and add the following sidecar definition:
```yaml
...
containers:
- args:
  - -F                             # keep following the file, even if it is rotated
  - /path/to/your/log/file.log
  command:
  - tail
  image: busybox
  name: stream-log-file-[name]
  volumeMounts:
  - mountPath: /path/to/your/log   # directory containing the log file
    name: mounted-log
...
```
This will add a container to your workload definition that will now stream the contents of (in this example) /path/to/your/log/file.log to stdout.
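For context, a fuller sketch of how the sidecar might sit alongside the application container is shown here. It assumes the application writes its log file to a volume that both containers mount; the Deployment name, labels, application image, and the choice of an emptyDir volume are all hypothetical and only illustrate the idea.

```yaml
# Sketch only: myapp, its image, and the emptyDir volume are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: example.com/myapp:1.0     # placeholder application image
        volumeMounts:
        - name: mounted-log
          mountPath: /path/to/your/log   # the application writes file.log here
      - name: stream-log-file-myapp      # sidecar that streams the file to stdout
        image: busybox
        command: ["tail"]
        args: ["-F", "/path/to/your/log/file.log"]
        volumeMounts:
        - name: mounted-log
          mountPath: /path/to/your/log
          readOnly: true                 # the sidecar only needs to read
      volumes:
      - name: mounted-log
        emptyDir: {}                     # shared between both containers
```

The sidecar can only see the file if both containers mount the same volume, which is why the volume is declared once at the pod level and mounted twice.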
This log stream is then automatically collected according to any Flows or ClusterFlows you have set up. You may also wish to consider creating a Flow specifically for this log file by targeting the name of the container. For example:
```yaml
...
spec:
  match:
  - select:
      container_names:
      - stream-log-file-name
...
```
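Expanded into a complete resource, such a Flow might look like the sketch below. The Flow name, namespace, sidecar container name, and the Output it refers to (myapp-output) are hypothetical; the Output is assumed to already exist in the same namespace. On older versions of the logging operator, the reference field may be outputRefs rather than localOutputRefs.

```yaml
# Sketch only: all names below are placeholders.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: stream-log-file
  namespace: myapp-namespace     # a Flow only sees pods in its own namespace
spec:
  match:
  - select:
      container_names:
      - stream-log-file-myapp    # the sidecar container to collect from
  localOutputRefs:
  - myapp-output                 # assumed pre-existing Output in this namespace
```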
General Best Practices
- Where possible, output structured log entries (e.g. syslog, JSON). This makes handling of the log entry easier as there are already parsers written for these formats.
- Try to provide the name of the application that is creating the log entry, in the entry itself. This can make troubleshooting easier as Kubernetes objects do not always carry the name of the application as the object name. For instance, a pod ID may be something like myapp-098kjhsdf098sdf98 which does not provide much information about the application running inside the container.
- Except in the case of collecting all logs cluster-wide, try to scope your Flow and ClusterFlow objects tightly. This makes it easier to troubleshoot when problems arise, and also helps ensure unrelated log entries do not show up in your aggregator. An example of tight scoping would be to constrain a Flow to a single Deployment in a namespace, or perhaps even a single container within a Pod.
- Keep the log verbosity down except when troubleshooting. High log verbosity poses a number of issues, chief among them being noise: significant events can be drowned out in a sea of DEBUG messages. This is somewhat mitigated with automated alerting and scripting, but highly verbose logging still places an inordinate amount of stress on the logging infrastructure.
- Where possible, try to provide a transaction or request ID with the log entry. This can make tracing application activity across multiple log sources easier, especially when dealing with distributed applications.
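As an illustration of tight scoping combined with structured logs, the sketch below constrains a Flow to one container of one application and parses each entry as JSON. The namespace, label, container name, and Output name are hypothetical, and the parser filter settings are an assumption about a reasonable starting point; check them against the logging operator version you run.

```yaml
# Sketch only: names and labels are placeholders.
# Assumes the application emits one JSON object per line, for example:
#   {"app": "myapp", "level": "INFO", "request_id": "abc123", "msg": "order created"}
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: myapp-only
  namespace: myapp-namespace
spec:
  match:
  - select:
      labels:
        app: myapp               # pods created by a single Deployment
      container_names:
      - myapp                    # a single container within those pods
  filters:
  - parser:
      remove_key_name_field: true   # drop the raw string once it is parsed
      parse:
        type: json                  # expand the JSON entry into fields
  localOutputRefs:
  - myapp-output                 # assumed pre-existing Output in this namespace
```

With the application name and request ID carried inside each entry, they become searchable fields in the aggregator regardless of what the pod happens to be called.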