Use Loggie to Collect Container Logs
Before reading this article, it is recommended to refer to Log collection in Kubernetes.
How does Loggie Collect Container Logs?
Due to the good extensibility of Kubernetes, users can define their own CRDs to express their desired states, and develop Controllers with the help of some frameworks.
Based on this idea, which logs a service needs to collect and what log configuration is required are the user's desired state, and realizing that state requires developing a log collection Controller.
Therefore, the user only needs to describe, in the LogConfig CRD, which Pods' logs need to be collected and the log paths inside those Pods.
The core architecture is shown in the following figure:
Loggie watches Pod and CRD events and dynamically updates its configuration. Based on the Volume mounted at the log file path, Loggie can find the corresponding file on the node and collect it there. In addition, according to the configuration, the Env/Annotation/Label values on the Pod can be automatically added to the log as meta information.
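For example, with the emptyDir mount from the tomcat example later in this article, and assuming the default kubelet root directory /var/lib/kubelet, the mapping from container path to node path would look roughly like this (the file name and Pod UID are illustrative):

path in container: /usr/local/tomcat/logs/catalina.out
path on node:      /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/log/catalina.out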
Compared with mounting the same host path on every node for collection, this approach also avoids two problems: being unable to fine-tune the configuration for a single service, and collecting irrelevant logs.
The benefits are not limited to these: based on Kubernetes, Loggie also provides corresponding adaptation and support for dynamic configuration delivery and monitoring metrics.
CRD Instructions
Loggie currently has the following CRDs:
- LogConfig: namespace-level CRD, used to collect Pod container logs, which mainly includes the source configuration of the collection, as well as the associated sink and interceptor.
- ClusterLogConfig: cluster-level CRD, which covers cluster-level collection of Pod container logs, collection of logs on nodes, and distribution of a general pipeline configuration to a Loggie cluster.
- Sink: represents a sink backend, referenced by a ClusterLogConfig/LogConfig.
- Interceptor: represents a group of interceptors, referenced by a ClusterLogConfig/LogConfig.
The process of using these CRDs is as follows:
Preparation
Architecture
What is the overall architecture?
- The Loggie Agent sends data directly to a backend such as Elasticsearch or other storage.
- The Loggie Agent sends data to the Loggie Aggregator for relay processing, and the Aggregator then forwards it to the backend storage.
- The Loggie Agent sends data to Kafka, and the Loggie Aggregator consumes from Kafka and sends the data to the backend.
…
This article focuses only on the collection (agent) side. If you need to deploy the Loggie Aggregator, please refer to Loggie Aggregator.
Before collecting container logs, make sure that the Loggie DaemonSet has been deployed in Kubernetes. Deploy Loggie in Kubernetes
Info
We recommend using the DaemonSet method to collect container logs. Loggie plans to support automatic injection of a Loggie Sidecar for log collection in the future; see the RoadMap.
Usage
How do business pods mount logs?
If you can tolerate possible log loss, for example with operations and maintenance logs, it is recommended to use emptyDir. If the logs are important and must not be lost, use hostPath and configure subPathExpr on the volumeMount to achieve path isolation.
Example
tomcat-emptydir.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tomcat
  name: tomcat
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tomcat
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      containers:
        - name: tomcat
          image: tomcat
          volumeMounts:
            - mountPath: /usr/local/tomcat/logs
              name: log
      volumes:
        - emptyDir: {}
          name: log
tomcat-hostpath.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tomcat
  name: tomcat
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tomcat
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      containers:
        - env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          image: tomcat
          name: tomcat
          volumeMounts:
            - mountPath: /log
              name: datalog
              subPathExpr: $(NAMESPACE)/$(POD_NAME)
      volumes:
        - hostPath:
            path: /data/log
            type: ""
          name: datalog
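With the hostPath plus subPathExpr combination above, each Pod writes into its own directory on the node. For example, a Pod in the default namespace maps as follows (the Pod name is a placeholder):

path in container: /log/
path on node:      /data/log/default/<pod-name>/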
Caution
Make sure that the log path mounted by a Pod is not shared by multiple Pods. For example, if two Pods under a Deployment use hostPath without subPathExpr to mount the log directory and are scheduled on the same node, they will write to the same log directory and files, which may cause collection exceptions.
We first create the tomcat Deployment of the above example for log collection.
kubectl apply -f tomcat-emptydir.yml
Next, we will collect the tomcat logs and send them to Elasticsearch. If you want to collect other container logs or send them to other backends, you only need to modify the configuration.
Deploy Elasticsearch and Kibana (optional)
Since this demo sends data to Elasticsearch, here we deploy Elasticsearch and Kibana.
If you already have Elasticsearch and Kibana in your environment, this step can be ignored. If you want to build a set of Elasticsearch and Kibana by yourself, here is a suggested way:
If helm is not installed, you need to download helm first.
Use the following commands:
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch --set replicas=1
helm install kibana elastic/kibana
Then run:
kubectl port-forward service/kibana-kibana 5601:http
You can enter localhost:5601 in the browser to access the Kibana page.
Collect Container Logs
Create Sink
To indicate the Elasticsearch cluster to which we will send logs, we need to configure the corresponding sink. There are two ways:
If the entire cluster has only one storage backend, we can configure the defaults parameter in the global configuration file configMap (a sketch follows after this list); for details, please refer to the configuration reference.
Use the Sink CRD and reference it in the logConfig. This method can be extended to multiple backends: different logConfigs can be configured to use different backend storage. We recommend this method in most cases.
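For the first approach, a minimal sketch of a global default sink, written in the config.loggie Helm values form used later in this article (the exact key layout is an assumption; verify it against your Loggie version):

config:
  loggie:
    defaults:
      sink:
        type: elasticsearch
        index: "loggie"
        hosts: ["elasticsearch-master.default.svc:9200"]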
For the second approach, create a sink as follows. Sink is a cluster-level CRD; the spec.sink field can be modified to other sink configurations.
Example
cat << EOF | kubectl apply -f -
apiVersion: loggie.io/v1beta1
kind: Sink
metadata:
  name: default
spec:
  sink: |
    type: elasticsearch
    index: "loggie"
    hosts: ["elasticsearch-master.default.svc:9200"]
EOF
Use kubectl get sink default to check whether the creation is successful.
Create Interceptor (Optional)
Like Sink, there are two ways to configure interceptor:
In the global configuration file configMap, configure the defaults parameter. This means that all Loggie Agents use the default interceptors configuration.
Use the Interceptor CRD and reference it in the logConfig. This method is more flexible and suits scenarios with specific per-pipeline requirements.
Loggie currently has three built-in interceptors: metric (sends monitoring metrics), maxbytes (limits the maximum event bytes), and retry. Even if we do not configure interceptors in either of the above two ways, these three default interceptors will still be used automatically.
Caution
- In the global defaults, configuring interceptors will overwrite the existing built-in interceptors. So if you use the global defaults to configure interceptors, please add all the built-in interceptors, unless you are sure you do not need them (see the sketch after this caution).
- A logConfig that uses the interceptor CRD will not override the default and built-in interceptors; its interceptors are added to the pipeline's interceptor chain.
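For reference, a hedged sketch of global defaults that keeps all three built-in interceptors while adding the rateLimit interceptor used below (written in the config.loggie Helm values form shown later in this article; the exact key layout is an assumption to verify against your Loggie version):

config:
  loggie:
    defaults:
      interceptors:
        - type: metric
        - type: maxbytes
        - type: retry
        - type: rateLimit
          qps: 90000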
An example of creating an interceptor is as follows:
Example
cat << EOF | kubectl apply -f -
apiVersion: loggie.io/v1beta1
kind: Interceptor
metadata:
  name: default
spec:
  interceptors: |
    - type: rateLimit
      qps: 90000
EOF
Here we create a rateLimit interceptor, which can be used to rate-limit the logs being sent. Use kubectl get interceptor default or kubectl get icp default to check whether the creation is successful.
Create logConfig
After creating the sink and interceptor, the most important step is to create the logConfig, which indicates which Pods' logs we want to collect and which logs inside those Pods to collect.
The spec.selector part of the logConfig indicates the scope to which the log configuration is distributed. To collect Pod logs, configure type: pod and a labelSelector to select the specified Pods, that is, to collect the logs of those Pods. An example is as follows:
spec:
  selector:
    type: pod
    labelSelector:
      app: tomcat
In addition, we need to configure the file source in pipeline.sources, that is, which logs of Pods to collect.
pipeline:
  sources: |
    - type: file
      name: mylog
      paths:
        - stdout
        - /var/log/*.log
Note:
- The path stdout indicates collecting the container's standard output.
- The log file paths filled in paths are the paths inside the container. (When the configuration is finally generated, Loggie automatically converts the path in the Pod into the path on the node, without the user's involvement.)
- Please use glob expressions to fill in paths.
- When type: pod, pipeline.name will be automatically generated by Loggie as ${namespace}-${logConfigName}.
- The final generated source.name will be ${podName}-${containerName}-${sourceName}.
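Applied to the tomcat LogConfig below (namespace default, source name common, container tomcat, and the Pod name taken from the events output further down), the generated names would be:

pipeline.name: default-tomcat                           (${namespace}-${logConfigName})
source.name:   tomcat-684c698b66-hvztn-tomcat-common    (${podName}-${containerName}-${sourceName})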
Finally, we use sinkRef and interceptorRef to reference the sink and interceptor created above. A complete logConfig example is as follows:
Example
cat << EOF | kubectl apply -f -
apiVersion: loggie.io/v1beta1
kind: LogConfig
metadata:
  name: tomcat
  namespace: default
spec:
  selector:
    type: pod
    labelSelector:
      app: tomcat
  pipeline:
    sources: |
      - type: file
        name: common
        paths:
          - stdout
          - /usr/local/tomcat/logs/*.log
    sinkRef: default
    interceptorRef: default
EOF
Use kubectl get logconfig tomcat or kubectl get lgc tomcat to check if the creation is successful. You can also use kubectl describe lgc tomcat to check the status by viewing the events of the logConfig. If there are events similar to the following, the configuration has been delivered successfully.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal syncSuccess 55s loggie/kind-control-plane Sync type pod [tomcat-684c698b66-hvztn] success
We can also use kubectl -nloggie logs -f ${loggie-name} to check log collection by viewing the logs of the Loggie Pod on the specified node.
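If you are not sure which Loggie Pod runs on that node, standard kubectl usage can list the DaemonSet Pods together with the nodes they are scheduled on:

# List Loggie Pods and their nodes, then pick the one on the same node as the tomcat Pod.
kubectl -n loggie get pods -o wide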
After the sending is successful, we can query the collected logs on Kibana.
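Alternatively, you can query Elasticsearch directly instead of Kibana; a sketch using the Helm-installed service name and the loggie index from the sink above:

# Forward the Elasticsearch port, then query the index from another terminal.
kubectl port-forward service/elasticsearch-master 9200:9200
curl 'localhost:9200/loggie/_search?size=1&pretty'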
Automatically Parse Container Stdout Raw Logs
Under normal circumstances, the standard output we collect is not the raw log content printed by the application, but an entity encapsulated by the container runtime.
For example, the standard output of docker is in json form:
{"log":"I0610 08:29:07.698664 Waiting for caches to sync\n", "stream":"stderr", "time:"2021-06-10T08:29:07.698731204Z"}
The original log content printed by the business is stored in the log field.
The standard output of containerd looks like this:
2021-12-01T03:13:58.298476921Z stderr F INFO [main] Starting service [Catalina]
2021-12-01T03:13:58.298476921Z stderr F is the prefix added by the container runtime, and the remainder is the original log.
In particular, if we configure multi-line log collection, the collected content is inconsistent with the originally printed log, which causes problems when matching multiple lines of the collected standard output.
Therefore, Loggie provides a one-key switch configuration: in the system configuration, set parseStdout to true.
parseStdout
config:
  loggie:
    discovery:
      enabled: true
      kubernetes:
        parseStdout: true
When rendering the LogConfig, Loggie will automatically add a source codec to parse out the original business log.
Note: this is only valid when the path contains only stdout:
[valid]
sources: |
  - type: file
    name: common
    paths:
      - stdout
[invalid]
sources: |
  - type: file
    name: common
    paths:
      - stdout
      - /usr/local/tomcat/logs/*.log
The above needs to be changed to two sources:
[valid]
sources: |
  - type: file
    name: stdout
    paths:
      - stdout
  - type: file
    name: tomcat
    paths:
      - /usr/local/tomcat/logs/*.log
Currently only the original log content will be retained into body, and the rest will be discarded. Automatic parsing of stdout is actually achieved by automatically adding a source codec when the LogConfig is rendered into the Pipeline configuration.
Container Log Collection Without Mounting Volumes
Although we recommend using a mounted volume (emptyDir/hostPath+subPathExpr) to expose log files for Loggie to collect, there are still many cases where the log paths of business Pods cannot be uniformly mounted. For example, some basic components cannot be configured with an independent log volume, or the business is simply unwilling to change its deployment configuration.
Loggie provides the ability to collect container logs without mounting, and can automatically identify and collect log files in the container root filesystem.
You only need to set rootFsCollectionEnabled to true in the Helm chart's values.yml, and fill in the container runtime (docker/containerd) of the actual environment, as follows:
rootFsCollectionEnabled
config:
  loggie:
    discovery:
      enabled: true
      kubernetes:
        containerRuntime: containerd
        rootFsCollectionEnabled: true
After the modification is complete, run helm upgrade again.
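For example, assuming the release was installed as loggie from the local chart directory into the loggie namespace (adjust the release name, chart path, and namespace to match your installation):

helm upgrade loggie ./loggie -nloggie -f values.yml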
The Helm template will automatically add some additional mount paths and configurations. If you upgrade from a lower version, you also need to modify the deployed DaemonSet YAML accordingly; please refer to issue #208 for details.