Log Segmentation

Loggie can use the transformer interceptor to segment and parse logs, extracting structured data from raw log lines and then processing the extracted fields.
Before starting, it is recommended to understand the schema design of Loggie's log data.

Requirements

The core requirement is to segment, parse, extract, and process logs.

For example:

    01-Dec-2021 03:13:58.298 INFO [main] Starting service [Catalina]

We may need to parse out the date and log level:

    {
      "time": "01-Dec-2021 03:13:58.298",
      "level": "INFO",
      "message": "[main] Starting service [Catalina]"
    }

Once structured, the data is easy to filter and query in storage: logs can be sorted by the time recorded in the log rather than the time of collection, or filtered by log level to quickly locate ERROR-level entries, and so on. Such requirements arise not only for operations logs like the Tomcat logs above, but also for business logs such as order records.
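
For example, once the level field above is indexed in a store such as Elasticsearch, ERROR-level entries can be filtered and sorted by the log's own timestamp with a simple query. The sketch below is illustrative only; the index name tomcat-logs and a date mapping for the time field are assumptions:

    GET /tomcat-logs/_search
    {
      "query": { "match": { "level": "ERROR" } },
      "sort": [ { "time": "desc" } ]
    }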

Parsing and extraction of stdout logs

The following example only demonstrates one approach to log segmentation. If you need to parse and extract the original stdout logs of a container, please refer to Collect Container Logs.

Configuration

Log segmentation can be performed on the Loggie Agent or on the Loggie Aggregator. The choice depends on whether an Aggregator is deployed at all, and on whether the CPU-intensive work of log processing should be spread across the Agents on each node or concentrated in the Aggregator cluster.

The following example collects the access logs of a Tomcat service and shows how to segment the access logs into fields.

For simplicity, the example uses a LogConfig CRD instance to deliver the configuration to the Agent, and uses the dev sink to print the processed results directly for inspection.

Create Tomcat Deployment

See Reference
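
For convenience, a minimal Deployment might look like the sketch below. The image tag and replica count are assumptions; what matters is that the app: tomcat label matches the labelSelector in the LogConfig that follows.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: tomcat
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: tomcat
      template:
        metadata:
          labels:
            app: tomcat
        spec:
          containers:
            - name: tomcat
              image: tomcat:latest
              ports:
                - containerPort: 8080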

Create LogConfig

Configure the LogConfig as follows:

Example

    apiVersion: loggie.io/v1beta1
    kind: LogConfig
    metadata:
      name: tomcat
      namespace: default
    spec:
      selector:
        labelSelector:
          app: tomcat
        type: pod
      pipeline:
        sources: |
          - type: file
            name: access
            paths:
              - /usr/local/tomcat/logs/localhost_access_log.*.txt
        interceptors: |
          - type: transformer
            actions:
              - action: regex(body)
                pattern: (?<ip>\S+) (?<id>\S+) (?<u>\S+) (?<time>\[.*?\]) (?<url>\".*?\") (?<status>\S+) (?<size>\S+)
        sink: |
          type: dev
          printEvents: true
          codec:
            type: json
            pretty: true

Here we configure a regex action in the transformer interceptor to perform regular-expression extraction on the access log: each named capture group in the pattern (ip, id, u, time, url, status, size) becomes a field of the resulting event.
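
Assuming the LogConfig above is saved as tomcat-logconfig.yaml (the filename is arbitrary), it can be delivered to the cluster with:

    kubectl apply -f tomcat-logconfig.yaml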

The original access log looks like this:

    10.244.0.1 - - [31/Aug/2022:03:13:40 +0000] "GET / HTTP/1.1" 404 683

After the logs are processed by the transformer, we can run kubectl -n loggie logs -f <loggie-pod-name> --tail=100 to view the output.

An example of the converted event is as follows:

    {
      "status": "404",
      "size": "683",
      "fields": {
        "logconfig": "tomcat",
        "namespace": "default",
        "nodename": "kind-control-plane",
        "podname": "tomcat-85c84988d8-frs4n",
        "containername": "tomcat"
      },
      "ip": "10.244.0.1",
      "id": "-",
      "u": "-",
      "time": "[31/Aug/2022:03:13:40 +0000]",
      "url": "\"GET / HTTP/1.1\""
    }
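
Once the extracted fields look correct, the dev sink used here for demonstration can be replaced with a real backend. The snippet below is a sketch assuming Loggie's elasticsearch sink; the hosts address and index name are placeholders to adapt to your environment:

    sink: |
      type: elasticsearch
      hosts: ["elasticsearch.default.svc:9200"]
      index: "tomcat-access"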