Logging

Available as of v1.2.0

It is important to know what is happening, and what has happened, in a Harvester cluster.

Harvester collects cluster logs, the Kubernetes audit log, and Kubernetes events as soon as the cluster is powered on, which is helpful for monitoring, logging, auditing, and troubleshooting.

Harvester supports sending those logs to various types of log servers.

Note:

The volume of logging data depends on the cluster scale, workload, and other factors. Harvester does not use persistent storage inside the cluster to store log data, so you need to set up a log server to receive the logs.

The logging feature is implemented as an addon and is disabled by default in new installations.

Users can enable/disable the rancher-logging addon from the Harvester UI after installation.

Users can also enable/disable the rancher-logging addon at installation time by customizing the Harvester configuration file.

For Harvester clusters upgraded from version v1.1.x, the logging feature is converted to an addon automatically and remains enabled as before.
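
You can also toggle the addon from the command line. The following is a minimal sketch, assuming kubectl access to the cluster and that the rancher-logging Addon resource exposes a spec.enabled flag:

  # Hedged sketch: enable the rancher-logging addon by patching its Addon resource.
  # Assumes the Addon CRD exposes spec.enabled (set to false to disable again).
  kubectl patch addons.harvesterhci.io rancher-logging \
    -n cattle-logging-system --type merge -p '{"spec":{"enabled":true}}'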

High-level Architecture

Both Harvester and Rancher now use the Banzai Cloud Logging operator as their in-house logging solution.


In Harvester, Logging, Audit, and Event share one architecture: Logging provides the infrastructure, while Audit and Event are built on top of it.

Logging

The Harvester logging infrastructure allows you to aggregate Harvester logs into an external service such as Graylog, Elasticsearch, Splunk, Grafana Loki and others.

Collected Logs

See below for a list of the logs that are collected:

  • Logs from all cluster Pods
  • Kernel logs from each node
  • Logs from select systemd services from each node
    • rke2-server
    • rke2-agent
    • rancherd
    • rancher-system-agent
    • wicked
    • iscsid

Note:

Users can configure where the aggregated logs are sent and apply some basic filtering. Changing which logs are collected is not supported.

Configuring Log Resources

Underneath the Banzai Cloud Logging operator are fluentd and fluent-bit, which handle log routing and log collection, respectively. If desired, you can modify how many resources are dedicated to those components.

From UI

  1. Go to the Advanced > Addons page and select the rancher-logging addon.
  2. From the Fluentbit tab, change the resource requests and limits.
  3. From the Fluentd tab, change the resource requests and limits.
  4. Select Save when finished configuring the settings for the rancher-logging addon.


Note:

The UI configuration is only visible when the rancher-logging addon is enabled.

From CLI

You can use the following kubectl command to change resource configurations for the rancher-logging addon: kubectl edit addons.harvesterhci.io -n cattle-logging-system rancher-logging.

The resource path and default values are as follows.

  apiVersion: harvesterhci.io/v1beta1
  kind: Addon
  metadata:
    name: rancher-logging
    namespace: cattle-logging-system
  spec:
    valuesContent: |
      fluentbit:
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 50m
            memory: 50Mi
      fluentd:
        resources:
          limits:
            cpu: 1000m
            memory: 800Mi
          requests:
            cpu: 100m
            memory: 200Mi

Note:

You can still make configuration adjustments when the addon is disabled. However, these changes only take effect when you re-enable the addon.

Configuring Log Destinations

Logging is backed by the Banzai Cloud Logging Operator, and so is controlled by Flows/ClusterFlows and Outputs/ClusterOutputs. You can route and filter logs as you like by applying these CRDs to the Harvester cluster.

When applying new Outputs and Flows to the cluster, it can take some time for the logging operator to apply them, so allow a few minutes for the logs to start flowing.
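
One way to check progress from the command line (a sketch, assuming kubectl access) is to list the flow resources and the logging pods; problems reported there usually mean the configuration has not been applied yet:

  # List the flow/output resources and the pods that implement them.
  kubectl get clusterflows,clusteroutputs -n cattle-logging-system
  kubectl get pods -n cattle-logging-system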

Clustered vs Namespaced

One important thing to understand when routing logs is the difference between ClusterFlow and Flow, and between ClusterOutput and Output. The main difference is that the non-clustered versions are namespaced.

The biggest implication of this is that a Flow can only reference Outputs within its own namespace, but can still reference any ClusterOutput.

For more information, see the Logging operator documentation.
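
To make the distinction concrete, here is a hedged sketch of a namespaced Flow (all names are illustrative) that references a namespaced Output through localOutputRefs and a ClusterOutput through globalOutputRefs:

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: Flow
  metadata:
    name: example-flow            # hypothetical name
    namespace: default
  spec:
    localOutputRefs:
      - my-output                 # an Output in the same "default" namespace
    globalOutputRefs:
      - my-cluster-output         # a ClusterOutput, reachable from any namespace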

From UI

Note:

The UI steps below are for Output and Flow; the configuration process is almost identical for their clustered counterparts. Any differences are noted in the steps below.

Creating Outputs
  1. Choose the option to create a new Output or ClusterOutput.
  2. If creating an Output, select the desired namespace.
  3. Add a name for the resource.
  4. Select the logging type.
  5. Select the logging output type.


  6. Configure the output buffer if necessary.


  7. Add any labels or annotations.


  8. Once done, click Create on the lower right.

Note:

Depending on the output selected (Splunk, Elasticsearch, etc.), there will be additional fields to specify in the form.

Output

The fields in the Output form change depending on the output type chosen, exposing the options of the selected output plugin.

Output Buffer

The Output Buffer editor allows you to describe how you want the output buffer to behave. You can find the documentation for the buffer fields here.

Labels & Annotations

You can append labels and annotations to the created resource.

Creating Flows
  1. Choose the option to create a new Flow or ClusterFlow.
  2. If creating a Flow, select the desired namespace.
  3. Add a name for the resource.
  4. Select any nodes whose logs to include or exclude.


  5. Select target Outputs and ClusterOutputs.


  6. Add any filters if desired.


  7. Once done, click Create on the lower left.

Matches

Matches allow you to filter which logs you want to include in the Flow. The form only allows you to include or exclude node logs, but if needed, you can add other match rules supported by the resource by selecting Edit as YAML.

For more information about the match directive, see Routing your logs with match directive.

Outputs

Outputs allow you to select one or more output references to send the aggregated logs to. When creating or editing a Flow / ClusterFlow, you must select at least one output.

Note:

There must be at least one existing ClusterOutput or Output that can be attached to the flow; otherwise, you will not be able to create or edit the flow.

Filters

Filters allow you to transform, process, and mutate the logs. In the text editor, you will find descriptions of the supported filters; for more information, you can visit the list of supported filters.
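
As an illustration, here is a minimal sketch of a grep filter that keeps only log lines whose message matches a pattern; the Flow name, namespace, output reference, and pattern are all hypothetical:

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: Flow
  metadata:
    name: errors-only             # hypothetical name
    namespace: default
  spec:
    filters:
      - grep:
          regexp:
            - key: message        # field to match on
              pattern: /error/    # keep only records matching this pattern
    localOutputRefs:
      - my-output                 # hypothetical Output in the same namespace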

From CLI

To configure log routes via the command line, you only need to define the YAML files for the relevant resources:

  # elasticsearch-logging.yaml
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: Output
  metadata:
    name: elasticsearch-example
    namespace: fleet-local
    labels:
      example-label: elasticsearch-example
    annotations:
      example-annotation: elasticsearch-example
  spec:
    elasticsearch:
      host: <url-to-elasticsearch-server>
      port: 9200
  ---
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: Flow
  metadata:
    name: elasticsearch-example
    namespace: fleet-local
  spec:
    match:
      - select: {}
    localOutputRefs: # a namespaced Output must be referenced via localOutputRefs
      - elasticsearch-example

And then apply them:

  kubectl apply -f elasticsearch-logging.yaml

Referencing Secrets

Banzai Cloud supports three ways of specifying secret values in YAML.

The simplest is to use the value key, which takes the desired secret as a plain string. This method should only be used for testing and never in production:

  aws_key_id:
    value: "secretvalue"

The next is to use valueFrom, which allows referencing a specific value from a secret by a name and key pair:

  aws_key_id:
    valueFrom:
      secretKeyRef:
        name: <kubernetes-secret-name>
        key: <kubernetes-secret-key>

Some plugins require a file to read from rather than simply receiving a value from the secret (this is often the case for CA cert files). In these cases, you need to use mountFrom, which mounts the secret as a file into the underlying fluentd deployment and points the plugin to that file. The valueFrom and mountFrom objects look the same:

  tls_cert_path:
    mountFrom:
      secretKeyRef:
        name: <kubernetes-secret-name>
        key: <kubernetes-secret-key>

For more information, you can find the related documentation here.
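
For completeness, a sketch of how such a secret could be created with kubectl, using the name and key that the Splunk example below expects (the token value is a placeholder):

  kubectl create secret generic splunk-hec-token2 \
    -n cattle-logging-system --from-literal=HECTOKEN=<your-hec-token>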

Example Outputs

Elasticsearch

For the simplest deployment, you can run Elasticsearch on your local system using Docker:

  docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e xpack.security.enabled=false -e node.name=es01 -it docker.elastic.co/elasticsearch/elasticsearch:6.8.23

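Make sure that you have set vm.max_map_count to be >= 262144, or the docker command above will fail. On a Linux host you can check and raise it as follows (a minimal sketch):

  sysctl vm.max_map_count                  # check the current value
  sudo sysctl -w vm.max_map_count=262144   # raise it until the next reboot
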
Once the Elasticsearch server is up, you can create the YAML file for the ClusterOutput and ClusterFlow:

  cat << EOF > elasticsearch-example.yaml
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterOutput
  metadata:
    name: elasticsearch-example
    namespace: cattle-logging-system
  spec:
    elasticsearch:
      host: 192.168.0.119
      port: 9200
      buffer:
        timekey: 1m
        timekey_wait: 30s
        timekey_use_utc: true
  ---
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterFlow
  metadata:
    name: elasticsearch-example
    namespace: cattle-logging-system
  spec:
    match:
      - select: {}
    globalOutputRefs:
      - elasticsearch-example
  EOF

And apply the file:

  kubectl apply -f elasticsearch-example.yaml

After allowing some time for the logging operator to apply the resources, you can test that the logs are flowing:

  $ curl localhost:9200/fluentd/_search
  {
    "took": 1,
    "timed_out": false,
    "_shards": {
      "total": 5,
      "successful": 5,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": 11603,
      "max_score": 1,
      "hits": [
        {
          "_index": "fluentd",
          "_type": "fluentd",
          "_id": "yWHr0oMBXcBggZRJgagY",
          "_score": 1,
          "_source": {
            "stream": "stderr",
            "logtag": "F",
            "message": "I1013 02:29:43.020384 1 csi_handler.go:248] Attaching \"csi-974b4a6d2598d8a7a37b06d06557c428628875e077dabf8f32a6f3aa2750961d\"",
            "kubernetes": {
              "pod_name": "csi-attacher-5d4cc8cfc8-hd4nb",
              "namespace_name": "longhorn-system",
              "pod_id": "c63c2014-9556-40ce-a8e1-22c55de12e70",
              "labels": {
                "app": "csi-attacher",
                "pod-template-hash": "5d4cc8cfc8"
              },
              "annotations": {
                "cni.projectcalico.org/containerID": "857df09c8ede7b8dee786a8c8788e8465cca58f0b4d973c448ed25bef62660cf",
                "cni.projectcalico.org/podIP": "10.52.0.15/32",
                "cni.projectcalico.org/podIPs": "10.52.0.15/32",
                "k8s.v1.cni.cncf.io/network-status": "[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.0.15\"\n ],\n \"default\": true,\n \"dns\": {}\n}]",
                "k8s.v1.cni.cncf.io/networks-status": "[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.0.15\"\n ],\n \"default\": true,\n \"dns\": {}\n}]",
                "kubernetes.io/psp": "global-unrestricted-psp"
              },
              "host": "harvester-node-0",
              "container_name": "csi-attacher",
              "docker_id": "f10e4449492d4191376d3e84e39742bf077ff696acbb1e5f87c9cfbab434edae",
              "container_hash": "sha256:03e115718d258479ce19feeb9635215f98e5ad1475667b4395b79e68caf129a6",
              "container_image": "docker.io/longhornio/csi-attacher:v3.4.0"
            }
          }
        },
        ...
      ]
    }
  }

Graylog

You can follow the instructions here to deploy and view cluster logs via Graylog:

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterFlow
  metadata:
    name: "all-logs-gelf-hs"
    namespace: "cattle-logging-system"
  spec:
    globalOutputRefs:
      - "example-gelf-hs"
  ---
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterOutput
  metadata:
    name: "example-gelf-hs"
    namespace: "cattle-logging-system"
  spec:
    gelf:
      host: "192.168.122.159"
      port: 12202
      protocol: "udp"

Splunk

You can follow the instructions here to deploy and view cluster logs via Splunk.

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterOutput
  metadata:
    name: harvester-logging-splunk
    namespace: cattle-logging-system
  spec:
    splunkHec:
      hec_host: 192.168.122.101
      hec_port: 8088
      insecure_ssl: true
      index: harvester-log-index
      hec_token:
        valueFrom:
          secretKeyRef:
            key: HECTOKEN
            name: splunk-hec-token2
      buffer:
        chunk_limit_size: 3MB
        timekey: 2m
        timekey_wait: 1m
  ---
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterFlow
  metadata:
    name: harvester-logging-splunk
    namespace: cattle-logging-system
  spec:
    filters:
      - tag_normaliser: {}
    match:
    globalOutputRefs:
      - harvester-logging-splunk

Loki

You can follow the instructions in the logging HEP on deploying and viewing cluster logs via Grafana Loki.

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterFlow
  metadata:
    name: harvester-loki
    namespace: cattle-logging-system
  spec:
    match:
      - select: {}
    globalOutputRefs:
      - harvester-loki
  ---
  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterOutput
  metadata:
    name: harvester-loki
    namespace: cattle-logging-system
  spec:
    loki:
      url: http://loki-stack.cattle-logging-system.svc:3100
      extra_labels:
        logOutput: harvester-loki

Audit

Harvester collects the Kubernetes audit log and is able to send it to various types of log servers.

The audit policy file that guides kube-apiserver is here.

Audit Definition

In Kubernetes, audit data is generated by kube-apiserver according to a defined policy.

  ...
  Audit policy

  Audit policy defines rules about what events should be recorded and what data they should include. The audit policy object structure is defined in the audit.k8s.io API group. When an event is processed, it's compared against the list of rules in order. The first matching rule sets the audit level of the event. The defined audit levels are:

    • None - don't log events that match this rule.
    • Metadata - log request metadata (requesting user, timestamp, resource, verb, etc.) but not request or response body.
    • Request - log event metadata and request body but not response body. This does not apply for non-resource requests.
    • RequestResponse - log event metadata, request and response bodies. This does not apply for non-resource requests.
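
For illustration, here is a minimal policy that records every request at the Metadata level (a sketch for experimentation, not the policy Harvester ships with):

  apiVersion: audit.k8s.io/v1
  kind: Policy
  rules:
    # Log request metadata (user, verb, resource, timestamp) for all requests.
    - level: Metadata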

Audit Log Format

Audit Log Format in Kubernetes

The Kubernetes apiserver writes audit entries to a local file in the following JSON format.

  {
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "auditID": "13d0bf83-7249-417b-b386-d7fc7c024583",
    "stage": "RequestReceived",
    "requestURI": "/apis/flowcontrol.apiserver.k8s.io/v1beta2/prioritylevelconfigurations?fieldManager=api-priority-and-fairness-config-producer-v1",
    "verb": "create",
    "user": {"username": "system:apiserver", "uid": "d311c1fe-2d96-4e54-a01b-5203936e1046", "groups": ["system:masters"]},
    "sourceIPs": ["::1"],
    "userAgent": "kube-apiserver/v1.24.7+rke2r1 (linux/amd64) kubernetes/e6f3597",
    "objectRef": {"resource": "prioritylevelconfigurations", "apiGroup": "flowcontrol.apiserver.k8s.io", "apiVersion": "v1beta2"},
    "requestReceivedTimestamp": "2022-10-19T18:55:07.244781Z",
    "stageTimestamp": "2022-10-19T18:55:07.244781Z"
  }

Audit Log Format before Being Sent to Log Servers

Harvester keeps the audit log unchanged before sending it to the log server.

Audit Log Output/ClusterOutput

To output audit-related logs, the Output/ClusterOutput must set loggingRef to harvester-kube-audit-log-ref.

When you configure from the Harvester dashboard, the field is added automatically.

Select the Audit Only type from the Type drop-down list.


When you configure from the CLI, please add the field manually.

Example:

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterOutput
  metadata:
    name: "harvester-audit-webhook"
    namespace: "cattle-logging-system"
  spec:
    http:
      endpoint: "http://192.168.122.159:8096/"
      open_timeout: 3
      format:
        type: "json"
      buffer:
        chunk_limit_size: 3MB
        timekey: 2m
        timekey_wait: 1m
    loggingRef: harvester-kube-audit-log-ref # this reference is fixed and must be here

Audit Log Flow/ClusterFlow

To route audit-related logs, the Flow/ClusterFlow must set loggingRef to harvester-kube-audit-log-ref.

When you configure from the Harvester dashboard, the field is added automatically.

Select the Audit type.


When you configure from the CLI, please add the field manually.

Example:

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterFlow
  metadata:
    name: "harvester-audit-webhook"
    namespace: "cattle-logging-system"
  spec:
    globalOutputRefs:
      - "harvester-audit-webhook"
    loggingRef: harvester-kube-audit-log-ref # this reference is fixed and must be here


Event

Harvester collects Kubernetes events and is able to send them to various types of log servers.

Event Definition

Kubernetes events are objects that show you what is happening inside a cluster, such as what decisions were made by the scheduler or why some pods were evicted from the node. All core components and extensions (operators/controllers) may create events through the API Server.

Events have no direct relationship with log messages generated by the various components, and are not affected by the log verbosity level. When a component creates an event, it often emits a corresponding log message. Events are garbage collected by the API server after a short time (typically after an hour), which means that they can be used to understand issues as they happen, but you have to collect and store them to investigate past events.

Events are the first thing to look at for application and infrastructure operations when something is not working as expected. Keeping them for a longer period is essential when the failure is the result of earlier events, or when conducting post-mortem analysis.
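
Before routing events to a log server, you can inspect what the API server currently holds (a sketch; events older than the retention window will already have been garbage collected):

  kubectl get events -A --sort-by=.lastTimestamp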

Event Log Format

Event Log Format in Kubernetes

A Kubernetes event example:

  {
    "apiVersion": "v1",
    "count": 1,
    "eventTime": null,
    "firstTimestamp": "2022-08-24T11:17:35Z",
    "involvedObject": {
      "apiVersion": "kubevirt.io/v1",
      "kind": "VirtualMachineInstance",
      "name": "vm-ide-1",
      "namespace": "default",
      "resourceVersion": "604601",
      "uid": "1bd4133f-5aa3-4eda-bd26-3193b255b480"
    },
    "kind": "Event",
    "lastTimestamp": "2022-08-24T11:17:35Z",
    "message": "VirtualMachineInstance defined.",
    "metadata": {
      "creationTimestamp": "2022-08-24T11:17:35Z",
      "name": "vm-ide-1.170e43cbdd833b62",
      "namespace": "default",
      "resourceVersion": "604626",
      "uid": "0114f4e7-1d4a-4201-b0e5-8cc8ede202f4"
    },
    "reason": "Created",
    "reportingComponent": "",
    "reportingInstance": "",
    "source": {
      "component": "virt-handler",
      "host": "harv1"
    },
    "type": "Normal"
  }

Event Log Format before Being Sent to Log Servers

Each event log has the format {"stream":"","logtag":"F","message":"","kubernetes":{""}}; the Kubernetes event itself is carried in the message field.

  {
    "stream":"stdout",
    "logtag":"F",
    "message":"{
      \"verb\":\"ADDED\",
      \"event\":{\"metadata\":{\"name\":\"vm-ide-1.170e446c3f890433\",\"namespace\":\"default\",\"uid\":\"0b44b6c7-b415-4034-95e5-a476fcec547f\",\"resourceVersion\":\"612482\",\"creationTimestamp\":\"2022-08-24T11:29:04Z\",\"managedFields\":[{\"manager\":\"virt-controller\",\"operation\":\"Update\",\"apiVersion\":\"v1\",\"time\":\"2022-08-24T11:29:04Z\"}]},\"involvedObject\":{\"kind\":\"VirtualMachineInstance\",\"namespace\":\"default\",\"name\":\"vm-ide-1\",\"uid\":\"1bd4133f-5aa3-4eda-bd26-3193b255b480\",\"apiVersion\":\"kubevirt.io/v1\",\"resourceVersion\":\"612477\"},\"reason\":\"SuccessfulDelete\",\"message\":\"Deleted PodDisruptionBudget kubevirt-disruption-budget-hmmgd\",\"source\":{\"component\":\"disruptionbudget-controller\"},\"firstTimestamp\":\"2022-08-24T11:29:04Z\",\"lastTimestamp\":\"2022-08-24T11:29:04Z\",\"count\":1,\"type\":\"Normal\",\"eventTime\":null,\"reportingComponent\":\"\",\"reportingInstance\":\"\"}
    }",
    "kubernetes":{"pod_name":"harvester-default-event-tailer-0","namespace_name":"cattle-logging-system","pod_id":"d3453153-58c9-456e-b3c3-d91242580df3","labels":{"app.kubernetes.io/instance":"harvester-default-event-tailer","app.kubernetes.io/name":"event-tailer","controller-revision-hash":"harvester-default-event-tailer-747b9d4489","statefulset.kubernetes.io/pod-name":"harvester-default-event-tailer-0"},"annotations":{"cni.projectcalico.org/containerID":"aa72487922ceb4420ebdefb14a81f0d53029b3aec46ed71a8875ef288cde4103","cni.projectcalico.org/podIP":"10.52.0.178/32","cni.projectcalico.org/podIPs":"10.52.0.178/32","k8s.v1.cni.cncf.io/network-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.0.178\"\n ],\n \"default\": true,\n \"dns\": {}\n}]","k8s.v1.cni.cncf.io/networks-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"10.52.0.178\"\n ],\n \"default\": true,\n \"dns\": {}\n}]","kubernetes.io/psp":"global-unrestricted-psp"},"host":"harv1","container_name":"harvester-default-event-tailer-0","docker_id":"455064de50cc4f66e3dd46c074a1e4e6cfd9139cb74d40f5ba00b4e3e2a7ab2d","container_hash":"docker.io/banzaicloud/eventrouter@sha256:6353d3f961a368d95583758fa05e8f4c0801881c39ed695bd4e8283d373a4262","container_image":"docker.io/banzaicloud/eventrouter:v0.1.0"}
  }
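
Because the event is embedded as an escaped JSON string, one way to unwrap it when inspecting raw records like the one above is jq's fromjson (a sketch; record.json is a hypothetical file holding one record):

  # Parse the outer record, then parse the embedded event held in .message.
  jq '.message | fromjson' record.json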

Event Log Output/ClusterOutput

Events share the Output/ClusterOutput with Logging.

Select Logging/Event from the Type drop-down list.

Logging - 图18

Event Log Flow/ClusterFlow

Compared with a normal Logging Flow/ClusterFlow, the Event-related Flow/ClusterFlow has one additional match field with the value event-tailer.

When you configure from the Harvester dashboard, the field is added automatically.

Select Event from the Type drop-down list.

Logging - 图19

When you configure from the CLI, please add the field manually.

Example:

  apiVersion: logging.banzaicloud.io/v1beta1
  kind: ClusterFlow
  metadata:
    name: harvester-event-webhook
    namespace: cattle-logging-system
  spec:
    filters:
      - tag_normaliser: {}
    match:
      - select:
          labels:
            app.kubernetes.io/name: event-tailer
    globalOutputRefs:
      - harvester-event-webhook