Running background tasks on nodes automatically with daemon sets

As an administrator, you can create and use daemon sets to run replicas of a pod on specific or all nodes in an OKD cluster.

A daemon set ensures that all (or some) nodes run a copy of a pod. As nodes are added to the cluster, pods are added to the cluster. As nodes are removed from the cluster, those pods are removed through garbage collection. Deleting a daemon set will clean up the pods it created.

You can use daemon sets to create shared storage, run a logging pod on every node in your cluster, or deploy a monitoring agent on every node.

For security reasons, only cluster administrators can create daemon sets.

For more information on daemon sets, see the Kubernetes documentation.

Daemon set scheduling is incompatible with project’s default node selector. If you fail to disable it, the daemon set gets restricted by merging with the default node selector. This results in frequent pod recreates on the nodes that got unselected by the merged node selector, which in turn puts unwanted load on the cluster.

Scheduled by default scheduler

A daemon set ensures that all eligible nodes run a copy of a pod. Normally, the node that a pod runs on is selected by the Kubernetes scheduler. However, previously daemon set pods are created and scheduled by the daemon set controller. That introduces the following issues:

  • Inconsistent pod behavior: Normal pods waiting to be scheduled are created and in Pending state, but daemon set pods are not created in Pending state. This is confusing to the user.

  • Pod preemption is handled by default scheduler. When preemption is enabled, the daemon set controller will make scheduling decisions without considering pod priority and preemption.

The ScheduleDaemonSetPods feature, enabled by default in OKD, lets you to schedule daemon sets using the default scheduler instead of the daemon set controller, by adding the NodeAffinity term to the daemon set pods, instead of the spec.nodeName term. The default scheduler is then used to bind the pod to the target host. If node affinity of the daemon set pod already exists, it is replaced. The daemon set controller only performs these operations when creating or modifying daemon set pods, and no changes are made to the spec.template of the daemon set.

  1. nodeAffinity:
  2. requiredDuringSchedulingIgnoredDuringExecution:
  3. nodeSelectorTerms:
  4. - matchFields:
  5. - key: metadata.name
  6. operator: In
  7. values:
  8. - target-host-name

In addition, a node.kubernetes.io/unschedulable:NoSchedule toleration is added automatically to daemon set pods. The default scheduler ignores unschedulable Nodes when scheduling daemon set pods.

Creating daemonsets

When creating daemon sets, the nodeSelector field is used to indicate the nodes on which the daemon set should deploy replicas.

Prerequisites

  • Before you start using daemon sets, disable the default project-wide node selector in your namespace, by setting the namespace annotation openshift.io/node-selector to an empty string:

    1. $ oc patch namespace myproject -p \
    2. '{"metadata": {"annotations": {"openshift.io/node-selector": ""}}}'
  • If you are creating a new project, overwrite the default node selector:

    1. `oc adm new-project <name> --node-selector=""`.

Procedure

To create a daemon set:

  1. Define the daemon set yaml file:

    1. apiVersion: apps/v1
    2. kind: DaemonSet
    3. metadata:
    4. name: hello-daemonset
    5. spec:
    6. selector:
    7. matchLabels:
    8. name: hello-daemonset (1)
    9. template:
    10. metadata:
    11. labels:
    12. name: hello-daemonset (2)
    13. spec:
    14. nodeSelector: (3)
    15. role: worker
    16. containers:
    17. - image: openshift/hello-openshift
    18. imagePullPolicy: Always
    19. name: registry
    20. ports:
    21. - containerPort: 80
    22. protocol: TCP
    23. resources: {}
    24. terminationMessagePath: /dev/termination-log
    25. serviceAccount: default
    26. terminationGracePeriodSeconds: 10
    1The label selector that determines which pods belong to the daemon set.
    2The pod template’s label selector. Must match the label selector above.
    3The node selector that determines on which nodes pod replicas should be deployed. A matching label must be present on the node.
  2. Create the daemon set object:

    1. $ oc create -f daemonset.yaml
  3. To verify that the pods were created, and that each node has a pod replica:

    1. Find the daemonset pods:

      1. $ oc get pods

      Example output

      1. hello-daemonset-cx6md 1/1 Running 0 2m
      2. hello-daemonset-e3md9 1/1 Running 0 2m
    2. View the pods to verify the pod has been placed onto the node:

      1. $ oc describe pod/hello-daemonset-cx6md|grep Node

      Example output

      1. Node: openshift-node01.hostname.com/10.14.20.134
      1. $ oc describe pod/hello-daemonset-e3md9|grep Node

      Example output

      1. Node: openshift-node02.hostname.com/10.14.20.137
  • If you update a daemon set pod template, the existing pod replicas are not affected.

  • If you delete a daemon set and then create a new daemon set with a different template but the same label selector, it recognizes any existing pod replicas as having matching labels and thus does not update them or create new replicas despite a mismatch in the pod template.

  • If you change node labels, the daemon set adds pods to nodes that match the new labels and deletes pods from nodes that do not match the new labels.

To update a daemon set, force new pod replicas to be created by deleting the old replicas or nodes.