Advanced Scheduling and Taints and Tolerations
You are viewing documentation for a release that is no longer supported. The latest supported version of version 3 is [3.11]. For the most recent version 4, see [4]
You are viewing documentation for a release that is no longer supported. The latest supported version of version 3 is [3.11]. For the most recent version 4, see [4]
Overview
Taints and tolerations allow the node to control which pods should (or should not) be scheduled on them.
Taints and Tolerations
A taint allows a node to refuse pod to be scheduled unless that pod has a matching toleration.
You apply taints to a node through the node specification (NodeSpec
) and apply tolerations to a pod through the pod specification (PodSpec
). A taint on a node instructs the node to repel all pods that do not tolerate the taint.
Taints and tolerations consist of a key, value, and effect. An operator allows you to leave one of these parameters empty.
Parameter | Description | ||||||
---|---|---|---|---|---|---|---|
| The | ||||||
| The | ||||||
| The effect is one of the following:
| ||||||
|
|
A toleration matches a taint:
If the
operator
parameter is set toEqual
:the
key
parameters are the same;the
value
parameters are the same;the
effect
parameters are the same.
If the
operator
parameter is set toExists
:the
key
parameters are the same;the
effect
parameters are the same.
Using Multiple Taints
You can put multiple taints on the same node and multiple tolerations on the same pod. OKD processes multiple taints and tolerations as follows:
Process the taints for which the pod has a matching toleration.
The remaining unmatched taints have the indicated effects on the pod:
If there is at least one unmatched taint with effect
NoSchedule
, OKD cannot schedule a pod onto that node.If there is no unmatched taint with effect
NoSchedule
but there is at least one unmatched taint with effectPreferNoSchedule
, OKD tries to not schedule the pod onto the node.If there is at least one unmatched taint with effect
NoExecute
, OKD evicts the pod from the node (if it is already running on the node), or the pod is not scheduled onto the node (if it is not yet running on the node).Pods that do not tolerate the taint are evicted immediately.
Pods that tolerate the taint without specifying
tolerationSeconds
in their toleration specification remain bound forever.Pods that tolerate the taint with a specified
tolerationSeconds
remain bound for the specified amount of time.
For example:
The node has the following taints:
$ oc adm taint nodes node1 key1=value1:NoSchedule
$ oc adm taint nodes node1 key1=value1:NoExecute
$ oc adm taint nodes node1 key2=value2:NoSchedule
The pod has the following tolerations:
tolerations:
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoSchedule"
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoExecute"
In this case, the pod cannot be scheduled onto the node, because there is no toleration matching the third taint. The pod continues running if it is already running on the node when the taint is added, because the third taint is the only one of the three that is not tolerated by the pod.
Adding a Taint to an Existing Node
You add a taint to a node using the oc adm taint
command with the parameters described in the Taint and toleration components table:
$ oc adm taint nodes <node-name> <key>=<value>:<effect>
For example:
$ oc adm taint nodes node1 key1=value1:NoExecute
The example places a taint on node1
that has key key1
, value value1
, and taint effect NoExecute
.
Adding a Toleration to a Pod
To add a toleration to a pod, edit the pod specification to include a tolerations
section:
Sample pod configuration file with Equal
operator
tolerations:
- key: "key1" (1)
operator: "Equal" (1)
value: "value1" (1)
effect: "NoExecute" (1)
tolerationSeconds: 3600 (2)
1 | The toleration parameters, as described in the Taint and toleration components table. |
2 | The tolerationSeconds parameter specifies how long a pod can remain bound to a node before being evicted. See Using Toleration Seconds to Delay Pod Evictions below. |
Sample pod configuration file with Exists
operator
tolerations:
- key: "key1"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 3600
Both of these tolerations match the taint created by the oc adm taint
command above. A pod with either toleration would be able to schedule onto node1
.
Using Toleration Seconds to Delay Pod Evictions
You can specify how long a pod can remain bound to a node before being evicted by specifying the tolerationSeconds
parameter in the pod specification. If a taint with the NoExecute
effect is added to a node, any pods that do not tolerate the taint are evicted immediately (pods that do tolerate the taint are not evicted). However, if a pod that to be evicted has the tolerationSeconds
parameter, the pod is not evicted until that time period expires.
For example:
tolerations:
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoExecute"
tolerationSeconds: 3600
Here, if this pod is running but does not have a matching taint, the pod stays bound to the node for 3,600 seconds and then be evicted. If the taint is removed before that time, the pod is not evicted.
Setting a Default Value for Toleration Seconds
This plug-in sets the default forgiveness toleration for pods, to tolerate the node.alpha.kubernetes.io/not-ready:NoExecute
and node.alpha.kubernetes.io/unreachable:NoExecute
taints for five minutes.
If the pod configuration provided by the user already has either toleration, the default is not added.
To enable Default Toleration Seconds:
Modify the master configuration file (/etc/origin/master/master-config.yaml) to Add
DefaultTolerationSeconds
to the admissionConfig section:admissionConfig:
pluginConfig:
DefaultTolerationSeconds:
configuration:
kind: DefaultAdmissionConfig
apiVersion: v1
disable: false
Restart OpenShift for the changes to take effect:
# master-restart api
# master-restart controllers
Verify that the default was added:
Create a pod:
$ oc create -f </path/to/file>
For example:
$ oc create -f hello-pod.yaml
pod "hello-pod" created
Check the pod tolerations:
$ oc describe pod <pod-name> |grep -i toleration
For example:
$ oc describe pod hello-pod |grep -i toleration
Tolerations: node.alpha.kubernetes.io/not-ready=:Exists:NoExecute for 300s
Preventing Pod Eviction for Node Problems
OKD can be configured to represent node unreachable and node not ready conditions as taints. This allows per-pod specification of how long to remain bound to a node that becomes unreachable or not ready, rather than using the default of five minutes.
When the Taint Based Evictions feature is enabled, the taints are automatically added by the node controller and the normal logic for evicting pods from Ready
nodes is disabled.
If a node enters a not ready state, the
node.alpha.kubernetes.io/not-ready:NoExecute
taint is added and pods cannot be scheduled on the node. Existing pods remain for the toleration seconds period.If a node enters a not reachable state, the
node.alpha.kubernetes.io/unreachable:NoExecute
taint is added and pods cannot be scheduled on the node. Existing pods remain for the toleration seconds period.
To enable Taint Based Evictions:
Modify the master configuration file (/etc/origin/master/master-config.yaml) to add the following to the
kubernetesMasterConfig
section:kubernetesMasterConfig:
controllerArguments:
feature-gates:
- "TaintBasedEvictions=true"
Check that the taint is added to a node:
oc describe node $node | grep -i taint
Taints: node.alpha.kubernetes.io/not-ready:NoExecute
Restart OpenShift for the changes to take effect:
# master-restart api
# master-restart controllers
Add a toleration to pods:
tolerations:
- key: "node.alpha.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
or
tolerations:
- key: "node.alpha.kubernetes.io/not-ready"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
To maintain the existing rate limiting behavior of pod evictions due to node problems, the system adds the taints in a rate-limited way. This prevents massive pod evictions in scenarios such as the master becoming partitioned from the nodes. |
Daemonsets and Tolerations
DaemonSet pods are created with NoExecute
tolerations for node.alpha.kubernetes.io/unreachable
and node.alpha.kubernetes.io/not-ready
with no tolerationSeconds
to ensure that DaemonSet pods are never evicted due to these problems, even when the Default Toleration Seconds feature is disabled.
Examples
Taints and tolerations are a flexible way to steer pods away from nodes or evict pods that should not be running on a node. A few of typical scenrios are:
Dedicating a Node for a User
You can specify a set of nodes for exclusive use by a particular set of users.
To specify dedicated nodes:
Add a taint to those nodes:
For example:
$ oc adm taint nodes node1 dedicated=groupName:NoSchedule
Add a corresponding toleration to the pods by writing a custom admission controller.
Only the pods with the tolerations are allowed to use the dedicated nodes.
Binding a User to a Node
You can configure a node so that particular users can use only the dedicated nodes.
To configure a node so that users can use only that node:
Add a taint to those nodes:
For example:
$ oc adm taint nodes node1 dedicated=groupName:NoSchedule
Add a corresponding toleration to the pods by writing a custom admission controller.
The admission controller should add a node affinity to require that the pods can only schedule onto nodes labeled with the
key:value
label (dedicated=groupName
).Add a label similar to the taint (such as the
key:value
label) to the dedicated nodes.
Nodes with Special Hardware
In a cluster where a small subset of nodes have specialized hardware (for example GPUs), you can use taints and tolerations to keep pods that do not need the specialized hardware off of those nodes, leaving the nodes for pods that do need the specialized hardware. You can also require pods that need specialized hardware to use specific nodes.
To ensure pods are blocked from the specialized hardware:
Taint the nodes that have the specialized hardware using one of the following commands:
$ oc adm taint nodes <node-name> disktype=ssd:NoSchedule
$ oc adm taint nodes <node-name> disktype=ssd:PreferNoSchedule
Adding a corresponding toleration to pods that use the special hardware using an admission controller.
For example, the admission controller could use some characteristic(s) of the pod to determine that the pod should be allowed to use the special nodes by adding a toleration.
To ensure pods can only use the specialized hardware, you need some additional mechanism. For example, you could label the nodes that have the special hardware and use node affinity on the pods that need the hardware.