Kubernetes Scheduler
In Kubernetes, scheduling refers to making sure that Pods are matched to Nodes so that the kubelet can run them.
Scheduling overview
A scheduler watches for newly created Pods that have no Node assigned. For every Pod that the scheduler discovers, the scheduler becomes responsible for finding the best Node for that Pod to run on. The scheduler reaches this placement decision taking into account the scheduling principles described below.
If you want to understand why Pods are placed onto a particular Node, or if you’re planning to implement a custom scheduler yourself, this page will help you learn about scheduling.
kube-scheduler
kube-scheduler is the default scheduler for Kubernetes and runs as part of the control plane. kube-scheduler is designed so that, if you want and need to, you can write your own scheduling component and use that instead.
For every newly created Pod or other unscheduled Pod, kube-scheduler selects an optimal Node for it to run on. However, every container in a Pod has different resource requirements, and every Pod also has its own requirements. Therefore, existing Nodes need to be filtered according to the specific scheduling requirements.
In a cluster, Nodes that meet the scheduling requirements for a Pod are called feasible nodes. If none of the nodes are suitable, the pod remains unscheduled until the scheduler is able to place it.
The scheduler finds feasible Nodes for a Pod and then runs a set of functions to score the feasible Nodes and picks a Node with the highest score among the feasible ones to run the Pod. The scheduler then notifies the API server about this decision in a process called binding.
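The flow described above (find feasible Nodes, score them, pick the best, bind) can be sketched in a few lines of Python. This is an illustrative model only, not kube-scheduler's code: the `Node` and `Pod` classes, their fields, and the `schedule_one` function are hypothetical names, and setting `pod.node_name` merely stands in for the binding that the real scheduler sends to the API server.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int  # hypothetical field: CPU still available on the node


@dataclass
class Pod:
    name: str
    cpu_request_millis: int
    node_name: Optional[str] = None  # None means "no Node assigned yet"


def schedule_one(
    pod: Pod,
    nodes: List[Node],
    is_feasible: Callable[[Pod, Node], bool],
    score: Callable[[Pod, Node], float],
) -> Optional[str]:
    """Return the chosen node name, or None if the pod stays unscheduled."""
    feasible = [node for node in nodes if is_feasible(pod, node)]
    if not feasible:
        return None  # no suitable node: the pod remains unscheduled for now
    # max() keeps the first node on a tie; this sketch does not model tie-breaking.
    best = max(feasible, key=lambda node: score(pod, node))
    pod.node_name = best.name  # stands in for the binding step
    return best.name


nodes = [Node("node-a", 500), Node("node-b", 2000)]
pod = Pod("web", 1000)
chosen = schedule_one(
    pod,
    nodes,
    is_feasible=lambda p, n: n.free_cpu_millis >= p.cpu_request_millis,
    score=lambda p, n: n.free_cpu_millis - p.cpu_request_millis,
)
```

Passing the feasibility check and the scoring function as parameters mirrors the idea that both are pluggable sets of rules rather than fixed logic.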
Factors that need to be taken into account for scheduling decisions include individual and collective resource requirements, hardware / software / policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and so on.
Scheduling with kube-scheduler
kube-scheduler selects a node for the pod in a 2-step operation:
- Filtering
- Scoring
The filtering step finds the set of Nodes where it’s feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resource to meet a Pod’s specific resource requests. After this step, the node list contains any suitable Nodes; often, there will be more than one. If the list is empty, that Pod isn’t (yet) schedulable.
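As a rough illustration of a resource-fit filter in the spirit of PodFitsResources, the sketch below compares a Pod's requests against what each Node has left. The function names, the dictionary layout, and the units (CPU in millicores, memory in MiB) are assumptions made for this example; the real filter handles more resource types and edge cases.

```python
def fits_resources(pod_requests, node_allocatable, node_requested):
    """Return True if every requested resource fits in what the node has left."""
    for resource, amount in pod_requests.items():
        free = node_allocatable.get(resource, 0) - node_requested.get(resource, 0)
        if amount > free:
            return False
    return True


def filter_nodes(pod_requests, nodes):
    """Return the names of nodes that pass the resource check.

    `nodes` maps a node name to a pair (allocatable, already_requested).
    """
    return [
        name
        for name, (allocatable, requested) in nodes.items()
        if fits_resources(pod_requests, allocatable, requested)
    ]


# Hypothetical cluster: CPU in millicores, memory in MiB.
cluster = {
    "node-a": ({"cpu": 4000, "memory": 8192}, {"cpu": 3500, "memory": 1024}),
    "node-b": ({"cpu": 4000, "memory": 8192}, {"cpu": 1000, "memory": 2048}),
}
feasible = filter_nodes({"cpu": 1000, "memory": 512}, cluster)
```

An empty result corresponds to the case where the Pod is not (yet) schedulable.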
In the scoring step, the scheduler ranks the remaining nodes to choose the most suitable Pod placement. The scheduler assigns a score to each Node that survived filtering, basing this score on the active scoring rules.
Finally, kube-scheduler assigns the Pod to the Node with the highest ranking. If there is more than one node with equal scores, kube-scheduler selects one of these at random.
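The final selection, including the random choice among equally scored Nodes, might look like the following sketch. The `select_node` function and its `rng` parameter are hypothetical; the parameter is there only so that a seeded generator can be passed in for repeatable tests.

```python
import random


def select_node(scores, rng=random):
    """Pick the highest-scoring node name, choosing randomly among ties.

    `scores` maps node names to numeric scores. Returns None when empty.
    """
    if not scores:
        return None
    top = max(scores.values())
    tied = sorted(name for name, value in scores.items() if value == top)
    return rng.choice(tied)


winner = select_node({"node-a": 5, "node-b": 9, "node-c": 9})
```

Here `winner` is either `node-b` or `node-c`; with a single top score the result is deterministic.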
Default policies
kube-scheduler has a default set of scheduling policies.
Filtering
PodFitsHostPorts
: Checks if a Node has free ports (the network protocol kind) for the Pod ports the Pod is requesting.
PodFitsHost
: Checks if a Pod specifies a specific Node by its hostname.
PodFitsResources
: Checks if the Node has free resources (e.g., CPU and memory) to meet the requirements of the Pod.
PodMatchNodeSelector
: Checks if a Pod’s Node selector matches the Node’s label(s).
NoVolumeZoneConflict
: Evaluates if the Volumes that a Pod requests are available on the Node, given the failure zone restrictions for that storage.
NoDiskConflict
: Evaluates if a Pod can fit on a Node due to the volumes it requests, and those that are already mounted.
MaxCSIVolumeCount
: Decides how many CSI volumes should be attached, and whether that’s over a configured limit.
CheckNodeMemoryPressure
: If a Node is reporting memory pressure, and there’s no configured exception, the Pod won’t be scheduled there.
CheckNodePIDPressure
: If a Node is reporting that process IDs are scarce, and there’s no configured exception, the Pod won’t be scheduled there.
CheckNodeDiskPressure
: If a Node is reporting storage pressure (a filesystem that is full or nearly full), and there’s no configured exception, the Pod won’t be scheduled there.
CheckNodeCondition
: Nodes can report that they have a completely full filesystem, that networking isn’t available, or that the kubelet is otherwise not ready to run Pods. If such a condition is set for a Node, and there’s no configured exception, the Pod won’t be scheduled there.
PodToleratesNodeTaints
: Checks if a Pod’s tolerations can tolerate the Node’s taints.
CheckVolumeBinding
: Evaluates if a Pod can fit due to the volumes it requests. This applies for both bound and unbound PVCs.
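To make a few of these filters concrete, here is a simplified sketch of three of them (node selector matching, taint toleration, and host port availability) combined into one feasibility check. All function names and the dictionary layout are hypothetical. The taint check assumes exact `(key, value, effect)` matching only and ignores toleration operators, and the port check ignores host IPs, so treat it as a teaching model rather than the scheduler's actual logic.

```python
# Taint effects that block scheduling outright; "PreferNoSchedule" is only a soft preference.
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def matches_node_selector(node_selector, node_labels):
    """Every key/value pair in the Pod's node selector must appear in the Node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def tolerates_taints(tolerations, taints):
    """Each hard taint on the Node needs an identical (key, value, effect) toleration."""
    return all(taint in tolerations for taint in taints if taint[2] in HARD_EFFECTS)


def host_ports_free(wanted_ports, used_ports):
    """No (protocol, port) pair the Pod wants may already be in use on the Node."""
    return set(wanted_ports).isdisjoint(used_ports)


def is_feasible(pod, node):
    """A Node is feasible only if it passes every filter."""
    return (
        matches_node_selector(pod["node_selector"], node["labels"])
        and tolerates_taints(pod["tolerations"], node["taints"])
        and host_ports_free(pod["host_ports"], node["used_host_ports"])
    )
```

Because the filters are combined with `and`, a single failing check is enough to exclude a Node, which matches the idea that filtering yields only Nodes meeting all scheduling requirements.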
Scoring
SelectorSpreadPriority
: Spreads Pods across hosts, considering Pods that belong to the same Service, StatefulSet or ReplicaSet.
InterPodAffinityPriority
: Computes a sum by iterating through the elements of weightedPodAffinityTerm and adding “weight” to the sum if the corresponding PodAffinityTerm is satisfied for that node; the node(s) with the highest sum are the most preferred.
LeastRequestedPriority
: Favors nodes with fewer requested resources. In other words, the more Pods that are placed on a Node, and the more resources those Pods use, the lower the ranking this policy will give.
MostRequestedPriority
: Favors nodes with most requested resources. This policy will fit the scheduled Pods onto the smallest number of Nodes needed to run your overall set of workloads.
RequestedToCapacityRatioPriority
: Creates a requestedToCapacity based ResourceAllocationPriority using default resource scoring function shape.
BalancedResourceAllocation
: Favors nodes with balanced resource usage.
NodePreferAvoidPodsPriority
: Prioritizes nodes according to the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods. You can use this to hint that two different Pods shouldn’t run on the same Node.
NodeAffinityPriority
: Prioritizes nodes according to node affinity scheduling preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution. You can read more about this in Assigning Pods to Nodes.
TaintTolerationPriority
: Prepares the priority list for all the nodes, based on the number of intolerable taints on the node. This policy adjusts a node’s rank taking that list into account.
ImageLocalityPriority
: Favors nodes that already have the container images for that Pod cached locally.
ServiceSpreadingPriority
: For a given Service, this policy aims to make sure that the Pods for the Service run on different nodes. It favors scheduling onto nodes that don’t have Pods for the service already assigned there. The overall outcome is that the Service becomes more resilient to a single Node failure.
CalculateAntiAffinityPriorityMap
: This policy helps implement pod anti-affinity.
EqualPriorityMap
: Gives an equal weight of one to all nodes.
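The contrast between LeastRequestedPriority and MostRequestedPriority, and the weighted sum used by InterPodAffinityPriority, can be illustrated with the sketch below. The function names, the 0 to 100 score scale, the simple averaging across resources, and the use of a callable to stand in for evaluating a PodAffinityTerm are all assumptions made for this example, not the scheduler's actual formulas.

```python
def least_requested_score(requested, capacity):
    """Higher score for nodes where a smaller share of capacity is requested (0 to 100)."""
    fractions = [(capacity[r] - requested.get(r, 0)) / capacity[r] for r in capacity]
    return 100 * sum(fractions) / len(fractions)


def most_requested_score(requested, capacity):
    """Higher score for nodes where a larger share of capacity is requested (0 to 100)."""
    fractions = [requested.get(r, 0) / capacity[r] for r in capacity]
    return 100 * sum(fractions) / len(fractions)


def weighted_affinity_sum(weighted_terms, node):
    """Add up the weight of every term that is satisfied for the node.

    `weighted_terms` is a list of (weight, is_satisfied) pairs, where
    `is_satisfied` is a hypothetical callable taking the node.
    """
    return sum(weight for weight, is_satisfied in weighted_terms if is_satisfied(node))


capacity = {"cpu": 4000, "memory": 8192}
requested = {"cpu": 1000, "memory": 2048}
spread_score = least_requested_score(requested, capacity)
pack_score = most_requested_score(requested, capacity)
```

With a quarter of each resource requested, the least-requested sketch rates the node highly while the most-requested sketch rates it low, which reflects the spreading versus bin-packing trade-off between the two policies.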
What's next
- Read about scheduler performance tuning
- Read about Pod topology spread constraints
- Read the reference documentation for kube-scheduler
- Learn about configuring multiple schedulers
- Learn about topology management policies
- Learn about Pod Overhead