Disruptions
This guide is for application owners who want to build highly available applications, and thus need to understand what types of disruptions can happen to Pods.
It is also for Cluster Administrators who want to perform automated cluster actions, like upgrading and autoscaling clusters.
Voluntary and Involuntary Disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
We call these unavoidable cases involuntary disruptions to an application. Examples are:
- a hardware failure of the physical machine backing the node
- cluster administrator deletes VM (instance) by mistake
- cloud provider or hypervisor failure makes VM disappear
- a kernel panic
- the node disappears from the cluster due to cluster network partition
- eviction of a pod due to the node being out-of-resources.
Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:
- deleting the deployment or other controller that manages the pod
- updating a deployment’s pod template causing a restart
- directly deleting a pod (e.g. by accident)
Cluster Administrator actions include:
- Draining a node for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling).
- Removing a pod from a node to permit something else to fit on that node.
These actions might be taken directly by the cluster administrator, or by automation run by the cluster administrator, or by your cluster hosting provider.
Ask your cluster administrator or consult your cloud provider or distribution documentation to determine if any sources of voluntary disruptions are enabled for your cluster. If none are enabled, you can skip creating Pod Disruption Budgets.
Caution: Not all voluntary disruptions are constrained by Pod Disruption Budgets. For example, deleting deployments or pods bypasses Pod Disruption Budgets.
Dealing with Disruptions
Here are some ways to mitigate involuntary disruptions:
- Ensure your pod requests the resources it needs.
- Replicate your application if you need higher availability. (Learn about running replicated stateless and stateful applications.)
- For even higher availability when running replicated applications, spread applications across racks (using anti-affinity) or across zones (if using a multi-zone cluster).
The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are no voluntary disruptions at all. However, your cluster administrator or hosting provider may run some additional services which cause voluntary disruptions. For example, rolling out node software updates can cause voluntary disruptions. Also, some implementations of cluster (node) autoscaling may cause voluntary disruptions to defragment and compact nodes. Your cluster administrator or hosting provider should have documented what level of voluntary disruptions, if any, to expect.
Kubernetes offers features to help run highly available applications at the same time as frequent voluntary disruptions. We call this set of features Disruption Budgets.
How Disruption Budgets Work
An Application Owner can create a PodDisruptionBudget object (PDB) for each application. A PDB limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions. For example, a quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum. A web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total.
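For illustration, here is a minimal sketch of such an object, expressing the percentage-based case. The name `frontend-pdb` and the `app: frontend` label are hypothetical stand-ins for your own application's names, and depending on your cluster version the apiVersion may be `policy/v1` or `policy/v1beta1`:

```yaml
apiVersion: policy/v1          # policy/v1beta1 on older clusters
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb           # hypothetical name
spec:
  minAvailable: "90%"          # never let voluntary evictions drop below 90% of intended replicas
  selector:
    matchLabels:
      app: frontend            # hypothetical; must match the labels on the application's pods
```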
Cluster managers and hosting providers should use tools which respect Pod Disruption Budgets by calling the Eviction API instead of directly deleting pods or deployments. Examples are the `kubectl drain` command and the Kubernetes-on-GCE cluster upgrade script (`cluster/gce/upgrade.sh`).
When a cluster administrator wants to drain a node, they use the `kubectl drain` command. That tool tries to evict all the pods on the machine. The eviction request may be temporarily rejected, and the tool periodically retries all failed requests until all pods are terminated, or until a configurable timeout is reached.
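Under the hood, an eviction is requested by POSTing an `Eviction` object to the pod's `eviction` subresource, rather than deleting the pod directly. A sketch of the request body (the pod name and namespace are hypothetical; older clusters use `policy/v1beta1`):

```yaml
apiVersion: policy/v1          # policy/v1beta1 on older clusters
kind: Eviction
metadata:
  name: my-pod                 # hypothetical; the pod to evict
  namespace: default           # hypothetical namespace
```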
A PDB specifies the number of replicas that an application can tolerate having, relative to how many it is intended to have. For example, a Deployment which has `.spec.replicas: 5` is supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time, then the Eviction API will allow voluntary disruption of one, but not two, pods at a time.
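Expressed as a PDB, that constraint might look like the following sketch; the name and selector label are illustrative, and the selector would match the Deployment's pods:

```yaml
apiVersion: policy/v1          # policy/v1beta1 on older clusters
kind: PodDisruptionBudget
metadata:
  name: web-pdb                # hypothetical name
spec:
  minAvailable: 4              # with .spec.replicas: 5, at most one pod may be down voluntarily
  selector:
    matchLabels:
      app: web                 # hypothetical; the same labels the Deployment's pod template carries
```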
The group of pods that comprise the application is specified using a label selector, the same as the one used by the application's controller (deployment, stateful-set, etc.).
The "intended" number of pods is computed from the `.spec.replicas` of the pod's controller. The controller is discovered from the pod using the `.metadata.ownerReferences` of the object.
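As a sketch of what that discovery relies on, the relevant metadata of a pod created by a ReplicaSet might look like this (names are illustrative; the `uid` field is present on real objects but omitted here):

```yaml
metadata:
  name: web-7f9c5-abcde        # hypothetical pod name
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: web-7f9c5            # hypothetical; the controller whose .spec.replicas is consulted
    controller: true
```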
PDBs cannot prevent involuntary disruptions from occurring, but they do count against the budget.
Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but controllers (like deployment and stateful-set) are not limited by PDBs when doing rolling upgrades; the handling of failures during application updates is configured in the controller spec. (Learn about updating a deployment.)
When a pod is evicted using the eviction API, it is gracefully terminated (see `terminationGracePeriodSeconds` in PodSpec).
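For reference, a minimal sketch of where that field sits in a Pod spec (the pod name and image are hypothetical; 30 seconds is the Kubernetes default):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app                    # hypothetical name
spec:
  terminationGracePeriodSeconds: 30     # time allowed between SIGTERM and SIGKILL; 30s is the default
  containers:
  - name: app
    image: registry.example.com/app:1.0 # hypothetical image
```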
PDB Example
Consider a cluster with 3 nodes, `node-1` through `node-3`. The cluster is running several applications. One of them has 3 replicas, initially called `pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown. Initially, the pods are laid out as follows:
| node-1 | node-2 | node-3 |
|---|---|---|
| pod-a available | pod-b available | pod-c available |
| pod-x available | | |
All 3 pods are part of a deployment, and they collectively have a PDB which requires there be at least 2 of the 3 pods available at all times.
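Such a budget could be written roughly as follows (the name and selector label are hypothetical stand-ins for whatever labels the deployment's pods carry):

```yaml
apiVersion: policy/v1          # policy/v1beta1 on older clusters
kind: PodDisruptionBudget
metadata:
  name: example-pdb            # hypothetical name
spec:
  minAvailable: 2              # at least 2 of the 3 replicas must remain available
  selector:
    matchLabels:
      app: example             # hypothetical; matches pod-a, pod-b, and pod-c
```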
For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel. The cluster administrator first tries to drain `node-1` using the `kubectl drain` command. That tool tries to evict `pod-a` and `pod-x`. This succeeds immediately. Both pods go into the `terminating` state at the same time. This puts the cluster in this state:
| node-1 draining | node-2 | node-3 |
|---|---|---|
| pod-a terminating | pod-b available | pod-c available |
| pod-x terminating | | |
The deployment notices that one of the pods is terminating, so it creates a replacement called `pod-d`. Since `node-1` is cordoned, it lands on another node. Something has also created `pod-y` as a replacement for `pod-x`.
(Note: for a StatefulSet, `pod-a`, which would be called something like `pod-0`, would need to terminate completely before its replacement, which is also called `pod-0` but has a different UID, could be created. Otherwise, the example applies to a StatefulSet as well.)
Now the cluster is in this state:
| node-1 draining | node-2 | node-3 |
|---|---|---|
| pod-a terminating | pod-b available | pod-c available |
| pod-x terminating | pod-d starting | pod-y |
At some point, the pods terminate, and the cluster looks like this:
| node-1 drained | node-2 | node-3 |
|---|---|---|
| | pod-b available | pod-c available |
| | pod-d starting | pod-y |
At this point, if an impatient cluster administrator tries to drain `node-2` or `node-3`, the drain command will block, because there are only 2 available pods for the deployment, and its PDB requires at least 2. After some time passes, `pod-d` becomes available.
The cluster state now looks like this:
| node-1 drained | node-2 | node-3 |
|---|---|---|
| | pod-b available | pod-c available |
| | pod-d available | pod-y |
Now, the cluster administrator tries to drain `node-2`. The drain command will try to evict the two pods in some order, say `pod-b` first and then `pod-d`. It will succeed at evicting `pod-b`. But, when it tries to evict `pod-d`, it will be refused because that would leave only one pod available for the deployment.
The deployment creates a replacement for `pod-b` called `pod-e`. Because there are not enough resources in the cluster to schedule `pod-e`, the drain will again block. The cluster may end up in this state:
| node-1 drained | node-2 | node-3 | no node |
|---|---|---|---|
| | pod-b available | pod-c available | pod-e pending |
| | pod-d available | pod-y | |
At this point, the cluster administrator needs to add a node back to the cluster to proceed with the upgrade.
You can see how Kubernetes varies the rate at which disruptions can happen, according to:
- how many replicas an application needs
- how long it takes to gracefully shut down an instance
- how long it takes a new instance to start up
- the type of controller
- the cluster’s resource capacity
Separating Cluster Owner and Application Owner Roles
Often, it is useful to think of the Cluster Manager and Application Owner as separate roles with limited knowledge of each other. This separation of responsibilities may make sense in these scenarios:
- when there are many application teams sharing a Kubernetes cluster, and there is natural specialization of roles
- when third-party tools or services are used to automate cluster management
Pod Disruption Budgets support this separation of roles by providing an interface between the roles.
If you do not have such a separation of responsibilities in your organization, you may not need to use Pod Disruption Budgets.
How to perform Disruptive Actions on your Cluster
If you are a Cluster Administrator, and you need to perform a disruptive action on all the nodes in your cluster, such as a node or system software upgrade, here are some options:
- Accept downtime during the upgrade.
- Failover to another complete replica cluster.
  - No downtime, but may be costly both for the duplicated nodes and for the human effort to orchestrate the switchover.
- Write disruption-tolerant applications and use PDBs.
  - No downtime.
  - Minimal resource duplication.
  - Allows more automation of cluster administration.
  - Writing disruption-tolerant applications is tricky, but the work to tolerate voluntary disruptions largely overlaps with work to support autoscaling and tolerating involuntary disruptions.
What's next
- Follow steps to protect your application by configuring a Pod Disruption Budget.
- Learn more about draining nodes.