PodUnavailableBudget

FEATURE STATE: Kruise v0.10.0

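In many voluntary disruption scenarios, the Kubernetes Pod Disruption Budget (PDB) keeps applications highly available by limiting the number of Pods that are disrupted at the same time. However, PDB only guards against Pod disruptions triggered through the Eviction API, for example when kubectl drain evicts all Pods on a node.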

In the following voluntary disruption scenarios, however, business interruption or service degradation can still occur even with a Kubernetes PDB in place:

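  1. The application owner is upgrading the application version through a Deployment while the cluster administrator is scaling down nodes because machine resource utilization is too low.
  2. The middleware team is using SidecarSet to upgrade the sidecar version in place across the cluster (for example, the ServiceMesh envoy) while HPA is scaling down the same batch of applications.
  3. The application owner and the middleware team are upgrading the same batch of Pods at the same time, using the in-place upgrade capabilities of CloneSet and SidecarSet.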

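In these scenarios, which Kubernetes PDB cannot guard against well, Kruise PodUnavailableBudget (PUB) intercepts Pod requests through the Pod validating admission webhook. This lets it cover more voluntary disruption scenarios and give applications stronger protection.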

API Definition

  apiVersion: policy.kruise.io/v1alpha1
  kind: PodUnavailableBudget
  metadata:
    name: web-server-pub
    namespace: web
  spec:
    targetRef:
      apiVersion: apps.kruise.io/v1alpha1
      # cloneset, deployment, statefulset etc.
      kind: CloneSet
      name: web-server
    # selector is a label query over the Pods managed by the budget.
    # selector and targetRef are mutually exclusive; if both are set, targetRef takes precedence.
    # selector is commonly used when an application is deployed using multiple workloads,
    # and targetRef is used to protect a single workload.
    # selector:
    #   matchLabels:
    #     app: web-server
    # Maximum number of unavailable Pods for the current CloneSet; in this example, cloneset.replicas(5) * 60% = 3.
    # maxUnavailable and minAvailable are mutually exclusive; if both are set, maxUnavailable takes precedence.
    maxUnavailable: 60%
    # Minimum number of available Pods for the current CloneSet; in this example, cloneset.replicas(5) * 40% = 2.
    # minAvailable: 40%
  ---
  apiVersion: apps.kruise.io/v1alpha1
  kind: CloneSet
  metadata:
    labels:
      app: web-server
    name: web-server
    namespace: web
  spec:
    replicas: 5
    selector:
      matchLabels:
        app: web-server
    template:
      metadata:
        labels:
          app: web-server
      spec:
        containers:
        - name: nginx
          image: nginx:alpine
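
The percentage arithmetic in the comments of the example above can be checked with a short sketch. The function names here are hypothetical and this is not Kruise's code. Rounding fractional results up is an assumption (the example values divide evenly, so it does not affect them); the precedence of maxUnavailable over minAvailable follows the comment in the example.

```python
def resolve(value, replicas):
    """Resolve an integer or an 'N%' string against a replica count.

    Assumption: fractional results are rounded up. The source does not
    state how Kruise rounds; 5 * 60% and 5 * 40% divide evenly.
    """
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        # Integer ceiling division avoids floating-point error.
        return -((-replicas * percent) // 100)
    return int(value)


def desired_available(replicas, max_unavailable=None, min_available=None):
    """Minimum number of Pods that should stay available under the budget."""
    if max_unavailable is not None:
        # maxUnavailable takes precedence when both fields are set.
        return max(0, replicas - resolve(max_unavailable, replicas))
    if min_available is not None:
        return min(replicas, resolve(min_available, replicas))
    raise ValueError("set either max_unavailable or min_available")


print(resolve("60%", 5))                              # 3
print(desired_available(5, max_unavailable="60%"))    # 2
print(desired_available(5, min_available="40%"))      # 2
```

Both settings in the example therefore describe the same budget for 5 replicas: at most 3 Pods unavailable, at least 2 available.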

Support for Custom Workloads

FEATURE STATE: Kruise v1.2.0

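To meet more complex application deployment requirements, many companies manage their business Pods through custom workloads. Starting with Kruise v1.2.0, PUB can protect custom workloads that implement the scale subresource. The following example protects an Argo Rollout: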

  apiVersion: policy.kruise.io/v1alpha1
  kind: PodUnavailableBudget
  metadata:
    name: rollouts-demo
  spec:
    targetRef:
      apiVersion: argoproj.io/v1alpha1
      kind: Rollout
      name: rollouts-demo
    minAvailable: 80%

Implementation

PUB works as shown in the figure below. For the detailed design, see the Pub Proposal.

[Figure: PodUnavailableBudget]

Comparison with Kubernetes native PDB

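Kubernetes PDB protects Pods through the Eviction API, whereas Kruise PUB intercepts Pod validating requests, which lets it protect against many more voluntary disruption scenarios. Kruise PUB covers everything PDB does (protection against Pod eviction), so you can use the two together as needed or use Kruise PUB alone (the recommended approach).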
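
To illustrate why intercepting Pod requests covers more cases than the Eviction API alone, here is a toy model of a single budget shared by every actor that touches the same Pods. It is a simplified sketch under stated assumptions, not Kruise's webhook logic: the class `BudgetGate`, its method, and the actor labels are hypothetical, and the model never returns budget when Pods become available again.

```python
class BudgetGate:
    """Toy model: one unavailability budget shared by all actors."""

    def __init__(self, current_available: int, desired_available: int):
        self.current_available = current_available
        self.desired_available = desired_available

    def admit(self) -> bool:
        """Decide on one request that would make an available Pod unavailable."""
        if self.current_available - self.desired_available <= 0:
            return False  # budget exhausted: reject the request
        self.current_available -= 1  # the Pod now counts as unavailable
        return True


# 5 available Pods, at least 2 must stay available (hypothetical values).
gate = BudgetGate(current_available=5, desired_available=2)
requests = [
    ("CloneSet in-place update", "web-server-0"),
    ("SidecarSet in-place update", "web-server-1"),
    ("kubectl drain eviction", "web-server-2"),
    ("HPA scale-down delete", "web-server-3"),
]
decisions = [(actor, pod, gate.admit()) for actor, pod in requests]
for actor, pod, allowed in decisions:
    print(f"{actor} on {pod}: {'allowed' if allowed else 'rejected'}")
```

In this model the first three requests are allowed and the fourth is rejected, regardless of which actor sent it. A gate that only saw evictions would count just one of the four.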

feature-gates

PodUnavailableBudget protection is disabled by default. To enable it, set the feature gates PodUnavailableBudgetDeleteGate and PodUnavailableBudgetUpdateGate.

  $ helm install kruise https://... --set featureGates="PodUnavailableBudgetDeleteGate=true\,PodUnavailableBudgetUpdateGate=true"

PodUnavailableBudget Status

  # kubectl describe podunavailablebudgets web-server-pub
  Name:         web-server-pub
  Kind:         PodUnavailableBudget
  Status:
    unavailableAllowed: 3  # number of Pods that are currently allowed to be unavailable
    currentAvailable: 5    # current number of available Pods
    desiredAvailable: 2    # minimum desired number of available Pods
    totalReplicas: 5       # total number of Pods counted by this PUB
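
The status values above are consistent with a simple relationship: the number of Pods allowed to be unavailable is the number currently available minus the minimum desired. The sketch below encodes that reading. The formula is inferred from the example values, not taken from Kruise's source, and the `PubStatus` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PubStatus:
    """Hypothetical mirror of the status fields shown above."""
    total_replicas: int
    current_available: int
    desired_available: int

    @property
    def unavailable_allowed(self) -> int:
        # Assumed relationship, inferred from the example: 5 - 2 = 3.
        # Clamped at zero so the budget never goes negative.
        return max(0, self.current_available - self.desired_available)


status = PubStatus(total_replicas=5, current_available=5, desired_available=2)
print(status.unavailable_allowed)  # 3

# If two Pods become unavailable, only one more disruption fits the budget.
degraded = PubStatus(total_replicas=5, current_available=3, desired_available=2)
print(degraded.unavailable_allowed)  # 1
```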