UnitedDeployment

This controller provides a new way to manage pods across multiple domains by using multiple workloads. A high-level description of this workload can be found in this blog post.

Different domains in one Kubernetes cluster are represented by multiple groups of nodes identified by labels. The UnitedDeployment controller provisions one workload for each group of nodes, with a matching NodeSelector, so that the pods created by each workload are scheduled to the target domain.

Each workload managed by UnitedDeployment is called a subset. Each domain should provide at least enough capacity to run the number of pods given by replicas. Currently StatefulSet, Advanced StatefulSet, CloneSet and Deployment are the supported workloads.

API definition: https://github.com/openkruise/kruise/blob/master/apis/apps/v1alpha1/uniteddeployment_types.go

The sample YAML below shows a UnitedDeployment that manages three StatefulSet instances in three domains. The total number of managed pods is 6.

    apiVersion: apps.kruise.io/v1alpha1
    kind: UnitedDeployment
    metadata:
      name: sample-ud
    spec:
      replicas: 6
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: sample
      template:
        # statefulSetTemplate or advancedStatefulSetTemplate or cloneSetTemplate or deploymentTemplate
        statefulSetTemplate:
          metadata:
            labels:
              app: sample
          spec:
            selector:
              matchLabels:
                app: sample
            template:
              metadata:
                labels:
                  app: sample
              spec:
                containers:
                - image: nginx:alpine
                  name: nginx
      topology:
        subsets:
        - name: subset-a
          nodeSelectorTerm:
            matchExpressions:
            - key: node
              operator: In
              values:
              - zone-a
          replicas: 1
        - name: subset-b
          nodeSelectorTerm:
            matchExpressions:
            - key: node
              operator: In
              values:
              - zone-b
          replicas: 50%
        - name: subset-c
          nodeSelectorTerm:
            matchExpressions:
            - key: node
              operator: In
              values:
              - zone-c
      updateStrategy:
        manualUpdate:
          partitions:
            subset-a: 0
            subset-b: 0
            subset-c: 0
        type: Manual
    ...

Capacity Planning For Subsets (MaxReplicas)

FEATURE STATE: Kruise v1.5.1

UnitedDeployment lets you define MaxReplicas for each subset, so that you can manage resource allocation effectively. For example, suppose an application normally runs at most 4 replicas on regular nodes. If the number of replicas exceeds 4, the excess Pods are automatically scheduled to elastic nodes.

    apiVersion: apps.kruise.io/v1alpha1
    kind: UnitedDeployment
    metadata:
      name: sample-ud
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: sample
      template:
        # statefulSetTemplate or advancedStatefulSetTemplate or cloneSetTemplate or deploymentTemplate
        cloneSetTemplate:
          ......
      topology:
        subsets:
        - name: normal-nodes
          maxReplicas: 4
          ......
        - name: elastic-nodes
          maxReplicas: null
          ......

When MaxReplicas is set, the UnitedDeployment controller applies the following rules to scale each subset:

  1. When scaling up, the controller follows the order specified in the subsets list;
  2. When scaling down, it follows the reverse of the scale-up order.

Please note the following:

  1. You cannot set both MaxReplicas and Replicas for any subset simultaneously.
  2. If MaxReplicas is left empty (null), no limit is imposed on the number of replicas for that subset.
  3. To avoid a situation where every MaxReplicas limit has been reached and no subset can be scaled up, at least one subset must have an empty (null) MaxReplicas value.
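The ordering rules above can be illustrated with a small Go sketch. It is a simplified model, not the controller's actual code: the `Subset` type and the `allocate` function are hypothetical names introduced here, and the sketch only assumes that subsets are filled in list order up to their MaxReplicas.

```go
package main

import "fmt"

// Subset is a hypothetical, simplified stand-in for a UnitedDeployment subset.
// MaxReplicas is nil when no upper bound is set.
type Subset struct {
	Name        string
	MaxReplicas *int
}

// allocate fills subsets in list order, capping each one at its MaxReplicas.
// It returns the replicas per subset and the replicas that could not be placed.
func allocate(total int, subsets []Subset) (map[string]int, int) {
	out := make(map[string]int, len(subsets))
	remaining := total
	for _, s := range subsets {
		n := remaining
		if s.MaxReplicas != nil && *s.MaxReplicas < n {
			n = *s.MaxReplicas
		}
		out[s.Name] = n
		remaining -= n
	}
	return out, remaining
}

func main() {
	four := 4
	subsets := []Subset{
		{Name: "normal-nodes", MaxReplicas: &four},
		{Name: "elastic-nodes"}, // nil MaxReplicas: unlimited
	}
	got, unplaced := allocate(5, subsets)
	fmt.Println(got["normal-nodes"], got["elastic-nodes"], unplaced) // 4 1 0
}
```

Because earlier subsets are filled first, lowering the total empties later subsets first, which matches the reverse order described for scaling down. A non-zero second return value corresponds to the case in note 3, where every subset has reached its limit.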

Customize pod configuration of subset

FEATURE STATE: Kruise v1.5.0

Since v1.5.0, you can customize pod spec fields of a subset other than nodeSelectorTerm and tolerations, e.g. env and resources.

Note: customizing the image per subset is not recommended, since it may interfere with the update function.

    apiVersion: apps.kruise.io/v1alpha1
    kind: UnitedDeployment
    metadata:
      name: sample-ud
    spec:
      replicas: 6
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: sample
      template:
        # statefulSetTemplate or advancedStatefulSetTemplate or cloneSetTemplate or deploymentTemplate
        statefulSetTemplate:
          ...
      topology:
        subsets:
        - name: subset-a
          ...
          # patch container resources, env:
          patch:
            spec:
              containers:
              - name: main
                resources:
                  limits:
                    cpu: "2"
                    memory: 800Mi
                env:
                - name: subset
                  value: subset-a
        - name: subset-b
          ...
          # patch container resources, env:
          patch:
            spec:
              containers:
              - name: main
                resources:
                  limits:
                    cpu: "2"
                    memory: 800Mi
                env:
                - name: subset
                  value: subset-b

HPA UnitedDeployment

FEATURE STATE: Kruise v1.5.0

The Horizontal Pod Autoscaler (HPA) supports custom resource workloads that expose the scale subresource. Since v1.5.0, an HPA can target a UnitedDeployment directly, as follows:

    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: example-hpa
      namespace: default
    spec:
      minReplicas: 1
      maxReplicas: 3
      metrics:
      - resource:
          name: cpu
          targetAverageUtilization: 2
        type: Resource
      scaleTargetRef:
        apiVersion: apps.kruise.io/v1alpha1
        kind: UnitedDeployment
        name: sample-ud

Pod Distribution Management

This controller provides spec.topology to describe the pod distribution specification.

    // Topology defines the spread detail of each subset under UnitedDeployment.
    // A UnitedDeployment manages multiple homogeneous workloads which are called subset.
    // Each of subsets under the UnitedDeployment is described in Topology.
    type Topology struct {
        // Contains the details of each subset. Each element in this array represents one subset
        // which will be provisioned and managed by UnitedDeployment.
        // +optional
        Subsets []Subset `json:"subsets,omitempty"`
    }

    // Subset defines the detail of a subset.
    type Subset struct {
        // Indicates subset name as a DNS_LABEL, which will be used to generate
        // subset workload name prefix in the format '<deployment-name>-<subset-name>-'.
        // Name should be unique between all of the subsets under one UnitedDeployment.
        Name string `json:"name"`

        // Indicates the node selector to form the subset. Depending on the node selector,
        // pods provisioned could be distributed across multiple groups of nodes.
        // A subset's nodeSelectorTerm is not allowed to be updated.
        // +optional
        NodeSelectorTerm corev1.NodeSelectorTerm `json:"nodeSelectorTerm,omitempty"`

        // Indicates the tolerations the pods under this subset have.
        // A subset's tolerations is not allowed to be updated.
        // +optional
        Tolerations []corev1.Toleration `json:"tolerations,omitempty"`

        // Indicates the number of the pod to be created under this subset. Replicas could also be
        // percentage like '10%', which means 10% of UnitedDeployment replicas of pods will be distributed
        // under this subset. If nil, the number of replicas in this subset is determined by controller.
        // Controller will try to keep all the subsets with nil replicas have average pods.
        // +optional
        Replicas *intstr.IntOrString `json:"replicas,omitempty"`
    }

topology.subsets specifies the desired group of subsets. A subset added to or removed from this array is created or deleted by the controller during reconcile. Each subset workload is created from the UnitedDeployment's spec.template, and the subset entry provides the topology information needed to create it. Each subset has a unique name, and its workload is created with a name prefix in the format <UnitedDeployment-name>-<Subset-name>-. Each subset is also configured with its nodeSelector. When provisioning a StatefulSet subset, the nodeSelector is added to the StatefulSet’s podTemplate, so that the Pods of the StatefulSet are created with the expected node affinity.
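As a minimal illustration of the naming and node selection described above, the following Go sketch builds the workload name prefix and checks a node's labels against a single `In` expression. Both function names are hypothetical and are not part of the Kruise API; the sketch only assumes the prefix format stated above and the standard meaning of the `In` operator.

```go
package main

import "fmt"

// subsetWorkloadPrefix builds the name prefix in the format
// <UnitedDeployment-name>-<Subset-name>-
func subsetWorkloadPrefix(udName, subsetName string) string {
	return fmt.Sprintf("%s-%s-", udName, subsetName)
}

// matchesIn reports whether a node's labels satisfy one "In" match expression:
// the label must exist and its value must be one of the listed values.
func matchesIn(nodeLabels map[string]string, key string, values []string) bool {
	v, ok := nodeLabels[key]
	if !ok {
		return false
	}
	for _, want := range values {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(subsetWorkloadPrefix("sample-ud", "subset-a")) // sample-ud-subset-a-

	node := map[string]string{"node": "zone-a"}
	fmt.Println(matchesIn(node, "node", []string{"zone-a"})) // true
}
```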

By default, UnitedDeployment’s Pods are evenly distributed across all subsets. There are two scenarios in which the controller does not follow this policy.

The first is when the distribution policy is customized through subset.replicas. A valid subset.replicas is either an integer giving an absolute number of pods, or a percentage string such as ‘40%’ giving a fixed proportion of pods. Once subset.replicas is set, the controller reconciles to make sure each subset has the expected replicas. Subsets with an empty subset.replicas divide the remaining replicas evenly.

The second is when the specified subset replicas policy becomes invalid. For example, the UnitedDeployment’s spec.replicas is reduced below the sum of all subset.replicas. In this case, the specified subset.replicas values no longer apply, and the controller automatically scales each subset’s replicas to match the total number of replicas. The controller will try its best to apply this adjustment smoothly.
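The arithmetic behind this policy can be sketched in Go. This is an illustration under stated assumptions, not the controller's implementation: `resolve` and `distribute` are hypothetical helpers, percentages are assumed to round up, and any leftover pod after an even split is assumed to go to the earlier subsets. The sketch returns an error instead of modeling the case where the explicit values exceed the total.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolve converts a subset.replicas value such as "1" or "50%" into a pod count.
// Rounding percentages up is an assumption of this sketch.
func resolve(spec string, total int) (int, error) {
	if strings.HasSuffix(spec, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(spec, "%"))
		if err != nil {
			return 0, err
		}
		return (p*total + 99) / 100, nil // integer ceiling of p% of total
	}
	return strconv.Atoi(spec)
}

// distribute assigns explicit replicas first, then splits what remains evenly
// among the subsets that have no replicas value.
func distribute(total int, names []string, specs map[string]string) (map[string]int, error) {
	out := make(map[string]int, len(names))
	remaining := total
	var unset []string
	for _, n := range names {
		spec, ok := specs[n]
		if !ok || spec == "" {
			unset = append(unset, n)
			continue
		}
		v, err := resolve(spec, total)
		if err != nil {
			return nil, err
		}
		out[n] = v
		remaining -= v
	}
	if remaining < 0 {
		return nil, fmt.Errorf("subset replicas exceed total %d", total)
	}
	for i, n := range unset {
		out[n] = remaining / len(unset)
		if i < remaining%len(unset) {
			out[n]++ // earlier subsets absorb the leftover pods
		}
	}
	return out, nil
}

func main() {
	names := []string{"subset-a", "subset-b", "subset-c"}
	specs := map[string]string{"subset-a": "1", "subset-b": "50%"}
	got, err := distribute(6, names, specs)
	if err != nil {
		panic(err)
	}
	fmt.Println(got["subset-a"], got["subset-b"], got["subset-c"]) // 1 3 2
}
```

With 6 total replicas, a fixed value of 1 and a share of 50% leave 2 pods for the subset that has no replicas value.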

Pod Update Management

When spec.template is updated, an upgrade is triggered. The new template is patched to each subset workload, which triggers the subset controller to update its pods. Furthermore, if the subset workload supports partition, as StatefulSet and Advanced StatefulSet do, UnitedDeployment can also provide the Manual update strategy.

    // UnitedDeploymentUpdateStrategy defines the update performance
    // when template of UnitedDeployment is changed.
    type UnitedDeploymentUpdateStrategy struct {
        // Type of UnitedDeployment update strategy.
        // Default is Manual.
        // +optional
        Type UpdateStrategyType `json:"type,omitempty"`

        // Includes all of the parameters a Manual update strategy needs.
        // +optional
        ManualUpdate *ManualUpdate `json:"manualUpdate,omitempty"`
    }

    // ManualUpdate is a update strategy which allows users to control the update progress
    // by providing the partition of each subset.
    type ManualUpdate struct {
        // Indicates number of subset partition.
        // +optional
        Partitions map[string]int32 `json:"partitions,omitempty"`
    }

The Manual update strategy lets users control the update progress by specifying the partition of each subset. The controller passes the partition to each subset workload.
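To show what a partition means once it reaches a StatefulSet-style subset, here is a small Go sketch. It relies on the StatefulSet rule that pods with an ordinal greater than or equal to the partition are moved to the new revision; the `ordinalsToUpdate` function and the 3-replica rollout are hypothetical and only illustrate that rule.

```go
package main

import "fmt"

// ordinalsToUpdate returns the pod ordinals that a StatefulSet-style workload
// would move to the new revision for a given partition: every ordinal that is
// greater than or equal to the partition, from highest to lowest.
func ordinalsToUpdate(replicas, partition int32) []int32 {
	var out []int32
	for i := replicas - 1; i >= partition && i >= 0; i-- {
		out = append(out, i)
	}
	return out
}

func main() {
	// Hypothetical staged rollout of a 3-replica subset:
	// lower the partition step by step to update more pods.
	for _, p := range []int32{3, 2, 0} {
		fmt.Println("partition", p, "updates", ordinalsToUpdate(3, p))
	}
}
```

Setting a subset's partition to its replica count holds back every pod, and setting it to 0, as in the sample at the top of this page, lets the whole subset update.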