Cluster Accurate Scheduler Estimator For Rescheduling

Users can divide the replicas of a workload among member clusters according to the resources available in each cluster. When a cluster lacks resources, the scheduler calls karmada-scheduler-estimator so that it does not assign more replicas to that cluster than it can hold.

Prerequisites

Karmada has been installed

Install Karmada by following the quick-start guide, or run the hack/local-up-karmada.sh script directly, which is also used to run our E2E cases.

Member cluster component is ready

Ensure that all member clusters have been joined and that the karmada-scheduler-estimator for each of them is installed in karmada-host.

You can check with the following commands:

    # check whether the member clusters have been joined
    $ kubectl get cluster
    NAME      VERSION   MODE   READY   AGE
    member1   v1.19.1   Push   True    11m
    member2   v1.19.1   Push   True    11m
    member3   v1.19.1   Pull   True    5m12s
    # check whether the karmada-scheduler-estimator of each member cluster is running
    $ kubectl --context karmada-host get pod -n karmada-system | grep estimator
    karmada-scheduler-estimator-member1-696b54fd56-xt789   1/1   Running   0   77s
    karmada-scheduler-estimator-member2-774fb84c5d-md4wt   1/1   Running   0   75s
    karmada-scheduler-estimator-member3-5c7d87f4b4-76gv9   1/1   Running   0   72s

  • If a cluster has not been joined yet, you can use hack/deploy-agent-and-estimator.sh to deploy both karmada-agent and karmada-scheduler-estimator.
  • If a cluster has already been joined, you can use hack/deploy-scheduler-estimator.sh to deploy only karmada-scheduler-estimator.
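
The readiness check above can also be scripted. The following is a minimal sketch, not part of Karmada: it parses the text printed by the kubectl get pod command and reports, for each member cluster, whether an estimator pod is Running with all of its containers ready. The function name estimators_ready is hypothetical, and the parsing assumes the usual Deployment pod naming of <deployment>-<replicaset hash>-<pod suffix>.

```python
PREFIX = "karmada-scheduler-estimator-"


def estimators_ready(pod_listing: str, clusters: list[str]) -> dict[str, bool]:
    """Map each cluster name to True if its estimator pod is Running and fully ready."""
    ready = {name: False for name in clusters}
    for line in pod_listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith(PREFIX):
            continue
        pod_name, ready_col, status = fields[0], fields[1], fields[2]
        # Assumption: pods are named <deployment>-<replicaset hash>-<pod suffix>,
        # so dropping the last two dash-separated parts leaves the cluster name.
        cluster = pod_name[len(PREFIX):].rsplit("-", 2)[0]
        current, _, desired = ready_col.partition("/")
        if cluster in ready and status == "Running" and current == desired:
            ready[cluster] = True
    return ready


# Sample listing copied from the command output shown above.
listing = """\
karmada-scheduler-estimator-member1-696b54fd56-xt789 1/1 Running 0 77s
karmada-scheduler-estimator-member2-774fb84c5d-md4wt 1/1 Running 0 75s
karmada-scheduler-estimator-member3-5c7d87f4b4-76gv9 1/1 Running 0 72s
"""
print(estimators_ready(listing, ["member1", "member2", "member3"]))
# {'member1': True, 'member2': True, 'member3': True}
```

A cluster that maps to False either has no estimator pod or has one that is not ready yet; in both cases the deploy scripts listed above are the place to start.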

Scheduler option ‘--enable-scheduler-estimator’

After all member clusters have been joined and their estimators are ready, specify the option --enable-scheduler-estimator=true to enable the scheduler estimator.

    # edit the deployment of karmada-scheduler
    $ kubectl --context karmada-host edit -n karmada-system deployments.apps karmada-scheduler

Then add the option --enable-scheduler-estimator=true to the command of the karmada-scheduler container.

Example

Now we can divide the replicas among the member clusters. Note that propagationPolicy.spec.replicaScheduling.replicaSchedulingType must be Divided and propagationPolicy.spec.replicaScheduling.replicaDivisionPreference must be Aggregated. With these settings, the scheduler tries to pack the replicas into as few clusters as possible, based on the available resources of all member clusters.

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: aggregated-policy
    spec:
      resourceSelectors:
        - apiVersion: apps/v1
          kind: Deployment
          name: nginx
      placement:
        clusterAffinity:
          clusterNames:
            - member1
            - member2
            - member3
        replicaScheduling:
          replicaSchedulingType: Divided
          replicaDivisionPreference: Aggregated
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - image: nginx
              name: nginx
              ports:
                - containerPort: 80
                  name: web-1
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi

You will find that all replicas have been assigned to as few clusters as possible. In this run, all 5 replicas land on member1.

    $ kubectl get deployments.apps
    NAME    READY   UP-TO-DATE   AVAILABLE   AGE
    nginx   5/5     5            5           2m16s
    $ kubectl get rb nginx-deployment -o=custom-columns=NAME:.metadata.name,CLUSTER:.spec.clusters
    NAME               CLUSTER
    nginx-deployment   [map[name:member1 replicas:5] map[name:member2] map[name:member3]]
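
To make the "as few clusters as possible" behavior concrete, here is a simplified model of an aggregated division. It is a sketch under stated assumptions, not Karmada's implementation: the function divide_aggregated is hypothetical, the per-cluster capacities are made-up numbers standing in for what each estimator would report, and the real scheduler may split replicas among the selected clusters differently.

```python
def divide_aggregated(desired: int, available: dict[str, int]) -> dict[str, int]:
    """Pack `desired` replicas into as few clusters as possible.

    `available` maps a cluster name to the maximum number of replicas that
    cluster can still hold (the kind of figure an estimator would report).
    Raises ValueError when the clusters together cannot hold all replicas.
    """
    if sum(available.values()) < desired:
        raise ValueError("not enough available replicas in member clusters")
    assignment: dict[str, int] = {}
    remaining = desired
    # Visit the roomiest clusters first so that fewer clusters are needed.
    for name, capacity in sorted(available.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(capacity, remaining)
        if take > 0:
            assignment[name] = take
            remaining -= take
    return assignment


# Hypothetical capacities; real values come from the estimators.
capacities = {"member1": 8, "member2": 6, "member3": 4}
print(divide_aggregated(5, capacities))   # {'member1': 5}
print(divide_aggregated(12, capacities))  # {'member1': 8, 'member2': 4}
```

With 5 replicas and enough room in one cluster, everything goes to that single cluster, which matches the resource binding shown above. Only when the request exceeds the largest cluster's capacity does a second cluster receive replicas.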

Next, change the resource requests of the deployment to very large values and try again.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - image: nginx
              name: nginx
              ports:
                - containerPort: 80
                  name: web-1
              resources:
                requests:
                  cpu: "100"
                  memory: 200Gi

Because no node in any member cluster has that much CPU and memory, scheduling the workload fails.

    $ kubectl get deployments.apps
    NAME    READY   UP-TO-DATE   AVAILABLE   AGE
    nginx   0/5     0            0           2m20s
    $ kubectl get rb nginx-deployment -o=custom-columns=NAME:.metadata.name,CLUSTER:.spec.clusters
    NAME               CLUSTER
    nginx-deployment   <none>
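
The failure follows from a per-node fit check: a replica must fit on a single node, so free resources spread across several nodes cannot be pooled. The sketch below illustrates that reasoning only. It is not the estimator's actual code; the Node type, the max_replicas function, and the node sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    free_cpu: float        # CPU cores not yet requested by other pods
    free_memory_gi: float  # memory in Gi not yet requested by other pods


def max_replicas(nodes: list[Node], cpu: float, memory_gi: float) -> int:
    """Count how many replicas with the given requests fit, node by node."""
    if cpu <= 0 or memory_gi <= 0:
        raise ValueError("requests must be positive")
    total = 0
    for node in nodes:
        # The scarcer resource on each node limits how many replicas fit there.
        total += min(int(node.free_cpu // cpu), int(node.free_memory_gi // memory_gi))
    return total


# Hypothetical cluster with two nodes.
nodes = [Node(free_cpu=4, free_memory_gi=16), Node(free_cpu=8, free_memory_gi=32)]
print(max_replicas(nodes, cpu=1, memory_gi=2))      # 12
print(max_replicas(nodes, cpu=100, memory_gi=200))  # 0
```

With requests of 1 CPU and 2Gi, this cluster can hold 12 replicas. With 100 CPUs and 200Gi, no node can hold even one, so the cluster reports no capacity; when every member cluster reports none, the scheduler has nowhere to place the replicas and the resource binding stays empty.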