GangScheduling

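Introduction

koord-scheduler provides Gang Scheduling to satisfy All-or-Nothing scheduling requirements. Users declare a minimum number of resources (Pods) for a gang, and node binding is triggered only once the number of Pods that have completed scheduling reaches that minimum. Two modes, Strict and NonStrict, control how resources are accumulated. Unlike other community solutions, Koordinator also provides a two-level Gang description to better match real-world scenarios.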
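
The All-or-Nothing rule can be illustrated with a minimal Python sketch. It is a model for illustration only, not Koordinator's implementation: the `Gang` class and the `permit` function are hypothetical names, and the sketch ignores timeouts and the two modes.

```python
from dataclasses import dataclass, field


@dataclass
class Gang:
    """Hypothetical in-memory view of one gang; not a Koordinator type."""
    name: str
    min_member: int
    # Pods that have found a node but are not bound yet.
    waiting: set = field(default_factory=set)


def permit(gang: Gang, pod: str) -> list:
    """Record that `pod` found a node and return the Pods that may bind now.

    Returns an empty list while the gang is below its minimum, and every
    waiting Pod (sorted by name) once the minimum is reached.
    """
    gang.waiting.add(pod)
    if len(gang.waiting) < gang.min_member:
        return []  # keep waiting: nothing is bound yet
    released = sorted(gang.waiting)
    gang.waiting.clear()
    return released


gang = Gang(name="default/gang-example", min_member=2)
first = permit(gang, "default/pod-example1")
second = permit(gang, "default/pod-example2")
print(first)   # []
print(second)  # ['default/pod-example1', 'default/pod-example2']
```

The first Pod alone binds nothing; both Pods are released together when the second one arrives.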

Setup

Prerequisites

  • Kubernetes >= 1.18
  • Koordinator >= 0.70

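Installation

Make sure the Koordinator components are installed in your Kubernetes cluster. If they are not, see Installation.

Configuration

GangScheduling is enabled by default, so no change to the koord-scheduler configuration is required.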

Using GangScheduling

Quick Start

Using the Gang CRD

1. Create a pod-group resource

    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      name: gang-example
      namespace: default
    spec:
      scheduleTimeoutSeconds: 100
      minMember: 2

    $ kubectl get pgs -n default
    NAME           AGE
    gang-example   13s

2. Create the child resource pod1

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example1
      namespace: default
      labels:
        pod-group.scheduling.sigs.k8s.io: gang-example
    spec:
      schedulerName: koord-scheduler
      containers:
      - command:
        - sleep
        - 365d
        image: busybox
        imagePullPolicy: IfNotPresent
        name: curlimage
        resources:
          limits:
            cpu: 40m
            memory: 40Mi
          requests:
            cpu: 40m
            memory: 40Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always

    $ kubectl get pod -n default
    NAME           READY   STATUS    RESTARTS   AGE
    pod-example1   0/1     Pending   0          7s

3. Create the child resource pod2

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example2
      namespace: default
      labels:
        pod-group.scheduling.sigs.k8s.io: gang-example
    spec:
      schedulerName: koord-scheduler
      containers:
      - command:
        - sleep
        - 365d
        image: busybox
        imagePullPolicy: IfNotPresent
        name: curlimage
        resources:
          limits:
            cpu: 40m
            memory: 40Mi
          requests:
            cpu: 40m
            memory: 40Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always

    $ kubectl get pod -n default
    NAME           READY   STATUS    RESTARTS   AGE
    pod-example1   1/1     Running   0          53s
    pod-example2   1/1     Running   0          5s

    $ kubectl get pg gang-example -n default -o yaml
    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      creationTimestamp: "2022-10-09T09:08:17Z"
      generation: 6
    spec:
      minMember: 1
      scheduleTimeoutSeconds: 100
    status:
      phase: Running
      running: 2
      scheduled: 2
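
The two member Pods above differ only in their names, so their manifests can also be generated. The following is a minimal Python sketch under two assumptions: it relies on kubectl accepting JSON manifests and a `v1` `List`, and the helper names `gang_member_pod` and `gang_member_list` are hypothetical. It omits the two `terminationMessage*` fields because their values match the Kubernetes defaults.

```python
import json


def gang_member_pod(index: int, gang_name: str, namespace: str = "default") -> dict:
    """Build one Pod manifest that joins `gang_name` through the PodGroup label."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"pod-example{index}",
            "namespace": namespace,
            "labels": {"pod-group.scheduling.sigs.k8s.io": gang_name},
        },
        "spec": {
            "schedulerName": "koord-scheduler",
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": "curlimage",
                    "image": "busybox",
                    "imagePullPolicy": "IfNotPresent",
                    "command": ["sleep", "365d"],
                    "resources": {
                        "limits": {"cpu": "40m", "memory": "40Mi"},
                        "requests": {"cpu": "40m", "memory": "40Mi"},
                    },
                }
            ],
        },
    }


def gang_member_list(count: int, gang_name: str) -> dict:
    """Wrap `count` member Pods in a v1 List so they can be applied together."""
    items = [gang_member_pod(i, gang_name) for i in range(1, count + 1)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


manifest = gang_member_list(2, "gang-example")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing that file to `kubectl apply -f` should create both Pods in one step, so the gang reaches its minimum without a long Pending phase for the first Pod.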

Using Pod Annotations

1. Create the child resource pod1

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example1
      namespace: default
      annotations:
        gang.scheduling.koordinator.sh/name: "gang-example"
        gang.scheduling.koordinator.sh/min-available: "2"
    spec:
      schedulerName: koord-scheduler
      containers:
      - command:
        - sleep
        - 365d
        image: busybox
        imagePullPolicy: IfNotPresent
        name: curlimage
        resources:
          limits:
            cpu: 40m
            memory: 40Mi
          requests:
            cpu: 40m
            memory: 40Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always

    $ kubectl get pod -n default
    NAME           READY   STATUS    RESTARTS   AGE
    pod-example1   0/1     Pending   0          7s

2. Create the child resource pod2

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example2
      namespace: default
      annotations:
        gang.scheduling.koordinator.sh/name: "gang-example"
        gang.scheduling.koordinator.sh/min-available: "2"
    spec:
      schedulerName: koord-scheduler
      containers:
      - command:
        - sleep
        - 365d
        image: busybox
        imagePullPolicy: IfNotPresent
        name: curlimage
        resources:
          limits:
            cpu: 40m
            memory: 40Mi
          requests:
            cpu: 40m
            memory: 40Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always

    $ kubectl get pod -n default
    NAME           READY   STATUS    RESTARTS   AGE
    pod-example1   1/1     Running   0          53s
    pod-example2   1/1     Running   0          5s

    $ kubectl get pg gang-example -n default -o yaml
    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      creationTimestamp: "2022-10-09T09:08:17Z"
      generation: 6
    spec:
      minMember: 1
      scheduleTimeoutSeconds: 100
    status:
      phase: Running
      running: 2
      scheduled: 2

Gang scheduling debug API:

    $ kubectl -n koordinator-system get lease koord-scheduler --no-headers | awk '{print $2}' | cut -d'_' -f1 | xargs -I {} kubectl -n koordinator-system get pod {} -o wide --no-headers | awk '{print $6}'
    10.244.0.64

    $ curl 10.244.0.64:10251/apis/v1/plugins/Coscheduling/gang/default/gang-example
    {
      "boundChildren": {
        "default/pod-example1": {},
        "default/pod-example2": {}
      },
      "children": {
        "default/pod-example1": {},
        "default/pod-example2": {}
      },
      "childrenScheduleRoundMap": {
        "default/pod-example1": 2,
        "default/pod-example2": 2
      },
      "createTime": "2022-10-09T07:31:53Z",
      "gangFrom": "GangFromPodAnnotation",
      "gangGroup": null,
      "hasGangInit": true,
      "minRequiredNumber": 2,
      "mode": "Strict",
      "name": "default/gang-example",
      "onceResourceSatisfied": true,
      "scheduleCycle": 2,
      "scheduleCycleValid": true,
      "totalChildrenNum": 2,
      "waitTime": 600000000000,
      "waitingForBindChildren": {}
    }
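
When checking many gangs, the debug response can be reduced to a few fields. The following Python sketch parses a trimmed copy of the response shown above; the `summarize` helper is a hypothetical name. It assumes that `waitTime` is a duration in nanoseconds, which would make 600000000000 equal to 600 seconds.

```python
import json

# Trimmed copy of the debug response shown above.
RESPONSE = """
{
  "boundChildren": {"default/pod-example1": {}, "default/pod-example2": {}},
  "children": {"default/pod-example1": {}, "default/pod-example2": {}},
  "minRequiredNumber": 2,
  "mode": "Strict",
  "name": "default/gang-example",
  "totalChildrenNum": 2,
  "waitTime": 600000000000,
  "waitingForBindChildren": {}
}
"""


def summarize(raw: str) -> dict:
    """Reduce one gang debug response to the fields needed for a quick check."""
    gang = json.loads(raw)
    bound = len(gang["boundChildren"])
    return {
        "name": gang["name"],
        "mode": gang["mode"],
        "bound": bound,
        "required": gang["minRequiredNumber"],
        # The gang is satisfied once enough children are bound.
        "satisfied": bound >= gang["minRequiredNumber"],
        # Assumption: waitTime is expressed in nanoseconds.
        "wait_seconds": gang["waitTime"] / 1e9,
    }


summary = summarize(RESPONSE)
print(summary)
```

For this response, `bound` equals `required`, so the gang is reported as satisfied.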

Advanced Gang Configuration

1. Using PodGroup annotations

    apiVersion: scheduling.sigs.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      name: gang-example1
      namespace: default
      annotations:
        gang.scheduling.koordinator.sh/total-number: "3"
        gang.scheduling.koordinator.sh/mode: "NonStrict"
        gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
    spec:
      scheduleTimeoutSeconds: 100
      minMember: 2

  • gang.scheduling.koordinator.sh/total-number sets the total number of child resources in the gang. If it is not set, the value of minMember is used.
  • gang.scheduling.koordinator.sh/mode sets the policy for handling Gang scheduling failures. Two modes are supported, Strict and NonStrict; the default is Strict.
  • gang.scheduling.koordinator.sh/groups makes several gangs complete Gang scheduling together as one group, for scenarios where multiple gangs depend on each other.
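
The difference between the two modes can be modelled in a few lines of Python. This sketch rests on an assumption that this page does not spell out: in Strict mode a failed member causes the members already waiting for binding to be released, while in NonStrict mode they keep waiting and continue to accumulate. The `on_member_failure` function is a hypothetical name, not a Koordinator API.

```python
def on_member_failure(mode: str, waiting: set) -> set:
    """Return the Pods that keep holding their nodes after one member fails.

    Assumption (not stated in this document): Strict releases every waiting
    member so the gang starts over; NonStrict leaves them waiting.
    """
    if mode == "Strict":
        return set()
    if mode == "NonStrict":
        return set(waiting)
    raise ValueError(f"unsupported gang mode: {mode!r}")


waiting = {"default/pod-a", "default/pod-b"}
strict_left = on_member_failure("Strict", waiting)
non_strict_left = on_member_failure("NonStrict", waiting)
print(len(strict_left), len(non_strict_left))  # 0 2
```

Under this assumption, NonStrict trades held-but-idle resources for a better chance that a large gang eventually reaches its minimum.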

2. Using Pod annotations

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example2
      namespace: default
      annotations:
        gang.scheduling.koordinator.sh/name: "gang-example1"
        gang.scheduling.koordinator.sh/min-available: "2"
        gang.scheduling.koordinator.sh/total-number: "3"
        gang.scheduling.koordinator.sh/mode: "Strict"  # or "NonStrict"
        gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
        gang.scheduling.koordinator.sh/waiting-time: "100s"
    spec:
      schedulerName: koord-scheduler
      containers:
      - command:
        - sleep
        - 365d
        image: busybox
        imagePullPolicy: IfNotPresent
        name: curlimage
        resources:
          limits:
            cpu: 40m
            memory: 40Mi
          requests:
            cpu: 40m
            memory: 40Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always

  • gang.scheduling.koordinator.sh/total-number sets the total number of child resources in the gang. If it is not set, the value of gang.scheduling.koordinator.sh/min-available is used.
  • gang.scheduling.koordinator.sh/mode sets the policy for handling Gang scheduling failures. Two modes are supported, Strict and NonStrict; the default is Strict.
  • gang.scheduling.koordinator.sh/groups makes several gangs complete Gang scheduling together as one group, for scenarios where multiple gangs depend on each other.
  • gang.scheduling.koordinator.sh/waiting-time sets the maximum waiting time, counted from the moment the first Pod enters the Permit stage.
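
Because every annotation value is a string and the groups value is a JSON-encoded array, it can help to build the annotations in code. The following Python sketch uses only the annotation keys listed above; the `gang_annotations` helper is a hypothetical name, and the check that total-number is not smaller than min-available is a sanity check added for this sketch, not documented Koordinator behaviour.

```python
import json

PREFIX = "gang.scheduling.koordinator.sh/"
MODES = ("Strict", "NonStrict")


def gang_annotations(name, min_available, total_number=None, mode="Strict",
                     groups=None, waiting_time=None):
    """Build the Pod annotations that describe one gang member."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if total_number is None:
        # Mirrors the documented fallback: total-number defaults to min-available.
        total_number = min_available
    if total_number < min_available:
        # Sanity check added for this sketch only.
        raise ValueError("total_number must not be smaller than min_available")
    annotations = {
        PREFIX + "name": name,
        PREFIX + "min-available": str(min_available),
        PREFIX + "total-number": str(total_number),
        PREFIX + "mode": mode,
    }
    if groups:
        # The groups annotation is a JSON array serialized into a string.
        annotations[PREFIX + "groups"] = json.dumps(groups)
    if waiting_time:
        annotations[PREFIX + "waiting-time"] = waiting_time
    return annotations


annotations = gang_annotations(
    "gang-example1",
    min_available=2,
    total_number=3,
    groups=["default/gang-example1", "default/gang-example2"],
    waiting_time="100s",
)
print(json.dumps(annotations, indent=2))
```

The resulting dictionary can be placed under `metadata.annotations` of each member Pod.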

Advanced Scheduler Configuration

You can adjust the Coscheduling configuration by modifying koord-scheduler-config.yaml in the helm chart, as shown below:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: koord-scheduler-config
      namespace: {{ .Values.installation.namespace }}
    data:
      koord-scheduler-config: |
        apiVersion: kubescheduler.config.k8s.io/v1beta2
        kind: KubeSchedulerConfiguration
        leaderElection:
          leaderElect: true
          resourceLock: leases
          resourceName: koord-scheduler
          resourceNamespace: {{ .Values.installation.namespace }}
        profiles:
          - pluginConfig:
            - name: Coscheduling
              args:
                apiVersion: kubescheduler.config.k8s.io/v1beta2
                kind: CoschedulingArgs
                defaultTimeout: 600s
                controllerWorkers: 1
            - name: ElasticQuota
              ...