Configure Multiple Schedulers

Kubernetes ships with a default scheduler that is described here. If the default scheduler does not suit your needs, you can implement your own scheduler. Moreover, you can even run multiple schedulers simultaneously alongside the default scheduler and instruct Kubernetes which scheduler to use for each of your pods. Let's learn how to run multiple schedulers in Kubernetes with an example.

A detailed description of how to implement a scheduler is outside the scope of this document. Please refer to the kube-scheduler implementation in pkg/scheduler in the Kubernetes source directory for a canonical example.

Before you begin

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube, or you can use one of the Kubernetes playgrounds.

To check the version, enter kubectl version.

Package the scheduler

Package your scheduler binary into a container image. For the purposes of this example, you can use the default scheduler (kube-scheduler) as your second scheduler. Clone the Kubernetes source code from GitHub and build the source.

```shell
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
make
```

Create a container image containing the kube-scheduler binary. Here is the Dockerfile to build the image:

```dockerfile
FROM busybox
ADD ./_output/local/bin/linux/amd64/kube-scheduler /usr/local/bin/kube-scheduler
```

Save the file as Dockerfile, build the image, and push it to a registry. This example pushes the image to Google Container Registry (GCR); for more details, read the GCR documentation. Alternatively, you can use Docker Hub; for more details, refer to the Docker Hub documentation.

```shell
docker build -t gcr.io/my-gcp-project/my-kube-scheduler:1.0 .     # The image name and the repository
gcloud docker -- push gcr.io/my-gcp-project/my-kube-scheduler:1.0 # used in here is just an example
```

Define a Kubernetes Deployment for the scheduler

Now that you have your scheduler in a container image, create a pod configuration for it and run it in your Kubernetes cluster. Instead of creating a pod directly in the cluster, this example uses a Deployment. A Deployment manages a ReplicaSet, which in turn manages the pods, thereby making the scheduler resilient to failures. Here is the deployment config; save it as my-scheduler.yaml:

admin/sched/my-scheduler.yaml

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-scheduler-extension-apiserver-authentication-reader
  namespace: kube-system
roleRef:
  kind: Role
  name: extension-apiserver-authentication-reader
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: my-scheduler
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml
        image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts:
          - name: config-volume
            mountPath: /etc/kubernetes/my-scheduler
      hostNetwork: false
      hostPID: false
      volumes:
        - name: config-volume
          configMap:
            name: my-scheduler-config
```

In the above manifest, you use a KubeSchedulerConfiguration to customize the behavior of your scheduler implementation. This configuration is passed to kube-scheduler during initialization via the --config option. The my-scheduler-config ConfigMap stores the configuration file, and the Pod of the my-scheduler Deployment mounts that ConfigMap as a volume.

In the aforementioned Scheduler Configuration, your scheduler implementation is represented via a KubeSchedulerProfile.

Note:

To determine if a scheduler is responsible for scheduling a specific Pod, the spec.schedulerName field in a PodTemplate or Pod manifest must match the schedulerName field of the KubeSchedulerProfile. All schedulers running in the cluster must have unique names.
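The matching rule above can be seen side by side; this illustrative fragment pairs the profile from the ConfigMap in my-scheduler.yaml with the corresponding Pod field:

```yaml
# In the KubeSchedulerConfiguration (inside the my-scheduler-config ConfigMap):
profiles:
  - schedulerName: my-scheduler

# In the Pod (or PodTemplate) spec — must carry the same name:
spec:
  schedulerName: my-scheduler
```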

Also, note that you create a dedicated service account my-scheduler and bind the ClusterRole system:kube-scheduler to it so that it can acquire the same privileges as kube-scheduler.

Please see the kube-scheduler documentation for detailed description of other command line arguments and Scheduler Configuration reference for detailed description of other customizable kube-scheduler configurations.

Run the second scheduler in the cluster

In order to run your scheduler in a Kubernetes cluster, create the deployment specified in the config above in a Kubernetes cluster:

```shell
kubectl create -f my-scheduler.yaml
```

Verify that the scheduler pod is running:

```shell
kubectl get pods --namespace=kube-system
```

```
NAME                       READY   STATUS    RESTARTS   AGE
....
my-scheduler-lnf4s-4744f   1/1     Running   0          2m
...
```

You should see a “Running” my-scheduler pod, in addition to the default kube-scheduler pod in this list.

Enable leader election

To run multiple schedulers with leader election enabled, you must do the following:

Update the following fields for the KubeSchedulerConfiguration in the my-scheduler-config ConfigMap in your YAML file:

  • leaderElection.leaderElect to true
  • leaderElection.resourceNamespace to <lock-object-namespace>
  • leaderElection.resourceName to <lock-object-name>

Note:

The control plane creates the lock objects for you, but the namespace must already exist. You can use the kube-system namespace.
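With those fields set, the ConfigMap data from my-scheduler.yaml might look like the following sketch; the lock object name my-scheduler and the kube-system namespace are example choices, not requirements:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: true
      resourceNamespace: kube-system   # namespace must already exist
      resourceName: my-scheduler       # example lock object name
```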

If RBAC is enabled on your cluster, you must update the system:kube-scheduler cluster role. Add your scheduler name to the resourceNames of the rule applied for endpoints and leases resources, as in the following example:

```shell
kubectl edit clusterrole system:kube-scheduler
```

admin/sched/clusterrole.yaml

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-scheduler
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - create
  - apiGroups:
      - coordination.k8s.io
    resourceNames:
      - kube-scheduler
      - my-scheduler
    resources:
      - leases
    verbs:
      - get
      - update
  - apiGroups:
      - ""
    resourceNames:
      - kube-scheduler
      - my-scheduler
    resources:
      - endpoints
    verbs:
      - delete
      - get
      - patch
      - update
```

Specify schedulers for pods

Now that your second scheduler is running, create some pods, and direct them to be scheduled by either the default scheduler or the one you deployed. In order to schedule a given pod using a specific scheduler, specify the name of the scheduler in that pod spec. Let’s look at three examples.

  • Pod spec without any scheduler name

    admin/sched/pod1.yaml

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: no-annotation
      labels:
        name: multischeduler-example
    spec:
      containers:
      - name: pod-with-no-annotation-container
        image: registry.k8s.io/pause:2.0
    ```

    When no scheduler name is supplied, the pod is automatically scheduled using the default-scheduler.

    Save this file as pod1.yaml and submit it to the Kubernetes cluster.

    ```shell
    kubectl create -f pod1.yaml
    ```
  • Pod spec with default-scheduler

    admin/sched/pod2.yaml

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: annotation-default-scheduler
      labels:
        name: multischeduler-example
    spec:
      schedulerName: default-scheduler
      containers:
      - name: pod-with-default-annotation-container
        image: registry.k8s.io/pause:2.0
    ```

    A scheduler is specified by supplying the scheduler name as a value to `spec.schedulerName`. In this case, we supply the name of the default scheduler, which is `default-scheduler`.

    Save this file as `pod2.yaml` and submit it to the Kubernetes cluster.

    ```shell
    kubectl create -f pod2.yaml
    ```
  • Pod spec with my-scheduler

    admin/sched/pod3.yaml

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: annotation-second-scheduler
      labels:
        name: multischeduler-example
    spec:
      schedulerName: my-scheduler
      containers:
      - name: pod-with-second-annotation-container
        image: registry.k8s.io/pause:2.0
    ```

    In this case, we specify that this pod should be scheduled using the scheduler that we deployed, `my-scheduler`. Note that the value of `spec.schedulerName` must match the name supplied in the `schedulerName` field of the corresponding `KubeSchedulerProfile`.

    Save this file as `pod3.yaml` and submit it to the Kubernetes cluster.

    ```shell
    kubectl create -f pod3.yaml
    ```

Verify that all three pods are running.

```shell
kubectl get pods
```

Verifying that the pods were scheduled using the desired schedulers

In order to make it easier to work through these examples, we did not verify that the pods were actually scheduled using the desired schedulers. We can verify that by changing the order of pod and deployment config submissions above. If we submit all the pod configs to a Kubernetes cluster before submitting the scheduler deployment config, we see that the pod annotation-second-scheduler remains in “Pending” state forever while the other two pods get scheduled. Once we submit the scheduler deployment config and our new scheduler starts running, the annotation-second-scheduler pod gets scheduled as well.

Alternatively, you can look at the “Scheduled” entries in the event logs to verify that the pods were scheduled by the desired schedulers.

```shell
kubectl get events
```

You can also use a custom scheduler configuration or a custom container image for the cluster’s main scheduler by modifying its static pod manifest on the relevant control plane nodes.
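As a sketch of that approach: on clusters set up with kubeadm, the static Pod manifest typically lives at /etc/kubernetes/manifests/kube-scheduler.yaml on each control plane node, and the kubelet restarts the static Pod automatically when the file changes. The configuration file path and image tag below are assumed examples, not required values:

```yaml
# Fragment of /etc/kubernetes/manifests/kube-scheduler.yaml (kubeadm layout)
spec:
  containers:
  - name: kube-scheduler
    image: registry.k8s.io/kube-scheduler:v1.25.0      # example tag; or a custom image
    command:
    - kube-scheduler
    - --config=/etc/kubernetes/my-scheduler-config.yaml  # assumed config file path
    volumeMounts:
    - name: my-scheduler-config
      mountPath: /etc/kubernetes/my-scheduler-config.yaml
      readOnly: true
  volumes:
  - name: my-scheduler-config
    hostPath:
      path: /etc/kubernetes/my-scheduler-config.yaml   # must exist on the node
      type: File
```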