Crane-scheduler

Scheduling pods based on actual node load

Overview

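Crane-scheduler is a collection of scheduler plugins built on the Kubernetes scheduler framework, including the Dynamic plugin, which schedules pods based on actual node load and is configured in the steps below.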

Get Started

Install Prometheus

Make sure your Kubernetes cluster has Prometheus installed. If it does not, refer to Install Prometheus.

Configure Prometheus Rules

Configure Prometheus recording rules to produce the aggregated metrics that Crane-scheduler expects:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-record
spec:
  groups:
  - name: cpu_mem_usage_active
    interval: 30s
    rules:
    - record: cpu_usage_active
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
    - record: mem_usage_active
      expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
  - name: cpu-usage-5m
    interval: 5m
    rules:
    - record: cpu_usage_max_avg_1h
      expr: max_over_time(cpu_usage_avg_5m[1h])
    - record: cpu_usage_max_avg_1d
      expr: max_over_time(cpu_usage_avg_5m[1d])
  - name: cpu-usage-1m
    interval: 1m
    rules:
    - record: cpu_usage_avg_5m
      expr: avg_over_time(cpu_usage_active[5m])
  - name: mem-usage-5m
    interval: 5m
    rules:
    - record: mem_usage_max_avg_1h
      expr: max_over_time(mem_usage_avg_5m[1h])
    - record: mem_usage_max_avg_1d
      expr: max_over_time(mem_usage_avg_5m[1d])
  - name: mem-usage-1m
    interval: 1m
    rules:
    - record: mem_usage_avg_5m
      expr: avg_over_time(mem_usage_active[5m])
```

!!! warning "Troubleshooting"

    The sampling interval of Prometheus must be less than 30 seconds; otherwise the rules above (such as `cpu_usage_active`) may not take effect.
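
To make the arithmetic behind these rules concrete, here is a minimal Python sketch that mimics them on made-up samples. It is not how Prometheus is implemented: `irate` below is a simplified stand-in for the PromQL function (it ignores counter resets), and the sample values, timestamps, and helper names are hypothetical. It also illustrates why the scrape interval matters: `irate` needs at least two samples inside the 30-second window.

```python
from statistics import mean

def irate(samples, now_s, window_s):
    """Simplified PromQL irate(): per-second rate from the last two
    (timestamp_s, counter_value) samples inside the window.
    Returns None when the window holds fewer than two samples.
    Counter resets are ignored in this sketch."""
    in_window = [(t, v) for t, v in samples if now_s - window_s < t <= now_s]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[-2:]
    return (v1 - v0) / (t1 - t0)

def cpu_usage_active(idle_counters, now_s, window_s=30):
    """100 - (average idle rate across CPUs) * 100, as in the rule above."""
    rates = [irate(s, now_s, window_s) for s in idle_counters]
    if any(r is None for r in rates):
        return None  # too few samples in the window: no data is produced
    return 100 - mean(rates) * 100

# Hypothetical idle-seconds counters for two CPUs, scraped every 15 s.
cpu0 = [(45, 100.0), (60, 107.5)]     # idle rate 0.50
cpu1 = [(45, 200.0), (60, 211.25)]    # idle rate 0.75
usage_15s = cpu_usage_active([cpu0, cpu1], now_s=70)

# The same counters scraped every 30 s: only one sample is in the window.
cpu0_slow = [(30, 92.5), (60, 107.5)]
cpu1_slow = [(30, 188.75), (60, 211.25)]
usage_30s = cpu_usage_active([cpu0_slow, cpu1_slow], now_s=70)

# The derived rules are plain aggregations over already-recorded series.
cpu_usage_avg_5m = mean([20.0, 40.0, 60.0])     # avg_over_time(cpu_usage_active[5m])
cpu_usage_max_avg_1h = max([40.0, 55.0, 35.0])  # max_over_time(cpu_usage_avg_5m[1h])

print(usage_15s, usage_30s, cpu_usage_avg_5m, cpu_usage_max_avg_1h)
```

With 15-second scrapes the sketch yields a usage value, while with 30-second scrapes the window contains a single sample and no value is produced, which matches the warning above.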

Install Crane-scheduler

There are two options:

  • Install Crane-scheduler as a second scheduler
  • Replace native Kube-scheduler with Crane-scheduler

Install Crane-scheduler as a second scheduler

=== "Main"

    ```bash
    helm repo add crane https://gocrane.github.io/helm-charts
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

=== "Mirror"

    ```bash
    helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

Replace native Kube-scheduler with Crane-scheduler

1. Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`:

```bash
cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
```

2. Modify the kube-scheduler configuration file (`scheduler-config.yaml`) to enable the Dynamic scheduler plugin and configure its plugin args:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
...
profiles:
- schedulerName: default-scheduler
  plugins:
    filter:
      enabled:
      - name: Dynamic
    score:
      enabled:
      - name: Dynamic
        weight: 3
  pluginConfig:
  - name: Dynamic
    args:
      policyConfigPath: /etc/kubernetes/policy.yaml
...
```
3. Create `/etc/kubernetes/policy.yaml`, which serves as the scheduling policy of the Dynamic plugin:

```yaml
apiVersion: scheduler.policy.crane.io/v1alpha1
kind: DynamicSchedulerPolicy
spec:
  syncPolicy:
    ##cpu usage
    - name: cpu_usage_avg_5m
      period: 3m
    - name: cpu_usage_max_avg_1h
      period: 15m
    - name: cpu_usage_max_avg_1d
      period: 3h
    ##memory usage
    - name: mem_usage_avg_5m
      period: 3m
    - name: mem_usage_max_avg_1h
      period: 15m
    - name: mem_usage_max_avg_1d
      period: 3h

  predicate:
    ##cpu usage
    - name: cpu_usage_avg_5m
      maxLimitPecent: 0.65
    - name: cpu_usage_max_avg_1h
      maxLimitPecent: 0.75
    ##memory usage
    - name: mem_usage_avg_5m
      maxLimitPecent: 0.65
    - name: mem_usage_max_avg_1h
      maxLimitPecent: 0.75

  priority:
    ##cpu usage
    - name: cpu_usage_avg_5m
      weight: 0.2
    - name: cpu_usage_max_avg_1h
      weight: 0.3
    - name: cpu_usage_max_avg_1d
      weight: 0.5
    ##memory usage
    - name: mem_usage_avg_5m
      weight: 0.2
    - name: mem_usage_max_avg_1h
      weight: 0.3
    - name: mem_usage_max_avg_1d
      weight: 0.5

  hotValue:
    - timeRange: 5m
      count: 5
    - timeRange: 1m
      count: 2
```
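
Before deploying an edited policy, it can help to sanity-check it. The Python sketch below transcribes the example policy as plain data and checks two properties that hold in the example: every metric used under `predicate` or `priority` is also listed under `syncPolicy`, and the `priority` weights for each resource sum to 1.0. Both checks are conventions observed in this example, not requirements confirmed by the source, and `check_policy` is a hypothetical helper, not part of Crane-scheduler.

```python
import math

# The example policy, transcribed by hand as plain Python data.
SYNCED = {
    "cpu_usage_avg_5m", "cpu_usage_max_avg_1h", "cpu_usage_max_avg_1d",
    "mem_usage_avg_5m", "mem_usage_max_avg_1h", "mem_usage_max_avg_1d",
}
PREDICATE = {
    "cpu_usage_avg_5m": 0.65, "cpu_usage_max_avg_1h": 0.75,
    "mem_usage_avg_5m": 0.65, "mem_usage_max_avg_1h": 0.75,
}
PRIORITY = {
    "cpu_usage_avg_5m": 0.2, "cpu_usage_max_avg_1h": 0.3, "cpu_usage_max_avg_1d": 0.5,
    "mem_usage_avg_5m": 0.2, "mem_usage_max_avg_1h": 0.3, "mem_usage_max_avg_1d": 0.5,
}

def check_policy(synced, predicate, priority):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # Assumption: a metric can only be used if it is also synced.
    for name in sorted(set(predicate) | set(priority)):
        if name not in synced:
            problems.append(f"{name} is used but missing from syncPolicy")
    # Assumption: weights are normalized per resource, as in the example.
    for prefix in ("cpu_", "mem_"):
        total = sum(w for n, w in priority.items() if n.startswith(prefix))
        if not math.isclose(total, 1.0):
            problems.append(f"{prefix}* weights sum to {total}, not 1.0")
    return problems

problems = check_policy(SYNCED, PREDICATE, PRIORITY)
print(problems or "policy looks consistent")
```

The field name `maxLimitPecent` is kept exactly as it appears in the policy file.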
4. Modify `kube-scheduler.yaml` and replace the kube-scheduler image with the Crane-scheduler image:

```yaml
...
 image: docker.io/gocrane/crane-scheduler:0.0.23
...
```
5. Install crane-scheduler-controller:

=== "Main"

    ```bash
    kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/rbac.yaml
    kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/deployment.yaml
    ```

=== "Mirror"

    ```bash
    kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/rbac.yaml
    kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/deployment.yaml
    ```

Schedule Pods With Crane-scheduler

Test Crane-scheduler with the following example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
```

!!! note

    Change `crane-scheduler` to `default-scheduler` if `crane-scheduler` is used as the default scheduler.

If the test pod is scheduled successfully, the following event appears:

```bash
Type    Reason     Age   From             Message
----    ------     ----  ----             -------
Normal  Scheduled  28s   crane-scheduler  Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu
```
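
If you want to check this in a script rather than by eye, the following Python sketch parses a single event row in the layout shown above and confirms that the `From` column names Crane-scheduler. The helper `parse_event_row` is hypothetical, and the sketch assumes the five whitespace-separated columns shown here with a single-token `Age` value; rows with a different layout would need different handling.

```python
def parse_event_row(row):
    """Split one event row into its five columns.
    Only the first four splits are made, so the message keeps its spaces."""
    event_type, reason, age, source, message = row.split(None, 4)
    return {
        "type": event_type,
        "reason": reason,
        "age": age,
        "from": source,
        "message": message,
    }

# The event row from the example output above.
row = ("Normal Scheduled 28s crane-scheduler "
       "Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu")

event = parse_event_row(row)
scheduled_by_crane = (
    event["reason"] == "Scheduled" and event["from"] == "crane-scheduler"
)
print(scheduled_by_crane)
```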