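# Crane-scheduler

## Introduction to Crane-scheduler

Crane-scheduler is a set of scheduling plugins built on the Kubernetes scheduler framework. The plugin configured in this guide is `Dynamic`, the dynamic scheduling plugin.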

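## Getting Started

### Install Prometheus

Make sure Prometheus is installed in your Kubernetes cluster. If it is not, see Install Prometheus.

### Configure Prometheus Rules

Configure Prometheus recording rules to produce the aggregated metrics that Crane-scheduler expects: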

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-record
spec:
  groups:
  # Instantaneous CPU and memory usage per node, in percent.
  - name: cpu_mem_usage_active
    interval: 30s
    rules:
    - record: cpu_usage_active
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
    - record: mem_usage_active
      expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
  # Peak of the 5-minute CPU average over the last hour and the last day.
  - name: cpu-usage-5m
    interval: 5m
    rules:
    - record: cpu_usage_max_avg_1h
      expr: max_over_time(cpu_usage_avg_5m[1h])
    - record: cpu_usage_max_avg_1d
      expr: max_over_time(cpu_usage_avg_5m[1d])
  # 5-minute CPU average, built from cpu_usage_active.
  - name: cpu-usage-1m
    interval: 1m
    rules:
    - record: cpu_usage_avg_5m
      expr: avg_over_time(cpu_usage_active[5m])
  # Peak of the 5-minute memory average over the last hour and the last day.
  - name: mem-usage-5m
    interval: 5m
    rules:
    - record: mem_usage_max_avg_1h
      expr: max_over_time(mem_usage_avg_5m[1h])
    - record: mem_usage_max_avg_1d
      expr: max_over_time(mem_usage_avg_5m[1d])
  # 5-minute memory average, built from mem_usage_active.
  - name: mem-usage-1m
    interval: 1m
    rules:
    - record: mem_usage_avg_5m
      expr: avg_over_time(mem_usage_active[5m])
```

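!!! warning "Troubleshooting"

    The Prometheus scrape interval must be shorter than 30 seconds. Otherwise rules such as `cpu_usage_active` may not take effect, because `irate(...[30s])` needs at least two samples inside its 30-second window.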
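One way to confirm that the recording rules are producing data is to query the Prometheus HTTP API for each recorded metric. The sketch below assumes Prometheus is reachable at the address stored in `PROM_ADDR` (a hypothetical variable, for example after a port-forward). `count_series` is an illustrative helper name, not something shipped with Crane.

```bash
#!/usr/bin/env bash
# Sketch: count how many series Prometheus returns for a recorded metric.
# PROM_ADDR is an assumption (e.g. http://127.0.0.1:9090);
# count_series is a hypothetical helper.

count_series() {
  local metric="$1"
  # /api/v1/query is the Prometheus instant-query endpoint. Each returned
  # series carries a "__name__" label, so counting that label counts series.
  curl -s --get "${PROM_ADDR}/api/v1/query" --data-urlencode "query=${metric}" \
    | grep -o '"__name__"' | wc -l | tr -d '[:space:]'
}

# Usage (a count of 0 suggests the rule has not been evaluated yet):
#   for m in cpu_usage_avg_5m cpu_usage_max_avg_1h mem_usage_avg_5m mem_usage_max_avg_1h; do
#     echo "$m: $(count_series "$m")"
#   done
```

A non-zero count for `cpu_usage_avg_5m` but zero for `cpu_usage_max_avg_1h` usually just means the slower 5-minute rule group has not run yet.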

### Install Crane-scheduler

There are two options:

- Install Crane-scheduler as a second scheduler
- Replace the native kube-scheduler with Crane-scheduler

#### Install Crane-scheduler as a second scheduler

=== "Main"

    ```bash
    helm repo add crane https://gocrane.github.io/helm-charts
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

=== "Mirror"

    ```bash
    helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

#### Replace the native kube-scheduler with Crane-scheduler

1. Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`:

    ```bash
    cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
    ```

2. Enable the Dynamic scheduling plugin and set its arguments by editing the kube-scheduler configuration file (`scheduler-config.yaml`):

    ```yaml
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    ...
    profiles:
    - schedulerName: default-scheduler
      plugins:
        filter:
          enabled:
          - name: Dynamic
        score:
          enabled:
          - name: Dynamic
            weight: 3
      pluginConfig:
      - name: Dynamic
        args:
          policyConfigPath: /etc/kubernetes/policy.yaml
    ...
    ```
3. Create `/etc/kubernetes/policy.yaml` to hold the scheduling policy for the Dynamic plugin:

    ```yaml
    apiVersion: scheduler.policy.crane.io/v1alpha1
    kind: DynamicSchedulerPolicy
    spec:
      syncPolicy:
        ## cpu usage
        - name: cpu_usage_avg_5m
          period: 3m
        - name: cpu_usage_max_avg_1h
          period: 15m
        - name: cpu_usage_max_avg_1d
          period: 3h
        ## memory usage
        - name: mem_usage_avg_5m
          period: 3m
        - name: mem_usage_max_avg_1h
          period: 15m
        - name: mem_usage_max_avg_1d
          period: 3h
      predicate:
        ## cpu usage
        - name: cpu_usage_avg_5m
          maxLimitPecent: 0.65
        - name: cpu_usage_max_avg_1h
          maxLimitPecent: 0.75
        ## memory usage
        - name: mem_usage_avg_5m
          maxLimitPecent: 0.65
        - name: mem_usage_max_avg_1h
          maxLimitPecent: 0.75
      priority:
        ## cpu usage
        - name: cpu_usage_avg_5m
          weight: 0.2
        - name: cpu_usage_max_avg_1h
          weight: 0.3
        - name: cpu_usage_max_avg_1d
          weight: 0.5
        ## memory usage
        - name: mem_usage_avg_5m
          weight: 0.2
        - name: mem_usage_max_avg_1h
          weight: 0.3
        - name: mem_usage_max_avg_1d
          weight: 0.5
      hotValue:
        - timeRange: 5m
          count: 5
        - timeRange: 1m
          count: 2
    ```
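To make the policy fields more concrete, the sketch below applies one plausible reading of them to made-up usage numbers: a node is rejected when a metric exceeds its `maxLimitPecent`, and its score is the weighted average of unused capacity under `priority`. Both rules are assumptions for illustration only; the plugin's actual formula, including how `hotValue` is applied, is not spelled out in this guide. `over_limit` and `weighted_score` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch only: the filter rule and the weighted-average score below are
# assumptions about the Dynamic plugin, not its real implementation.

# over_limit USAGE LIMIT
# Exits 0 when usage exceeds the limit (the node would be filtered out).
over_limit() {
  awk -v u="$1" -v l="$2" 'BEGIN { exit !(u > l) }'
}

# weighted_score "usage:weight" ...
# Prints a 0-100 score rounded to the nearest integer; lower usage scores higher.
weighted_score() {
  printf '%s\n' "$@" | awk -F: '
    { score += (1 - $1) * $2 * 100; weight += $2 }
    END { if (weight > 0) printf "%d\n", score / weight + 0.5 }'
}

# Usage with invented CPU usage values and the weights from policy.yaml:
#   over_limit 0.70 0.65 && echo "cpu_usage_avg_5m above limit"
#   weighted_score 0.40:0.2 0.50:0.3 0.60:0.5
```

Under these assumptions, a node whose 5-minute CPU average is 0.70 would fail the 0.65 predicate regardless of how well it scores.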
4. Edit `kube-scheduler.yaml` and replace the kube-scheduler image with the Crane-scheduler image:

    ```yaml
    ...
     image: docker.io/gocrane/crane-scheduler:0.0.23
    ...
    ```
5. Install crane-scheduler-controller:

    === "Main"

        ```bash
        kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/rbac.yaml
        kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/deployment.yaml
        ```

    === "Mirror"

        ```bash
        kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/rbac.yaml
        kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/deployment.yaml
        ```

### Schedule Pods with Crane-scheduler

Test Crane-scheduler with the following example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
```

!!! note

    To use crane-scheduler as the default scheduler, change `schedulerName` from `crane-scheduler` to `default-scheduler`.

If the test Pod is scheduled successfully, an event like the following appears:

```bash
Type    Reason     Age   From             Message
----    ------     ----  ----             -------
Normal  Scheduled  28s   crane-scheduler  Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu
```
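Besides reading the event list, the scheduler that a Pod requested and the node it landed on can be read straight from the Pod spec. The following is a minimal sketch; `scheduled_by` is a hypothetical helper name, and the namespace and Pod name in the usage comment are taken from the sample event and will differ in your cluster.

```bash
#!/usr/bin/env bash
# Sketch: print "<schedulerName> -> <nodeName>" for one Pod.
# scheduled_by is a hypothetical helper, not part of Crane.

scheduled_by() {
  local namespace="$1" pod="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.schedulerName}{" -> "}{.spec.nodeName}{"\n"}'
}

# Usage:
#   scheduled_by default cpu-stress-7669499b57-zmrgb
#   kubectl get events -n default --field-selector reason=Scheduled
```

If the first field shows `default-scheduler` while you expected `crane-scheduler`, check the `schedulerName` in the Deployment's Pod template.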