Extreme Elastic Schedule Solution Based on HPA and WorkloadSpread

Since version 0.10.0, OpenKruise has provided WorkloadSpread, a multi-domain CRD with a bypass architecture. WorkloadSpread allows a workload to distribute its Pods across different nodes, zones, and even different clusters and cloud providers, and to apply differentiated configurations in each domain. It gives workloads multi-domain spreading, elastic scheduling, and fine-grained management in a non-intrusive way.

On this page, we take a simple web application as an example and build a fully automatic, extremely elastic scheduling solution by combining WorkloadSpread, KEDA, Prometheus, and Alibaba Cloud Elastic Container Instance (ECI).

Introduction

Architecture

The architecture of this solution is as follows:

[Figure: solution architecture]

Special notes:

  • In this solution, the HPA configuration is managed by KEDA, an autoscaling component built on top of HPA. Compared with native HPA, KEDA supports a much richer set of custom metric sources.

  • As a shortcut, we collect metrics from Nginx rather than from the web Pods. This lets us reuse the open-source Nginx-Prometheus-Exporter and keeps the solution simple, because the exporter makes it easy to obtain the number of HTTP connections and other metrics. More importantly, all traffic entering the web Pods must pass through the Nginx Ingress, so the Nginx metrics reflect the application's load. We therefore use them directly with KEDA to drive automatic scaling.

  • WorkloadSpread requires Kubernetes 1.21 or later to manage a Deployment, but ACK clusters currently support versions only up to 1.20. Therefore, this example uses a CloneSet as the workload.
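The version constraint in the last note can be expressed as a small check. This is a hypothetical helper sketched in Python (`pick_workload_kind` is not part of OpenKruise or kubectl). The version string is the kubelet version reported by the ACK nodes used in this example.

```python
import re

# Per the note above: WorkloadSpread needs Kubernetes >= 1.21 to manage a Deployment.
MIN_VERSION_FOR_DEPLOYMENT = (1, 21)


def parse_k8s_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as 'v1.20.11-aliyun.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pick_workload_kind(version: str) -> str:
    """Use a Deployment when the cluster is new enough, otherwise fall back to CloneSet."""
    if parse_k8s_version(version) >= MIN_VERSION_FOR_DEPLOYMENT:
        return "Deployment"
    return "CloneSet"


print(pick_workload_kind("v1.20.11-aliyun.1"))  # CloneSet (the ACK nodes in this example)
print(pick_workload_kind("v1.21.0"))            # Deployment
```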

Goals

Our goal is to fully automate the following actions:

  • When traffic exceeds the threshold within a certain time window, replicas are scaled up automatically. Here, traffic is defined as the smoothed number of HTTP requests per second; you can define it according to your actual needs.

    • When scaling up, new Pods are scheduled to the fixed resource pool first. When the fixed resource pool runs out of resources or reaches its MaxReplicas limit, new Pods are automatically scheduled to the elastic resource pool.
  • When traffic drops below the threshold, replicas are scaled down automatically.

    • When scaling down, Pods in the elastic resource pool are deleted first.
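The goals above amount to a simple placement policy, and the toy Python model below makes it concrete. It is not how Kubernetes or WorkloadSpread is implemented: the scheduler and the WorkloadSpread webhook make the real decisions. `fixed_capacity` is a hypothetical number that stands in for "the fixed pool is out of resources or has reached its MaxReplicas limit".

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    fixed: int    # Pods running in the fixed (ECS) resource pool
    elastic: int  # Pods running in the elastic (VK/ECI) resource pool


def plan(state: PoolState, desired: int, fixed_capacity: int) -> PoolState:
    """Return the Pod distribution after scaling to `desired` replicas.

    Scale-up fills the fixed pool up to `fixed_capacity` (a hypothetical limit)
    before using the elastic pool; scale-down deletes elastic Pods first.
    """
    current = state.fixed + state.elastic
    if desired >= current:
        to_add = desired - current
        room_in_fixed = max(fixed_capacity - state.fixed, 0)
        add_fixed = min(to_add, room_in_fixed)
        return PoolState(state.fixed + add_fixed, state.elastic + (to_add - add_fixed))
    to_remove = current - desired
    remove_elastic = min(to_remove, state.elastic)
    return PoolState(state.fixed - (to_remove - remove_elastic), state.elastic - remove_elastic)


# Usage: two traffic peaks followed by a quiet period, with room for 4 fixed Pods.
s = PoolState(fixed=1, elastic=0)
s = plan(s, desired=3, fixed_capacity=4)  # PoolState(fixed=3, elastic=0)
s = plan(s, desired=7, fixed_capacity=4)  # PoolState(fixed=4, elastic=3)
s = plan(s, desired=2, fixed_capacity=4)  # PoolState(fixed=2, elastic=0)
```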

Dependency Installation

We use an ACK Kubernetes cluster with 3 ECS nodes and 1 Virtual-Kubelet (VK) node. The ECS nodes form the fixed resource pool, and the VK node forms the elastic resource pool.

```shell
$ kubectl get node
NAME                         STATUS   ROLES    AGE    VERSION
us-west-1.192.168.0.47       Ready    <none>   153d   v1.20.11-aliyun.1
us-west-1.192.168.0.48       Ready    <none>   153d   v1.20.11-aliyun.1
us-west-1.192.168.0.49       Ready    <none>   153d   v1.20.11-aliyun.1
virtual-kubelet-us-west-1a   Ready    agent    19d    v1.20.11-aliyun.1
```

Installing OpenKruise

More details can be found in the official installation documentation. We recommend installing the latest version of OpenKruise.

Installing KEDA

KEDA is a Kubernetes-based event-driven autoscaling component. It provides event-driven scaling for any container running in Kubernetes.

```shell
$ helm repo add kedacore https://kedacore.github.io/charts
$ helm repo update
$ kubectl create namespace keda
$ helm install keda kedacore/keda --namespace keda
```

Installing Ingress-Nginx-Controller

First, create the namespace:

```shell
$ kubectl create ns ingress-nginx
```

Nginx-Prometheus-Exporter reads the number of HTTP connections and related information from the Nginx status API. Therefore, before installing the controller, apply the following ConfigMap. It configures Nginx to expose its status API (stub_status) on port 8080 for the exporter to consume:

```yaml
apiVersion: v1
data:
  allow-snippet-annotations: "true"
  http-snippet: |
    server {
      listen 8080;
      server_name _ ;
      location /stub_status {
        stub_status on;
      }
      location / {
        return 404;
      }
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: ingress-nginx
    meta.helm.sh/release-namespace: ingress-nginx
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/version: 1.1.0
    helm.sh/chart: ingress-nginx-4.0.13
  name: ingress-nginx-controller
  namespace: ingress-nginx
```

Prepare a values file (values.yaml) that exposes port 8080 on the Ingress-Nginx controller Deployment:

```yaml
# values.yaml
controller:
  containerPort:
    http: 80
    https: 443
    status: 8080
```

Install the Ingress-Nginx controller:

```shell
$ helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --values values.yaml
```

Ports 80 and 443 serve external users through a LoadBalancer Service, whereas port 8080 is used only by the exporter. The exporter and Prometheus are deployed inside the cluster and need only internal access. Therefore, use a ClusterIP Service for port 8080 so that it is exposed only within the cluster:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx-controller-8080
  namespace: ingress-nginx
spec:
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: ClusterIP
  ports:
  - name: myapp
    port: 8080
    targetPort: status
```

Installing Nginx-Prometheus-Exporter

The status data exposed by Nginx is not in Prometheus format, so an exporter is needed to collect the data and convert its format. Here we use Nginx-Prometheus-Exporter, which is provided by the Nginx community:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-exporter
  namespace: ingress-nginx
  labels:
    app: ingress-nginx-exporter
spec:
  selector:
    matchLabels:
      app: ingress-nginx-exporter
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: ingress-nginx-exporter
    spec:
      containers:
      - image: nginx/nginx-prometheus-exporter:0.10
        imagePullPolicy: IfNotPresent
        args:
        - -nginx.scrape-uri=http://ingress-nginx-controller-8080.ingress-nginx.svc.cluster.local:8080/stub_status
        name: main
        ports:
        - name: http
          containerPort: 9113
          protocol: TCP
        resources:
          limits:
            cpu: "200m"
            memory: "256Mi"
---
# A ServiceMonitor selects Services, not Pods, so the exporter also needs a
# Service. Its label and port name match the ServiceMonitor applied later
# (app: ingress-nginx-exporter, port "exporter").
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-exporter
  namespace: ingress-nginx
  labels:
    app: ingress-nginx-exporter
spec:
  type: ClusterIP
  selector:
    app: ingress-nginx-exporter
  ports:
  - name: exporter
    port: 9113
    targetPort: http
```
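The Python sketch below makes the format conversion concrete. It shows, in simplified form, what an exporter does with the stub_status page: parse the three data lines and re-emit them in the Prometheus text exposition format. The real Nginx-Prometheus-Exporter is a separate Go program, and this is not its code. The metric names follow the ones it publishes (nginx_http_requests_total is the one the ScaledObject queries), but treat the exact output here as an approximation.

```python
import re

ACTIVE_RE = re.compile(r"Active connections:\s*(\d+)")
# The line of three totals: accepts, handled, requests.
TOTALS_RE = re.compile(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", re.MULTILINE)
STATES_RE = re.compile(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)")

# Monotonic totals are counters; the rest are point-in-time gauges.
COUNTERS = {"nginx_connections_accepted", "nginx_connections_handled", "nginx_http_requests_total"}


def parse_stub_status(text: str) -> dict[str, int]:
    """Parse the plain-text stub_status page into named metric values."""
    active, totals, states = ACTIVE_RE.search(text), TOTALS_RE.search(text), STATES_RE.search(text)
    if not (active and totals and states):
        raise ValueError("input does not look like an Nginx stub_status page")
    accepted, handled, requests = (int(v) for v in totals.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "nginx_connections_active": int(active.group(1)),
        "nginx_connections_accepted": accepted,
        "nginx_connections_handled": handled,
        "nginx_http_requests_total": requests,
        "nginx_connections_reading": reading,
        "nginx_connections_writing": writing,
        "nginx_connections_waiting": waiting,
    }


def to_exposition(metrics: dict[str, int]) -> str:
    """Render metrics in the Prometheus text exposition format (TYPE + sample lines)."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} {'counter' if name in COUNTERS else 'gauge'}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Sample page taken from the curl output shown in the correctness check.
SAMPLE = """Active connections: 6
server accepts handled requests
 12092 12092 23215
Reading: 0 Writing: 1 Waiting: 5
"""
print(to_exposition(parse_stub_status(SAMPLE)), end="")
```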

Installing Prometheus-Operator

```shell
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install [RELEASE] prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace
```

The [RELEASE] name we used in the command above is kube-prometheus-stack-1640678515. This name determines several later settings, such as the release label of the ServiceMonitor and the Prometheus server address in the ScaledObject. If you use a different name, update those YAML files accordingly.
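The serverAddress in the ScaledObject is derived from this release name. The Python sketch below reproduces that derivation under one assumption: the chart truncates its generated full name to 26 characters. That assumption is our reading of the chart's helper template, so confirm the actual Service name with `kubectl get svc -n prometheus`. The function names are hypothetical.

```python
CHART = "kube-prometheus-stack"
MAX_FULLNAME_LEN = 26  # assumed truncation length of the chart's fullname helper


def chart_fullname(release: str, chart: str = CHART) -> str:
    """Approximate the chart's naming rule: reuse the release name if it already
    contains the chart name, otherwise join them, then truncate."""
    name = release if chart in release else f"{release}-{chart}"
    return name[:MAX_FULLNAME_LEN].removesuffix("-")


def prometheus_server_address(release: str, namespace: str, port: int = 9090) -> str:
    """Build the in-cluster URL of the Prometheus Service (short DNS form)."""
    return f"http://{chart_fullname(release)}-prometheus.{namespace}:{port}/"


print(prometheus_server_address("kube-prometheus-stack-1640678515", "prometheus"))
# http://kube-prometheus-stack-1640-prometheus.prometheus:9090/
```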

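After Prometheus is installed, apply the following ServiceMonitor so that Prometheus scrapes the metrics exposed by the exporter. A ServiceMonitor selects Services, not Pods: it matches Services labeled app: ingress-nginx-exporter and scrapes the Service port named exporter. Its release label must match the Helm release name so that the Prometheus instance created by kube-prometheus-stack picks it up.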

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: kube-prometheus-stack-1640678515
  name: ingress-nginx-monitor
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: ingress-nginx-exporter
  endpoints:
  - interval: 5s
    port: exporter
```

Correctness Check

After the dependencies above are installed and configured, verify that they work correctly.

Checking whether the Nginx Status API is usable

First, apply a simple Pod that provides /bin/sh and curl:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: centos
  namespace: ingress-nginx
spec:
  containers:
  - name: main
    image: centos:latest
    command: ["/bin/sh", "-c", "sleep 100000000"]
```

Then, use kubectl exec to open a shell in the Pod's main container, and request the Nginx status API with curl:

```shell
$ kubectl exec centos -n ingress-nginx -it -- /bin/sh
sh-4.4# curl -L http://ingress-nginx-controller-8080.ingress-nginx.svc.cluster.local:8080/stub_status
Active connections: 6
server accepts handled requests
 12092 12092 23215
Reading: 0 Writing: 1 Waiting: 5
```

If the curl command prints output similar to the above, the API is usable.

Checking Whether Prometheus is usable

When we installed the Prometheus operator with Helm, Grafana, a visualization tool, was installed as well. Therefore, we can log in to Grafana to check whether the Nginx metrics we need have been collected.

Grafana is deployed inside the ACK cluster. To access it from your local browser, change the type of the Grafana Service to LoadBalancer, and ACK will automatically assign Grafana an external IP that you can open in the browser. The default Grafana user and password can be decoded from the corresponding Secret:

```
user: admin
password: prom-operator
```
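Secret values are base64-encoded, so reading the credentials means decoding them. The Python sketch below does this. It assumes the Grafana Secret stores the credentials under the keys admin-user and admin-password, which is the convention of the Grafana chart bundled with kube-prometheus-stack; verify this in your cluster. The encoded values shown are simply the base64 forms of the defaults above.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the base64-encoded values in a Secret's `data` field."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# Hypothetical excerpt of the Grafana Secret's `data` field, e.g. copied from
# `kubectl get secret <grafana-secret> -n prometheus -o yaml`.
secret_data = {
    "admin-user": "YWRtaW4=",
    "admin-password": "cHJvbS1vcGVyYXRvcg==",
}
print(decode_secret_data(secret_data))
```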

After logging in to Grafana, click Explore in the left navigation bar, then click Metrics Browser to see the list of metrics collected and stored by Prometheus. If the metrics we care about (such as nginx_http_requests_total) appear in the list, the configuration is correct.

Deployment

Once the environment is ready and everything has been verified, deploy the hello-web application and the elastic scaling components.

Deploying Application

Next, deploy the hello-web application. When accessed, it returns a simple HTML page similar to the following:

```
Hello Web
Current Backend Server Info
Server Name: hello-web-57b767f456-bnw24
Server IP: 47.89.252.93
Server Port: 80
Current Client Request Info
Request Time Float: 1640766227.537
Client IP: 10.64.0.65
Client Port: 52230
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
Request Method: GET
Thank you for using PHP.
Request URI: /
```

Deploy the application using a CloneSet:

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: hello-web
  namespace: ingress-nginx
  labels:
    app: hello-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: hello-web
        image: zhangsean/hello-web
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "1"
            memory: "256Mi"
          limits:
            cpu: "2"
            memory: "512Mi"
---
kind: Service
apiVersion: v1
metadata:
  name: hello-web
  namespace: ingress-nginx
spec:
  type: ClusterIP
  selector:
    app: hello-web
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-web
  namespace: ingress-nginx
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-web
            port:
              number: 80
  ingressClassName: nginx
```

Deploying WorkloadSpread

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-sample
  namespace: ingress-nginx
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: hello-web   # the CloneSet deployed above
  scheduleStrategy:
    type: Adaptive
    adaptive:
      rescheduleCriticalSeconds: 2
  subsets:
  - name: fixed-resource-pool
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: NotIn
        values:
        - virtual-kubelet
    patch:
      metadata:
        labels:
          resource-pool: fixed
  - name: elastic-resource-pool
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: In
        values:
        - virtual-kubelet
    tolerations:
    - effect: NoSchedule
      key: virtual-kubelet.io/provider
      operator: Exists
    patch:
      metadata:
        labels:
          resource-pool: elastic
```

The WorkloadSpread above contains two subsets, corresponding to the fixed resource pool and the elastic resource pool. We expect the CloneSet named hello-web to schedule its Pods to the fixed resource pool first, and to the elastic resource pool only when the fixed pool cannot accommodate more Pods.
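The two requiredNodeSelectorTerm entries decide which nodes each subset may use. The Python sketch below evaluates In/NotIn-style match expressions against a node's labels using Kubernetes label-selector semantics, under which a node without the label satisfies NotIn. It shows that ECS nodes fall into the fixed subset and the VK node into the elastic one. The ECS node's label map is illustrative; only the type=virtual-kubelet label comes from the WorkloadSpread above. The sketch does not model Gt/Lt operators, taints, or tolerations.

```python
from typing import Any, Mapping, Sequence

Expression = Mapping[str, Any]  # e.g. {"key": "type", "operator": "In", "values": [...]}


def expression_matches(expr: Expression, labels: Mapping[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the label satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modeled in this sketch: {op}")


def term_matches(expressions: Sequence[Expression], labels: Mapping[str, str]) -> bool:
    """All expressions in a node selector term must hold (logical AND)."""
    return all(expression_matches(e, labels) for e in expressions)


# The matchExpressions of the two subsets in the WorkloadSpread above.
FIXED_POOL = [{"key": "type", "operator": "NotIn", "values": ["virtual-kubelet"]}]
ELASTIC_POOL = [{"key": "type", "operator": "In", "values": ["virtual-kubelet"]}]

ecs_node = {"kubernetes.io/hostname": "us-west-1.192.168.0.47"}  # illustrative labels
vk_node = {"type": "virtual-kubelet"}

for name, labels in [("ECS node", ecs_node), ("VK node", vk_node)]:
    print(name, "fixed:", term_matches(FIXED_POOL, labels), "elastic:", term_matches(ELASTIC_POOL, labels))
```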

When the API server receives a Pod creation request, it calls the Kruise webhook, which injects the WorkloadSpread's scheduling rules into the Pod. The injection appends rather than replaces. For example, if the Pod already has a `requiredNodeSelectorTerm` or `tolerations`, WorkloadSpread appends its own scheduling rules to the end of the Pod's `requiredNodeSelectorTerm` or `tolerations`.

Therefore, we suggest:

  • Put common, immutable scheduling rules in the workload.

  • Put customized scheduling rules in the WorkloadSpread subsets.
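The append semantics can be illustrated with a simplified Python model that operates on plain dictionaries. This is not the Kruise webhook's code. In particular, appending the subset's expressions to every existing node selector term, rather than adding a separate term, is our assumption about how the merge preserves the Pod's own constraints. The zone constraint and the "dedicated" toleration in the usage example are hypothetical.

```python
import copy
from typing import Any


def inject_subset(pod_spec: dict[str, Any], subset: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of pod_spec with the subset's scheduling rules appended."""
    spec = copy.deepcopy(pod_spec)

    # Tolerations: the subset's entries are added after the Pod's own entries.
    spec.setdefault("tolerations", []).extend(copy.deepcopy(subset.get("tolerations", [])))

    # Required node affinity: the subset's match expressions are appended to each
    # existing term (expressions within a term are ANDed), or form a new term.
    exprs = subset.get("requiredNodeSelectorTerm", {}).get("matchExpressions", [])
    if exprs:
        node_affinity = spec.setdefault("affinity", {}).setdefault("nodeAffinity", {})
        required = node_affinity.setdefault("requiredDuringSchedulingIgnoredDuringExecution", {})
        terms = required.setdefault("nodeSelectorTerms", [])
        if not terms:
            terms.append({"matchExpressions": []})
        for term in terms:
            term.setdefault("matchExpressions", []).extend(copy.deepcopy(exprs))
    return spec


# Usage: a Pod template that already tolerates a hypothetical "dedicated" taint
# and is pinned to a hypothetical zone, plus the elastic subset from above.
pod = {
    "tolerations": [{"key": "dedicated", "operator": "Exists"}],
    "affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{"matchExpressions": [
            {"key": "topology.kubernetes.io/zone", "operator": "In", "values": ["us-west-1a"]}]}]}}},
}
elastic_subset = {
    "requiredNodeSelectorTerm": {"matchExpressions": [
        {"key": "type", "operator": "In", "values": ["virtual-kubelet"]}]},
    "tolerations": [{"effect": "NoSchedule", "key": "virtual-kubelet.io/provider", "operator": "Exists"}],
}
patched = inject_subset(pod, elastic_subset)
```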

Deploying ScaledObject

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ingress-nginx-scaledobject
  namespace: ingress-nginx
spec:
  maxReplicaCount: 10
  minReplicaCount: 1
  pollingInterval: 10
  cooldownPeriod: 2
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 10
  scaleTargetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: hello-web
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kube-prometheus-stack-1640-prometheus.prometheus:9090/
      metricName: nginx_http_requests_total
      query: sum(rate(nginx_http_requests_total{job="ingress-nginx-exporter"}[12s]))
      threshold: '100'
```
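The Python sketch below approximates how the query result becomes a replica count, in two steps: a per-second rate over the samples in the window, then the HPA replica formula for an AverageValue target. It assumes the Prometheus trigger uses KEDA's default AverageValue metric type and the HPA's default 10% tolerance. It also simplifies rate() (no counter-reset handling or extrapolation) and ignores the scale-down stabilization window. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: Sequence[Sample]) -> float:
    """Per-second increase between the first and last sample in the window.

    Simplification: Prometheus' rate() also handles counter resets and
    extrapolates to the window edges; this sketch does neither.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def desired_replicas(metric: float, threshold: float, current: int,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """HPA-style replica count for an AverageValue target, clamped to [min, max]."""
    ratio = metric / (threshold * current)
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: keep the current replica count
    else:
        desired = math.ceil(ratio * current)
    return max(min_replicas, min(max_replicas, desired))


samples = [(0.0, 1000.0), (5.0, 2750.0), (10.0, 4500.0)]  # 5s scrape interval
load = simple_rate(samples)                                # 350.0 requests/s
print(desired_replicas(load, threshold=100, current=1))    # 4
```

With 350 requests per second and a threshold of 100, one replica becomes four. The result is always clamped to minReplicaCount and maxReplicaCount (1 and 10 here).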

Demo Show

Firstly, make sure that all the configurations have been applied:

[Figure: result-show-0]

Then, use go-stress-testing to run a load test against the hello-web application.

When the first traffic peak arrives, the workload scales up, and the newly created Pods are scheduled to the fixed resource pool first:

[Figure: result-show-1]

When the second, higher traffic peak arrives, the fixed resource pool runs out of resources, so the workload scales up into the elastic resource pool:

[Figure: result-show-2]

When the traffic peak has passed, the workload scales down, and the Pods in the elastic resource pool are deleted first:

[Figure: result-show-3]