基于 Effective HPA 实现自定义指标的智能弹性实践
Effective HPA 的最佳实践.
Kubernetes HPA 支持了丰富的弹性扩展能力,Kubernetes 平台开发者部署服务实现自定义 Metric 的服务,Kubernetes 用户配置多项内置的资源指标或者自定义 Metric 指标实现自定义水平弹性。 Effective HPA 兼容社区的 Kubernetes HPA 的能力,提供了更智能的弹性策略,比如基于预测的弹性和基于 Cron 周期的弹性等。 Prometheus 是当下流行的开源监控系统,通过 Prometheus 可以获取到用户的自定义指标配置。
本文将通过一个例子介绍了如何基于 Effective HPA 实现自定义指标的智能弹性。部分配置来自于 官方文档
部署环境要求
- Kubernetes 1.18+
- Helm 3.1.0
- Crane v0.6.0+
- Prometheus
参考 安裝文档 在集群中安装 Crane,Prometheus 可以使用安装文档中的也可以是已部署的 Prometheus。
环境搭建
安装 PrometheusAdapter
Crane 组件 Metric-Adapter 和 PrometheusAdapter 都基于 custom-metric-apiserver 实现了 Custom Metric 和 External Metric 的 ApiService。在安装 Crane 时会将对应的 ApiService 安装为 Crane 的 Metric-Adapter,因此安装 PrometheusAdapter 前需要删除 ApiService 以确保 Helm 安装成功。
# 查看当前集群 ApiService
kubectl get apiservice
因为安装了 Crane, 结果如下:
NAME SERVICE AVAILABLE AGE
v1beta1.batch Local True 35d
v1beta1.custom.metrics.k8s.io crane-system/metric-adapter True 18d
v1beta1.discovery.k8s.io Local True 35d
v1beta1.events.k8s.io Local True 35d
v1beta1.external.metrics.k8s.io crane-system/metric-adapter True 18d
v1beta1.flowcontrol.apiserver.k8s.io Local True 35d
v1beta1.metrics.k8s.io kube-system/metrics-service True 35d
删除 crane 安装的 ApiService
kubectl delete apiservice v1beta1.custom.metrics.k8s.io
kubectl delete apiservice v1beta1.external.metrics.k8s.io
通过 Helm 安装 PrometheusAdapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter -n crane-system prometheus-community/prometheus-adapter
再将 ApiService 改回 Crane 的 Metric-Adapter
kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/deploy/metric-adapter/apiservice.yaml
配置 Metric-Adapter 开启 RemoteAdapter 功能
在安装 PrometheusAdapter 时没有将 ApiService 指向 PrometheusAdapter,因此为了让 PrometheusAdapter 也可以提供自定义 Metric,通过 Crane Metric Adapter 的 RemoteAdapter
功能将请求转发给 PrometheusAdapter。
修改 Metric-Adapter 的配置,将 PrometheusAdapter 的 Service 配置成 Crane Metric Adapter 的 RemoteAdapter
# 查看当前集群 ApiService
kubectl edit deploy metric-adapter -n crane-system
根据 PrometheusAdapter 的配置做以下修改:
apiVersion: apps/v1
kind: Deployment
metadata:
name: metric-adapter
namespace: crane-system
spec:
template:
spec:
containers:
- args:
#添加外部 Adapter 配置
- --remote-adapter=true
- --remote-adapter-service-namespace=crane-system
- --remote-adapter-service-name=prometheus-adapter
- --remote-adapter-service-port=443
RemoteAdapter 能力
Kubernetes 限制一个 ApiService 只能配置一个后端服务,因此,为了在一个集群内使用 Crane 提供的 Metric 和 PrometheusAdapter 提供的 Metric,Crane 支持了 RemoteAdapter 解决此问题
- Crane Metric-Adapter 支持配置一个 Kubernetes Service 作为一个远程 Adapter
- Crane Metric-Adapter 处理请求时会先检查是否是 Crane 提供的 Local Metric,如果不是,则转发给远程 Adapter
运行例子
准备应用
将以下应用部署到集群中,应用暴露了 Metric 展示每秒收到的 http 请求数量。
sample-app.deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-app
labels:
app: sample-app
spec:
replicas: 1
selector:
matchLabels:
app: sample-app
template:
metadata:
labels:
app: sample-app
spec:
containers:
- image: luxas/autoscale-demo:v0.1.2
name: metrics-provider
resources:
limits:
cpu: 500m
requests:
cpu: 200m
ports:
- name: http
containerPort: 8080
sample-app.service.yaml
apiVersion: v1
kind: Service
metadata:
labels:
app: sample-app
name: sample-app
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 8080
selector:
app: sample-app
type: ClusterIP
kubectl create -f sample-app.deploy.yaml
kubectl create -f sample-app.service.yaml
当应用部署完成后,您可以通过命令检查 http_requests_total
Metric:
curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
配置采集规则
配置 Prometheus 的 ScrapeConfig,收集应用的 Metric: http_requests_total
kubectl edit configmap -n crane-system prometheus-server
添加以下配置
- job_name: sample-app
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: default;sample-app-(.+)
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_pod_name
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod
此时,您可以在 Prometheus 查询 psql:sum(rate(http_requests_total[5m])) by (pod)
验证 PrometheusAdapter
PrometheusAdapter 默认的 Rule 配置支持将 http_requests_total 转换成 Pods 类型的 Custom Metric,通过命令验证:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
结果应包括 pods/http_requests
:
{
"name": "pods/http_requests",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
}
这表明现在可以通过 Pod Metric 配置 HPA。
配置弹性
现在我们可以创建 Effective HPA。此时 Effective HPA 可以通过 Pod Metric http_requests
进行弹性:
如何定义一个自定义指标开启预测功能
在 Effective HPA 的 Annotation 按以下规则添加配置:
annotations:
# metric-query.autoscaling.crane.io 是固定的前缀,后面是 Metric 名字,需跟 spec.metrics 中的 Metric.name 相同,支持 Pods 类型和 External 类型
metric-query.autoscaling.crane.io/http_requests: "sum(rate(http_requests_total[5m])) by (pod)"
sample-app-hpa.yaml
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
name: php-apache
annotations:
# metric-query.autoscaling.crane.io 是固定的前缀,后面是 Metric 名字,需跟 spec.metrics 中的 Metric.name 相同,支持 Pods 类型和 External 类型
metric-query.autoscaling.crane.io/http_requests: "sum(rate(http_requests_total[5m])) by (pod)"
spec:
# ScaleTargetRef is the reference to the workload that should be scaled.
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
minReplicas: 1 # MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to.
maxReplicas: 10 # MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to.
scaleStrategy: Auto # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Manual".
# Metrics contains the specifications for which to use to calculate the desired replica count.
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
- type: Pods
pods:
metric:
name: http_requests
target:
type: AverageValue
averageValue: 500m
# Prediction defines configurations for predict resources.
# If unspecified, defaults don't enable prediction.
prediction:
predictionWindowSeconds: 3600 # PredictionWindowSeconds is the time window to predict metrics in the future.
predictionAlgorithm:
algorithmType: dsp
dsp:
sampleInterval: "60s"
historyLength: "7d"
kubectl create -f sample-app-hpa.yaml
查看 TimeSeriesPrediction 状态,如果应用运行时间较短,可能会无法预测:
apiVersion: prediction.crane.io/v1alpha1
kind: TimeSeriesPrediction
metadata:
creationTimestamp: "2022-07-11T16:10:09Z"
generation: 1
labels:
app.kubernetes.io/managed-by: effective-hpa-controller
app.kubernetes.io/name: ehpa-php-apache
app.kubernetes.io/part-of: php-apache
autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
name: ehpa-php-apache
namespace: default
spec:
predictionMetrics:
- algorithm:
algorithmType: dsp
dsp:
estimators: {}
historyLength: 7d
sampleInterval: 60s
resourceIdentifier: crane_pod_cpu_usage
resourceQuery: cpu
type: ResourceQuery
- algorithm:
algorithmType: dsp
dsp:
estimators: {}
historyLength: 7d
sampleInterval: 60s
expressionQuery:
expression: sum(rate(http_requests_total[5m])) by (pod)
resourceIdentifier: crane_custom.pods_http_requests
type: ExpressionQuery
predictionWindowSeconds: 3600
targetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
namespace: default
status:
conditions:
- lastTransitionTime: "2022-07-12T06:54:42Z"
message: not all metric predicted
reason: PredictPartial
status: "False"
type: Ready
predictionMetrics:
- ready: false
resourceIdentifier: crane_pod_cpu_usage
- prediction:
- labels:
- name: pod
value: sample-app-7cfb596f98-8h5vv
samples:
- timestamp: 1657608900
value: "0.01683"
- timestamp: 1657608960
value: "0.01683"
......
ready: true
resourceIdentifier: crane_custom.pods_http_requests
查看 Effective HPA 创建的 HPA 对象,可以观测到已经创建出基于自定义指标预测的 Metric: crane_custom.pods_http_requests
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
creationTimestamp: "2022-07-11T16:10:10Z"
labels:
app.kubernetes.io/managed-by: effective-hpa-controller
app.kubernetes.io/name: ehpa-php-apache
app.kubernetes.io/part-of: php-apache
autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
name: ehpa-php-apache
namespace: default
spec:
maxReplicas: 10
metrics:
- pods:
metric:
name: http_requests
target:
averageValue: 500m
type: AverageValue
type: Pods
- pods:
metric:
name: crane_custom.pods_http_requests
selector:
matchLabels:
autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
target:
averageValue: 500m
type: AverageValue
type: Pods
- resource:
name: cpu
target:
averageUtilization: 50
type: Utilization
type: Resource
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
总结
由于生产环境的复杂性,基于多指标的弹性(CPU/Memory/自定义指标)往往是生产应用的常见选择,因此 Effective HPA 通过预测算法覆盖了多指标的弹性,达到了帮助更多业务在生产环境落地水平弹性的成效。