高可用保障方案
本文介绍 ZadigX 高可用保障相关的配置和建议。
高可用保障
包括 K8s 基础设施、ZadigX 系统组件和数据库的高可用。
K8s 资源高可用
基础资源的高可用,由 Kubernetes 集群的提供者(公有云/自建云)保障。
应用高可用
提供以下建议保障应用高可用:
- 使用 K8s 原生的 deployment 部署,保证服务出现极端情况异常退出后可以被迅速拉起
- 通过内置的 metric 接口,结合 Prometheus 以及预配置的 Grafana 面板实时展现 ZadigX 组件的性能指标(CPU / Memory 利用率)
- 结合 Prometheus alertmanager 配置核心组件监控报警,报警规则建议如下:
- aslan 组件:当 CPU 占用率超过 2C,或内存占用超过 4G,持续超过 2 分钟时触发报警
- cron 组件:当 CPU 占用率超过 1C,或者内存占用超过 512M,持续超过 2 分钟时触发报警
- dind 组件:当 CPU 占用率超过 4C,或者内存占用超过 8G,持续超过 2 分钟时触发报警
- 备份当前安装参数以及 Chart:
- 通过
helm get values <ZadigX Release Name> -n <ZadigX Namespace> > ZadigX.yaml
获取当前安装参数 - 通过
helm pull koderover-chart/zadigx --version=<version>
备份当前 Chart
- 通过
提示
ZadigX 中部分组件支持多副本策略,若有必要可以按需调整实际副本数。支持多副本配置的组件如下:
- deployment/zadig-portal
- deployment/opa
- deployment/resource-server
- deployment/warpdrive
- deployment/gateway-proxy
- statefulSet/dind
数据高可用
建议安装时使用高可用 MongoDB、MySQL 存储介质,并对 ZadigX 系统数据定时备份。建议如下:
- 使用高可用的数据库,以阿里云为例:
- MySQL 建议使用 RDS MySQL 高可用版
- MongoDB 选用独占型实例
- 使用高可用的制品仓库:
- 使用云厂商提供的高可用镜像仓库 / Helm 仓库
- 若使用 K8s Helm Chart 项目,建议集成高可用的对象存储并设为默认使用,数据备份建议:一天一次
- 定期备份数据库数据
- 若使用公有云,需要手动设置公有云 MySQL/MongoDB 实例的自动备份策略
- 备份策略建议:一天一次
- MinIO 备份策略:一天一次
监控告警配置
推荐使用 Prometheus (opens new window) 作为监控和性能指标信息存储方案,使用 Grafana (opens new window) 作为可视化组件进行展示。
Prometheus 配置
- 在 Prometheus 配置的 scrape_configs 中添加如下 job:
如果 ZadigX 和 Prometheus 在同一集群中
job_name: prometheus
metrics_path: /api/metrics
static_configs:
- targets:
- aslan.<部署 namespace>.svc:25000
如果 ZadigX 和 Prometheus 在不同集群中
job_name: admin
metrics_path: /api/aslan/metrics
static_configs:
- targets:
- <ZadigX的访问域名>
scheme: https
- 重新加载 Prometheus 配置
- 在 Prometheus 的 query 界面,输入 request_total 进行查询,确认有数据,证明配置成功
Grafana 配置
- [可选项]在 Grafana 的 configuration - Data source 中配置上文的 Prometheus 为数据源
- 在 dashboards - import 中输入如下 JSON 导入 Grafana 面板
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 1,
"links": [],
"liveNow": false,
"panels": [
{
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 3,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"editorMode": "code",
"exemplar": false,
"expr": "sum by(instance) (rate(request_total{status=\"200\"}[1m]))",
"format": "time_series",
"hide": false,
"instant": false,
"interval": "",
"legendFormat": "200",
"range": true,
"refId": "A"
},
{
"editorMode": "code",
"expr": "sum by(instance) (rate(request_total{status=~\"4..\"}[1m]))",
"hide": false,
"legendFormat": "4xx",
"range": true,
"refId": "B"
},
{
"editorMode": "code",
"expr": "sum by(instance) (rate(request_total{status=\"5..\"}[1m]))",
"hide": false,
"legendFormat": "5xx",
"range": true,
"refId": "C"
}
],
"title": "QPS追踪",
"type": "timeseries"
},
{
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 12,
"y": 0
},
"id": 4,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "9.4.7",
"targets": [
{
"editorMode": "code",
"expr": "running_workflows",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "运行中的工作流",
"type": "gauge"
},
{
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 18,
"y": 0
},
"id": 6,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "9.4.7",
"targets": [
{
"editorMode": "code",
"expr": "pending_workflows",
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "排队中的工作流",
"type": "gauge"
},
{
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 9
},
"id": 8,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"editorMode": "code",
"expr": "cpu",
"legendFormat": "{{service}}",
"range": true,
"refId": "A"
}
],
"title": "CPU 消耗",
"type": "timeseries"
},
{
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 9
},
"id": 10,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"editorMode": "code",
"expr": "memory",
"legendFormat": "{{service}}",
"range": true,
"refId": "A"
}
],
"title": "内存消耗(MB)",
"type": "timeseries"
},
{
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "慢接口"
},
"properties": [
{
"id": "custom.width",
"value": 1301
}
]
}
]
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 17
},
"id": 12,
"options": {
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"frameIndex": 0,
"showHeader": true,
"sortBy": []
},
"pluginVersion": "9.4.7",
"targets": [
{
"editorMode": "code",
"exemplar": false,
"expr": "topk(10, sum by(method, handler) (api_response_time_bucket{status=\"200\",le=\"+Inf\"}) - sum by(method, handler) (api_response_time_bucket{status=\"200\",le=\"1.4\"}))",
"format": "time_series",
"hide": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "慢接口列表",
"transformations": [
{
"id": "reduce",
"options": {
"includeTimeField": false,
"labelsToFields": false,
"mode": "seriesToRows",
"reducers": [
"last"
]
}
},
{
"id": "sortBy",
"options": {
"fields": {},
"sort": [
{
"desc": true,
"field": "Last"
}
]
}
},
{
"id": "organize",
"options": {
"excludeByName": {},
"indexByName": {},
"renameByName": {
"Field": "慢接口",
"Last": "请求数量"
}
}
}
],
"type": "table"
}
],
"refresh": "5s",
"revision": 1,
"schemaVersion": 38,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "ZadigX 监控面板",
"uid": "w1U8k1xVk",
"version": 2,
"weekStart": ""
}
- 在面板列表中找到 ZadigX 监控面板, 确认数据正常展示。
FAQs
参考文档:部署运维 FAQs。