Citadel Health Checking
You can enable Citadel’s health checking feature to detect failures of the Citadel CSR (Certificate Signing Request) service. When a failure is detected, Kubelet automatically restarts the Citadel container.
When the health checking feature is enabled, the prober client module in Citadel periodically checks the health status of Citadel’s CSR gRPC server. It does this by sending CSRs to the gRPC server and verifying the responses. If Citadel is healthy, the prober client updates the modification time of the health status file; otherwise, it does nothing. Citadel relies on a Kubernetes liveness and readiness probe with a command line check of the modification time of the health status file on the pod. If the file is not updated for a period of time, Kubelet restarts the Citadel container.
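As a rough illustration of this mechanism, the sketch below uses standard shell tools rather than the actual istio_ca probe command (shown later in this guide). It treats Citadel as healthy if the status file was modified within the last 125 seconds, matching the interval value used in the configuration below:
$ # Hypothetical sketch of an mtime-based liveness check, not the real probe implementation.
$ now=$(date +%s)
$ mtime=$(stat -c %Y /tmp/ca.liveness)   # last modification time of the health status file
$ test $(( now - mtime )) -le 125 && echo healthy || echo unhealthy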
Since Citadel health checking currently only monitors the health status of the CSR service API, this feature is not needed if your production setup is not using SDS or adding virtual machines.
Before you begin
Follow the Istio installation guide to install Istio with mutual TLS enabled.
Deploying Citadel with health checking
To enable health checking, redeploy Citadel:
$ istioctl manifest generate --set values.global.mtls.enabled=true,values.security.citadelHealthCheck=true > citadel-health-check.yaml
$ kubectl apply -f citadel-health-check.yaml
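To confirm that the redeployed Citadel carries the health checking configuration, you can inspect the liveness probe on the deployment. This assumes the default istio-citadel deployment name in the istio-system namespace:
$ # Print the liveness probe attached to the Citadel container; empty output means health checking is not configured.
$ kubectl -n istio-system get deployment istio-citadel -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'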
Verify that health checking works
Citadel will log the health checking results. Run the following command:
$ kubectl logs `kubectl get po -n istio-system | grep istio-citadel | awk '{print $1}'` -n istio-system | grep "CSR signing service"
You will see output similar to:
... CSR signing service is healthy (logged every 100 times).
The log above indicates that the periodic health checking is working. The default health checking interval is 15 seconds, and the result is logged once every 100 checks.
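You can also look at the health status file itself. The command below assumes the Citadel container image ships a shell with the ls utility, which may not be the case for minimal or distroless images; a recent modification time on /tmp/ca.liveness indicates the prober is updating the file:
$ # Show the modification time of the health status file inside the Citadel pod.
$ kubectl -n istio-system exec `kubectl get po -n istio-system | grep istio-citadel | awk '{print $1}'` -- ls -l /tmp/ca.liveness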
(Optional) Configuring the health checking
This section describes how to modify the health checking configuration. Open the file citadel-health-check.yaml and locate the following lines:
...
- --liveness-probe-path=/tmp/ca.liveness # path to the liveness health checking status file
- --liveness-probe-interval=60s # interval for health checking file update
- --probe-check-interval=15s # interval for health status check
livenessProbe:
exec:
command:
- /usr/local/bin/istio_ca
- probe
- --probe-path=/tmp/ca.liveness # path to the liveness health checking status file
- --interval=125s # the maximum time gap allowed between the file mtime and the current sys clock.
initialDelaySeconds: 60
periodSeconds: 60
...
The paths to the health status files are liveness-probe-path and probe-path. You should update the paths in Citadel and in the livenessProbe at the same time. If Citadel is healthy, the value of the liveness-probe-interval entry determines the interval used to update the health status file. The Citadel health checking controller uses the value of the probe-check-interval entry to determine the interval to call the Citadel CSR service. The interval is the maximum time elapsed since the last update of the health status file for the prober to consider Citadel healthy. The values in the initialDelaySeconds and periodSeconds entries determine the initial delay and the interval between each activation of the livenessProbe.
Prolonging probe-check-interval reduces the health checking overhead, but increases the lag before the prober notices an unhealthy status. To avoid the prober restarting Citadel due to temporary unavailability, the interval on the prober can be configured to be more than N times the liveness-probe-interval. This allows the prober to tolerate N-1 consecutive failed health checks. For example, with the values above (liveness-probe-interval of 60s and interval of 125s), N is 2, so the prober tolerates one missed update of the health status file.
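After changing any of these values in citadel-health-check.yaml, re-apply the manifest so the new configuration takes effect. The rollout check is optional and assumes the deployment is named istio-citadel:
$ kubectl apply -f citadel-health-check.yaml
$ # Optionally wait for the restarted Citadel deployment to become ready.
$ kubectl -n istio-system rollout status deployment istio-citadel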
Cleanup
- To disable health checking on Citadel:
$ istioctl manifest apply --set values.global.mtls.enabled=true
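If you want to verify that the probe was removed, you can check that the Citadel deployment no longer carries the health checking arguments. This is an optional check that assumes the istio-citadel deployment in the istio-system namespace; no output means the flags are gone:
$ kubectl -n istio-system get deployment istio-citadel -o yaml | grep liveness-probe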
See also
- Health Checking of Istio Services: Shows how to do health checking for Istio services.
- Provision and manage DNS certificates in Istio.
- Introducing the Istio v1beta1 Authorization Policy: Introduction, motivation and design principles for the Istio v1beta1 Authorization Policy.
- A more secure way to manage Istio webhooks.
- Multi-Mesh Deployments for Isolation and Boundary Protection: Deploy environments that require isolation into separate meshes and enable inter-mesh communication by mesh federation.
- App Identity and Access Adapter: Using Istio to secure multi-cloud Kubernetes applications with zero code changes.