Rancher HA

The commands and steps listed on this page can be used to check your Rancher Kubernetes installation.

Make sure you configured the correct kubeconfig (for example, export KUBECONFIG=$PWD/kube_config_cluster.yml).

Check Rancher Pods

Rancher pods are deployed as a Deployment in the cattle-system namespace.

Check if the pods are running on all nodes:

  kubectl -n cattle-system get pods -l app=rancher -o wide

Example output:

  NAME                       READY   STATUS    RESTARTS   AGE   IP        NODE
  rancher-7dbd7875f7-n6t5t   1/1     Running   0          8m    x.x.x.x   x.x.x.x
  rancher-7dbd7875f7-qbj5k   1/1     Running   0          8m    x.x.x.x   x.x.x.x
  rancher-7dbd7875f7-qw7wb   1/1     Running   0          8m    x.x.x.x   x.x.x.x
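
You can also verify that the Deployment itself reports all replicas as available:

  kubectl -n cattle-system rollout status deploy/rancher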

If a pod is unable to run (its STATUS is not Running, READY does not show 1/1, or you see a high RESTARTS count), check the pod details, container logs, and namespace events.

Pod Details

  kubectl -n cattle-system describe pods -l app=rancher

Pod Container Logs

  kubectl -n cattle-system logs -l app=rancher
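
If a container has restarted, the logs of the previous container instance are often the most useful; kubectl can retrieve them with the --previous flag:

  kubectl -n cattle-system logs -l app=rancher --previous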

Namespace Events

  kubectl -n cattle-system get events
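
Events are not guaranteed to be returned in order; sorting them by timestamp makes the most recent ones easier to find:

  kubectl -n cattle-system get events --sort-by='.lastTimestamp'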

Check Ingress

The ingress should show the configured FQDN under HOSTS and the host address(es) traffic will be routed to under ADDRESS.

  kubectl -n cattle-system get ingress

Example output:

  NAME      HOSTS                    ADDRESS                   PORTS     AGE
  rancher   rancher.yourdomain.com   x.x.x.x,x.x.x.x,x.x.x.x   80, 443   2m
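
For more detail on the rules and backends behind the ingress (assuming it is named rancher, as in the example output above), you can describe it:

  kubectl -n cattle-system describe ingress rancher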

Check Ingress Controller Logs

If accessing your configured Rancher FQDN does not show you the UI, check the ingress controller logs to see what happens when you try to access Rancher:

  kubectl -n ingress-nginx logs -l app=ingress-nginx
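
Depending on the ingress-nginx version deployed in your cluster, the controller pods may carry the app.kubernetes.io/name=ingress-nginx label instead of app=ingress-nginx; in that case, adjust the selector accordingly:

  kubectl -n ingress-nginx logs -l app.kubernetes.io/name=ingress-nginx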

Leader Election

The active Rancher pod is determined by a leader election process. Once elected, the leader (holderIdentity) is recorded in the cattle-controllers Lease in the kube-system namespace (rancher-dbc7ff869-gvg6k in the example output below).

  kubectl -n kube-system get lease cattle-controllers

Example output:

  NAME                 HOLDER                    AGE
  cattle-controllers   rancher-dbc7ff869-gvg6k   6h10m
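
To extract just the current holder, you can query the holderIdentity field of the Lease directly:

  kubectl -n kube-system get lease cattle-controllers -o jsonpath='{.spec.holderIdentity}'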

Configuration

Available as of Rancher 2.8.3

If the Kubernetes API experiences latency, the Rancher replica holding the leader lock may be unable to renew the lease before it expires, which can be observed in the Rancher logs:

  E0629 04:13:07.293461 34 leaderelection.go:364] Failed to update lock: Put "https://172.17.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": context deadline exceeded
  I0629 04:13:07.293594 34 leaderelection.go:280] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
  ...
  2024/06/29 04:13:10 [FATAL] leaderelection lost for cattle-controllers

To mitigate this, you can set environment variables in the rancher Deployment to modify the default parameters for leader election:

  • CATTLE_ELECTION_LEASE_DURATION: The lease duration. The default value is 45s.
  • CATTLE_ELECTION_RENEW_DEADLINE: The renew deadline. The default value is 30s.
  • CATTLE_ELECTION_RETRY_PERIOD: The retry period. The default value is 2s.

Example:

  kubectl -n cattle-system set env deploy/rancher CATTLE_ELECTION_LEASE_DURATION=2m CATTLE_ELECTION_RENEW_DEADLINE=90s CATTLE_ELECTION_RETRY_PERIOD=10s

This temporarily increases the lease duration, renew deadline, and retry period to 120, 90, and 10 seconds respectively. To make such changes permanent, set these environment variables using Helm values instead, as sketched below.
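
A minimal sketch of the Helm approach, assuming Rancher was installed as the rancher release in the cattle-system namespace from the rancher-latest repository, and using the chart's extraEnv list to inject the variables:

  helm upgrade rancher rancher-latest/rancher \
    --namespace cattle-system \
    --reuse-values \
    --set 'extraEnv[0].name=CATTLE_ELECTION_LEASE_DURATION' \
    --set 'extraEnv[0].value=2m' \
    --set 'extraEnv[1].name=CATTLE_ELECTION_RENEW_DEADLINE' \
    --set 'extraEnv[1].value=90s' \
    --set 'extraEnv[2].name=CATTLE_ELECTION_RETRY_PERIOD' \
    --set 'extraEnv[2].value=10s'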