- Important: RKE add-on install is only supported up to Rancher v2.0.8
- Double check if all the required ports are opened in your (host) firewall
- All nodes should be present and in Ready state
- All pods/jobs should be in Running/Completed state
- Check ingress
- List all Kubernetes cluster events
- Check Rancher container logging
- Check NGINX ingress controller logging
- Check if overlay network is functioning correctly
Important: RKE add-on install is only supported up to Rancher v2.0.8
Please use the Rancher Helm chart to install Rancher on a Kubernetes cluster. For details, see the Kubernetes Install.
If you are currently using the RKE add-on install method, see Migrating from a Kubernetes Install with an RKE Add-on for details on how to move to using the Helm chart.
Below are steps that you can follow to determine what is wrong in your cluster.
Double check if all the required ports are opened in your (host) firewall
Double check if all the required ports are opened in your (host) firewall.
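As a basic check, you can test TCP connectivity from one node to a port on another node with a tool such as nc (netcat). The node name and port below are placeholders; refer to the port requirements for the full list. Note that UDP ports (such as the overlay network port) cannot be reliably verified this way; use the overlay network test described further below instead.
nc -z -v -w 5 NODE_2 6443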
All nodes should be present and in Ready state
To check, run the command:
kubectl --kubeconfig kube_config_rancher-cluster.yml get nodes
If a node is not shown in this output, or a node is not in Ready state, you can check the logs of the kubelet container. Log in to the node and run docker logs kubelet.
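For example, to follow only the most recent kubelet log lines on the node (the line count below is arbitrary):
docker logs kubelet --tail 100 --follow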
All pods/jobs should be in Running/Completed state
To check, run the command:
kubectl --kubeconfig kube_config_rancher-cluster.yml get pods --all-namespaces
If a pod is not in Running state, you can dig into the root cause by running:
Describe pod
kubectl --kubeconfig kube_config_rancher-cluster.yml describe pod POD_NAME -n NAMESPACE
Pod container logs
kubectl --kubeconfig kube_config_rancher-cluster.yml logs POD_NAME -n NAMESPACE
If a job is not in Completed state, you can dig into the root cause by running:
Describe job
kubectl --kubeconfig kube_config_rancher-cluster.yml describe job JOB_NAME -n NAMESPACE
Logs from the containers of pods of the job
kubectl --kubeconfig kube_config_rancher-cluster.yml logs -l job-name=JOB_NAME -n NAMESPACE
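Optionally, you can narrow the output to pods that are not in Running or Completed state by using a field selector; this shows the same information as the full get pods output above:
kubectl --kubeconfig kube_config_rancher-cluster.yml get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded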
Check ingress
Ingress should have the correct HOSTS (showing the configured FQDN) and ADDRESS (the address(es) it will be routed to).
kubectl --kubeconfig kube_config_rancher-cluster.yml get ingress --all-namespaces
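If the HOSTS or ADDRESS column looks wrong, describing the ingress shows its backends and events. The namespace below assumes the Rancher ingress lives in cattle-system, which is the default for this install method:
kubectl --kubeconfig kube_config_rancher-cluster.yml describe ingress -n cattle-system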
List all Kubernetes cluster events
Kubernetes cluster events are stored and can be retrieved by running:
kubectl --kubeconfig kube_config_rancher-cluster.yml get events --all-namespaces
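Events are not sorted chronologically by default; sorting on the timestamp usually makes them easier to read:
kubectl --kubeconfig kube_config_rancher-cluster.yml get events --all-namespaces --sort-by='.lastTimestamp'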
Check Rancher container logging
kubectl --kubeconfig kube_config_rancher-cluster.yml logs -l app=cattle -n cattle-system
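When using a label selector, kubectl shows only the last 10 log lines per pod by default; raise the tail length (the value below is arbitrary) to see more history:
kubectl --kubeconfig kube_config_rancher-cluster.yml logs -l app=cattle -n cattle-system --tail=200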
Check NGINX ingress controller logging
kubectl --kubeconfig kube_config_rancher-cluster.yml logs -l app=ingress-nginx -n ingress-nginx
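You can also verify that an ingress controller pod is running on every node that should serve traffic, and increase the tail length as above if the default 10 lines per pod are not enough:
kubectl --kubeconfig kube_config_rancher-cluster.yml get pods -l app=ingress-nginx -n ingress-nginx -o wide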
Check if overlay network is functioning correctly
The Rancher pod can be scheduled to any of the hosts you used for your cluster, which means the NGINX ingress controller needs to be able to route the request from NODE_1 to NODE_2. This routing happens over the overlay network. If the overlay network is not functioning, you will experience intermittent TCP/HTTP connection failures because the NGINX ingress controller is not able to route to the pod.
To test the overlay network, you can launch the following DaemonSet definition. This runs an alpine container on every host, which we will use to run a ping test between containers on all hosts.
Save the following file as ds-alpine.yml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alpine
spec:
  selector:
    matchLabels:
      name: alpine
  template:
    metadata:
      labels:
        name: alpine
    spec:
      tolerations:
      - effect: NoExecute
        key: "node-role.kubernetes.io/etcd"
        value: "true"
      - effect: NoSchedule
        key: "node-role.kubernetes.io/controlplane"
        value: "true"
      containers:
      - image: alpine
        imagePullPolicy: Always
        name: alpine
        command: ["sh", "-c", "tail -f /dev/null"]
        terminationMessagePath: /dev/termination-log
Launch it using:
kubectl --kubeconfig kube_config_rancher-cluster.yml create -f ds-alpine.yml
Wait until
kubectl --kubeconfig kube_config_rancher-cluster.yml rollout status ds/alpine -w
returns: daemon set "alpine" successfully rolled out.
Run the following command to let each container on every host ping every other container (it's a single-line command):
echo "=> Start"; kubectl --kubeconfig kube_config_rancher-cluster.yml get pods -l name=alpine -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' | while read spod shost; do kubectl --kubeconfig kube_config_rancher-cluster.yml get pods -l name=alpine -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' | while read tip thost; do kubectl --kubeconfig kube_config_rancher-cluster.yml --request-timeout='10s' exec $spod -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $shost cannot reach $thost; fi; done; done; echo "=> End"
When this command has finished running, the output indicating everything is correct is:
=> Start
=> End
If you see errors in the output, it means that the required ports for the overlay network are not open between the hosts indicated.
Example error output for a situation where NODE1 has the UDP ports blocked:
=> Start
command terminated with exit code 1
NODE2 cannot reach NODE1
command terminated with exit code 1
NODE3 cannot reach NODE1
command terminated with exit code 1
NODE1 cannot reach NODE2
command terminated with exit code 1
NODE1 cannot reach NODE3
=> End
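When you are done testing, you can clean up by deleting the DaemonSet:
kubectl --kubeconfig kube_config_rancher-cluster.yml delete ds/alpine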