Installation with managed etcd
The standard Quick Installation guide sets up Cilium to use Kubernetes CRDs to store and propagate state between agents. Use of CRDs can impose scale limitations depending on the size of your environment, while use of etcd optimizes the propagation of state between agents. This guide explains the steps required to set up Cilium with a managed etcd, where the etcd cluster is maintained by an operator running as part of the Kubernetes cluster.
Identity allocation remains CRD-based, which means that etcd remains an optional component used to improve scalability. Failures in providing etcd are not critical to the availability of Cilium but reduce the efficacy of state propagation. This allows the managed etcd to recover while relying on Cilium itself to provide connectivity and security.
Should you encounter any issues during the installation, please refer to the Troubleshooting section and/or seek help on the Slack channel.
Requirements
Make sure your Kubernetes environment meets the requirements:
- Kubernetes >= 1.12
- Linux kernel >= 4.9
- Kubernetes in CNI mode
- eBPF filesystem mounted on all worker nodes (see the quick check below)
- Recommended: Enable PodCIDR allocation (--allocate-node-cidrs) in the kube-controller-manager
Refer to the section Requirements for detailed instructions on how to prepare your Kubernetes environment.
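As a quick sanity check on a worker node, you can verify the kernel version and confirm that the eBPF filesystem is mounted. This is only a minimal sketch assuming the conventional mount point /sys/fs/bpf; the mount command is only needed if the filesystem is not mounted already:
# Check the kernel version (must be >= 4.9)
uname -r
# Check whether the eBPF filesystem is already mounted
mount | grep /sys/fs/bpf
# If it is not, mount it manually
sudo mount bpffs /sys/fs/bpf -t bpf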
Deploy Cilium + cilium-etcd-operator
Note
First, make sure you have Helm 3 installed. Helm 2 is no longer supported.
Set up the Helm repository:
helm repo add cilium https://helm.cilium.io/
Deploy Cilium release via Helm:
helm install cilium cilium/cilium --version 1.9.8 \
--namespace kube-system \
--set etcd.enabled=true \
--set etcd.managed=true \
--set etcd.k8sService=true
If you do not want Cilium to store state in Kubernetes custom resources (CRDs), consider setting identityAllocationMode:
--set identityAllocationMode=kvstore
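To confirm which options ended up in the agent configuration, you can inspect the cilium-config ConfigMap. The key names shown here (for example identity-allocation-mode) are the ones Cilium typically derives from the corresponding Helm values, so treat this as a rough check rather than an exhaustive one:
kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'identity-allocation-mode|etcd'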
Validate the Installation
You can monitor the progress as Cilium and all required components are being installed:
kubectl -n kube-system get pods --watch
NAME READY STATUS RESTARTS AGE
cilium-etcd-operator-6ffbd46df9-pn6cf 1/1 Running 0 7s
cilium-operator-cb4578bc5-q52qk 0/1 Pending 0 8s
cilium-s8w5m 0/1 PodInitializing 0 7s
coredns-86c58d9df4-4g7dd 0/1 ContainerCreating 0 8m57s
coredns-86c58d9df4-4l6b2 0/1 ContainerCreating 0 8m57s
It may take a couple of minutes for the etcd-operator to bring up the necessary number of etcd pods to achieve quorum. Once it reaches quorum, all components should be healthy and ready:
cilium-etcd-8d95ggpjmw 1/1 Running 0 78s
cilium-etcd-operator-6ffbd46df9-pn6cf 1/1 Running 0 4m12s
cilium-etcd-t695lgxf4x 1/1 Running 0 118s
cilium-etcd-zw285m6t9g 1/1 Running 0 2m41s
cilium-operator-cb4578bc5-q52qk 1/1 Running 0 4m13s
cilium-s8w5m 1/1 Running 0 4m12s
coredns-86c58d9df4-4g7dd 1/1 Running 0 13m
coredns-86c58d9df4-4l6b2 1/1 Running 0 13m
etcd-operator-5cf67779fd-hd9j7 1/1 Running 0 2m42s
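As an additional check, you can query the status of one of the Cilium agents directly, including its view of the kvstore (etcd) connectivity. This sketch assumes the agent pods carry the standard k8s-app=cilium label:
# Pick one Cilium agent pod and ask it for its status
CILIUM_POD=$(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec $CILIUM_POD -- cilium status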
Specify Environment Variables
Specify the namespace in which Cilium is installed as the CILIUM_NAMESPACE environment variable. Subsequent commands reference this environment variable.
export CILIUM_NAMESPACE=kube-system
Enable Hubble for Cluster-Wide Visibility
Hubble is the component for observability in Cilium. To obtain cluster-wide visibility into your network traffic, deploy Hubble Relay and the UI as follows on your existing installation:
Installation via Helm
If you installed Cilium via helm install, you may enable Hubble Relay and UI with the following command:
helm upgrade cilium cilium/cilium --version 1.9.8 \
--namespace $CILIUM_NAMESPACE \
--reuse-values \
--set hubble.listenAddress=":4244" \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true
On Cilium 1.9.1 and older, the Cilium agent pods will be restarted in the process.
Installation via quick-hubble-install.yaml
If you installed Cilium 1.9.2 or newer via the provided quick-install.yaml, you may deploy Hubble Relay and UI on top of your existing installation with the following command:
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-hubble-install.yaml
Installation via quick-hubble-install.yaml only works if the installed Cilium version is 1.9.2 or newer. Users of Cilium 1.9.0 or 1.9.1 are encouraged to upgrade to a newer version by applying the most recent Cilium quick-install.yaml first.
Alternatively, it is possible to manually generate a YAML manifest for the Cilium DaemonSet and Hubble Relay/UI as follows. The generated YAML can be applied on top of an existing installation:
# Set this to your installed Cilium version
export CILIUM_VERSION=1.9.1
# Please set any custom Helm values you may need for Cilium,
# such as for example `--set operator.replicas=1` on single-node clusters.
helm template cilium cilium/cilium --version $CILIUM_VERSION \
--namespace $CILIUM_NAMESPACE \
--set hubble.tls.auto.method="cronJob" \
--set hubble.listenAddress=":4244" \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true > cilium-with-hubble.yaml
# This will modify your existing Cilium DaemonSet and ConfigMap
kubectl apply -f cilium-with-hubble.yaml
The Cilium agent pods will be restarted in the process.
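Before port forwarding, it can be useful to confirm that the Hubble components came up. Assuming the default Deployment names hubble-relay and hubble-ui, a quick check is:
kubectl -n $CILIUM_NAMESPACE get deployment hubble-relay hubble-ui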
Once the Hubble UI pod is started, use port forwarding for the hubble-ui service. This allows opening the UI locally in a browser:
kubectl port-forward -n $CILIUM_NAMESPACE svc/hubble-ui --address 0.0.0.0 --address :: 12000:80
And then open http://localhost:12000/ to access the UI.
Hubble UI is not the only way to get access to Hubble data. A command line tool, the Hubble CLI, is also available. It can be installed by following the instructions below:
Linux
Download the latest hubble release:
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz"
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz.sha256sum"
sha256sum --check hubble-linux-amd64.tar.gz.sha256sum
tar zxf hubble-linux-amd64.tar.gz
and move the hubble CLI to a directory listed in the $PATH environment variable. For example:
sudo mv hubble /usr/local/bin
macOS
Download the latest hubble release:
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-darwin-amd64.tar.gz"
curl -LO "https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-darwin-amd64.tar.gz.sha256sum"
shasum -a 256 -c hubble-darwin-amd64.tar.gz.sha256sum
tar zxf hubble-darwin-amd64.tar.gz
and move the hubble CLI to a directory listed in the $PATH environment variable. For example:
sudo mv hubble /usr/local/bin
Windows
Download the latest hubble release:
curl -LO "https://raw.githubusercontent.com/cilium/hubble/master/stable.txt"
set /p HUBBLE_VERSION=<stable.txt
curl -LO "https://github.com/cilium/hubble/releases/download/%HUBBLE_VERSION%/hubble-windows-amd64.tar.gz"
curl -LO "https://github.com/cilium/hubble/releases/download/%HUBBLE_VERSION%/hubble-windows-amd64.tar.gz.sha256sum"
certutil -hashfile hubble-windows-amd64.tar.gz SHA256
type hubble-windows-amd64.tar.gz.sha256sum
:: verify that the checksum from the two commands above match
tar zxf hubble-windows-amd64.tar.gz
and move the hubble.exe CLI to a directory listed in the %PATH% environment variable after extracting it from the tarball.
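Whichever platform you used, you can verify that the CLI is available on your PATH by printing its version:
hubble version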
Similarly to the UI, use port forwarding for the hubble-relay service to make it available locally:
kubectl port-forward -n $CILIUM_NAMESPACE svc/hubble-relay --address 0.0.0.0 --address :: 4245:80
In a separate terminal window, run the hubble status command, specifying the Hubble Relay address:
$ hubble --server localhost:4245 status
Healthcheck (via localhost:4245): Ok
Current/Max Flows: 5455/16384 (33.29%)
Flows/s: 11.30
Connected Nodes: 4/4
If Hubble Relay reports that all nodes are connected, as in the example output above, you can now use the CLI to observe flows of the entire cluster:
hubble --server localhost:4245 observe
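The observe command also accepts filters to narrow down the output. For example, to follow only flows from the kube-system namespace (flag names as supported by recent Hubble CLI releases):
hubble --server localhost:4245 observe --namespace kube-system --follow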
If you encounter any problem at this point, you may seek help on Slack.
Tip
Hubble CLI configuration can be persisted using a configuration file or environment variables. This avoids having to specify options specific to a particular environment every time a command is run. Run hubble help config for more information.
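For example, rather than passing --server on every invocation, the server address can be set via an environment variable. The HUBBLE_SERVER variable shown here is the one the CLI conventionally reads; verify with hubble help config if your version differs:
export HUBBLE_SERVER=localhost:4245
hubble status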
For more information about Hubble and its components, see the Observability section.
Troubleshooting
- Make sure that kube-dns or coredns is running and healthy in the kube-system namespace. A functioning Kubernetes DNS is strictly required in order for Cilium to resolve the ClusterIP of the etcd cluster. If either kube-dns or coredns were already running before Cilium was deployed, the pods may be managed by a former CNI plugin. cilium-operator will automatically restart the pods to ensure that they are being managed by the Cilium CNI plugin. You can manually restart the pods as well if required and validate that Cilium is managing kube-dns or coredns by running:
kubectl -n kube-system get cep
You should see kube-dns-xxx or coredns-xxx pods.
- In order for the entire system to come up, the following components have to be running at the same time:
  - kube-dns or coredns
  - cilium-xxx
  - cilium-operator-xxx
  - cilium-etcd-operator
  - etcd-operator
  - cilium-etcd-xxx
- All timeouts are configured such that this will typically work out smoothly even if some of the pods restart once or twice. In case any of the above pods get into a long CrashLoopBackoff, bootstrapping can be expedited by restarting the pods to reset the CrashLoopBackoff time.
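A restart is simply a pod deletion, after which the owning controller re-creates the pod. For example, to restart the DNS pods (assuming the common k8s-app=kube-dns label, which CoreDNS deployments typically reuse):
kubectl -n kube-system delete pods -l k8s-app=kube-dns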
CoreDNS: Enable reverse lookups
In order for the TLS certificates between etcd peers to work correctly, a DNS reverse lookup on a pod IP must map back to the pod name. If you are using CoreDNS, check the CoreDNS ConfigMap and validate that in-addr.arpa and ip6.arpa are listed as wildcards for the kubernetes block, like this:
Kubernetes 1.16+
kubectl -n kube-system edit cm coredns
[...]
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
    }
Kubernetes < 1.16
kubectl -n kube-system edit cm coredns
[...]
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
    }
The contents may look different from the above. What matters is that in-addr.arpa and ip6.arpa are listed as wildcards next to cluster.local.
You can validate this by looking up a pod IP with the host utility from any pod:
host 10.60.20.86
86.20.60.10.in-addr.arpa domain name pointer cilium-etcd-972nprv9dp.cilium-etcd.kube-system.svc.cluster.local.
What is the cilium-etcd-operator?
The cilium-etcd-operator uses and extends the etcd-operator to guarantee quorum, auto-create certificates, and manage compaction:
- Automatic re-creation of the etcd cluster when the cluster loses quorum. The standard etcd-operator would refuse to bring up new etcd nodes, leaving the etcd cluster unusable.
- Automatic creation of certificates and keys. This simplifies the installation of the operator and makes the certificates and keys required to access the etcd cluster available to Cilium using a well-known Kubernetes secret name (see the quick check after this list).
- Compaction is automatically handled.
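If you want to inspect what the operator created, a quick look is sketched below. It assumes the TLS material is stored in the secret named cilium-etcd-secrets and that the etcd members run as cilium-etcd-* pods, as in typical deployments:
# The well-known secret holding the etcd TLS material created by the cilium-etcd-operator
kubectl -n kube-system get secret cilium-etcd-secrets
# The etcd member pods managed on behalf of Cilium
kubectl -n kube-system get pods | grep cilium-etcd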
Limitations
Use of the cilium-etcd-operator offers many advantages, including simplicity of installation and automatic management of the etcd cluster, covering compaction, restart on quorum loss, and automatic use of TLS. There are several disadvantages which can become relevant as you scale up your clusters:
- etcd nodes operated by the etcd-operator will not use persistent storage. Once the etcd cluster loses quorum, the etcd cluster is automatically re-created by the cilium-etcd-operator. Cilium will automatically recover and re-create all state in etcd. This operation can take a couple of seconds and may cause minor disruptions as ongoing distributed locks are invalidated and security identities have to be re-allocated.
- etcd is very sensitive to disk IO latency and requires fast disk access at a certain scale. The cilium-etcd-operator does not take any measures to provide fast disk access; performance will depend on whatever storage is provided to the pods in your Kubernetes cluster. See etcd Hardware recommendations for more details.