FAQ

Issues with BTRFS

  • As @jaredallard pointed out, people running k3d on a system with btrfs may need to mount /dev/mapper into the nodes for the setup to work.
    • This will do: k3d cluster create CLUSTER_NAME -v /dev/mapper:/dev/mapper
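
A minimal sketch of how the extra mount could be added only when it is needed. It assumes GNU `stat` (Linux) and that the relevant path is Docker's default data root, /var/lib/docker. The helper names `fs_type` and `extra_volume_args` and the cluster name `demo` are made up for this illustration and are not part of k3d.

```shell
# Print the filesystem type of the given path (GNU coreutils `stat`, Linux only).
fs_type() {
  stat -f -c %T "$1"
}

# Print the extra k3d volume flag only when the filesystem type is btrfs.
extra_volume_args() {
  case "$1" in
    btrfs) printf '%s\n' '-v /dev/mapper:/dev/mapper' ;;
    *)     : ;;  # any other filesystem: no extra flag
  esac
}

# Usage (hypothetical cluster name "demo"):
#   k3d cluster create demo $(extra_volume_args "$(fs_type /var/lib/docker)")
```

The command substitution is left unquoted on purpose, so that the flag and its value are passed as two separate words.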

Issues with ZFS

  • k3s currently has no support for ZFS, so creating multi-server setups (e.g. k3d cluster create multiserver --servers 3) fails: the initializing server node (server flag --cluster-init) errors out with the following log:

    starting kubernetes: preparing server: start cluster and https: raft_init(): io: create I/O capabilities probe file: posix_allocate: operation not supported on socket

Pods evicted due to lack of disk space

  • Pods go to evicted state after doing X

    • Related issues: #133 - Pods evicted due to NodeHasDiskPressure (collection of #119 and #130)
    • Background: somehow docker runs out of space for the k3d node containers, which triggers a hard eviction in the kubelet
    • Possible fix/workaround by @zer0def:

      • use a docker storage driver which cleans up properly (e.g. overlay2)
      • clean up or expand docker root filesystem
      • change the kubelet’s eviction thresholds upon cluster creation:

        k3d cluster create \
          --k3s-arg '--kubelet-arg=eviction-hard=imagefs.available<1%,nodefs.available<1%@agent:*' \
          --k3s-arg '--kubelet-arg=eviction-minimum-reclaim=imagefs.available=1%,nodefs.available=1%@agent:*'
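
Before relaxing the thresholds, it can help to see how full the filesystem actually is. The following is a small sketch using POSIX `df -P`; the helper names and the 85% default are arbitrary choices for this illustration. The kubelet's default hard eviction thresholds are nodefs.available<10% and imagefs.available<15%, so usage above roughly 85% is where trouble can be expected. On a real host you would point it at Docker's data root (by default /var/lib/docker) instead of /.

```shell
# Print the used percentage (without the % sign) of the filesystem holding a path.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether usage has reached a limit (default 85, an illustrative value).
warn_if_full() {
  used=$(disk_used_percent "$1")
  limit=${2:-85}
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $1 is ${used}% full"
  else
    echo "OK: $1 is ${used}% full"
  fi
}

warn_if_full /

# `docker system df` gives a per-category breakdown of what Docker itself is using.
```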

Restarting a multi-server cluster or the initializing server node fails

  • What you do: You create a cluster with more than one server node and later, you either stop server-0 or stop/start the whole cluster
  • What fails: After the restart, you cannot connect to the cluster anymore and kubectl will give you a lot of errors
  • What causes this issue: it’s a known issue with dqlite in k3s which doesn’t allow the initializing server node to go down
  • What’s the solution: Hopefully, this will be solved by the planned replacement of dqlite with embedded etcd in k3s
  • Related issues: #262

Passing additional arguments/flags to k3s (and on to e.g. the kube-apiserver)

  • The Problem: Passing a feature flag to the Kubernetes API Server running inside k3s.
  • Example: you want to enable the EphemeralContainers feature flag in Kubernetes
  • Solution:

    k3d cluster create \
      --k3s-arg '--kube-apiserver-arg=feature-gates=EphemeralContainers=true@server:*' \
      --k3s-arg '--kube-scheduler-arg=feature-gates=EphemeralContainers=true@server:*' \
      --k3s-arg '--kubelet-arg=feature-gates=EphemeralContainers=true@agent:*'
    • Note: Be aware of where the flags require dashes (--) and where not.
      • the k3s flag (--kube-apiserver-arg) has the dashes
      • the kube-apiserver flag feature-gates doesn’t have them (k3s adds them internally)
  • Second example:

    k3d cluster create k3d-one \
      --k3s-arg "--cluster-cidr=10.118.0.0/17@server:*" \
      --k3s-arg "--service-cidr=10.118.128.0/17@server:*" \
      --k3s-arg "--disable=servicelb@server:*" \
      --k3s-arg "--disable=traefik@server:*" \
      --verbose
    • Note: There are many ways to use the " and ' quotes; just be aware that shells sometimes try to interpret/interpolate parts of the commands
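
To see what the shell does to an argument before k3d receives it, a tiny helper that prints its arguments verbatim can be useful. This is only an illustration: `show_args` and the `CIDR` variable are hypothetical names and not part of k3d.

```shell
# Print each argument on its own line, in brackets, exactly as a program would receive it.
show_args() {
  printf '[%s]\n' "$@"
}

CIDR=10.118.0.0/17   # hypothetical variable, only for this illustration

# Single quotes: nothing is interpreted, so $CIDR stays literal.
show_args '--cluster-cidr=$CIDR@server:*'

# Double quotes: $CIDR is expanded, while * is still protected from globbing.
show_args "--cluster-cidr=$CIDR@server:*"

# Without any quotes, characters such as * (glob) or < (redirection, as in
# eviction-hard=imagefs.available<1%) would be interpreted by the shell itself.
```

The first call prints `[--cluster-cidr=$CIDR@server:*]`, the second prints `[--cluster-cidr=10.118.0.0/17@server:*]`.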

How to access services (like a database) running on my Docker Host Machine

  • As of version v3.1.0, we’re injecting the host.k3d.internal entry into the k3d containers (k3s nodes) and into the CoreDNS ConfigMap, enabling you to access your host system by referring to it as host.k3d.internal
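
A rough sketch of how this could be checked and used, assuming `kubectl` already points at the k3d cluster. The pod name `host-check`, the busybox image tag, and the helper names are choices made for this example, as are the scheme and port in the sample call.

```shell
# Resolve host.k3d.internal from a throwaway pod inside the cluster.
# (Defined only; call it manually once a cluster is running.)
check_host_alias() {
  kubectl run host-check --rm -i --restart=Never --image=busybox:1.28 -- \
    nslookup host.k3d.internal
}

# Build a URL for a service listening on the Docker host.
host_service_url() {
  printf '%s://host.k3d.internal:%s\n' "$1" "$2"
}

# Example: a PostgreSQL instance listening on port 5432 of the host.
host_service_url postgres 5432
```

Note that the service on the host must listen on an interface reachable from the Docker network, not only on 127.0.0.1.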

Running behind a corporate proxy

Running k3d behind a corporate proxy can cause problems that have already been reported in several issues.
Some can be fixed by passing the HTTP_PROXY environment variables to k3d, some have to be fixed in docker’s daemon.json file and some are as easy as adding a volume mount.
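
As a sketch of the first option, the proxy variables from the current shell can be forwarded to the nodes with the `--env` flag of `k3d cluster create` (format KEY[=VALUE][@NODEFILTER]). The wrapper name `with_proxy_env`, the cluster name `demo`, and the choice of the `server:*` node filter are assumptions for this example; adjust the filter if agents need the proxy too.

```shell
# Run the given command with one `--env KEY=VALUE@server:*` pair appended
# for every proxy variable that is set and non-empty.
with_proxy_env() {
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY; do
    eval "value=\${$name:-}"   # indirect lookup; the names come from the fixed list above
    if [ -n "$value" ]; then
      set -- "$@" --env "${name}=${value}@server:*"
    fi
  done
  "$@"
}

# Dry run: prints the command instead of executing it.
with_proxy_env echo k3d cluster create demo

# Real run (requires k3d and Docker):
#   with_proxy_env k3d cluster create demo
```

Appending to the positional parameters keeps every argument intact as a single word, so the `*` in the node filter is never subject to globbing.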

Pods fail to start: x509: certificate signed by unknown authority

  • Example Error Message:

    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "docker.io/rancher/pause:3.1": failed to pull image "docker.io/rancher/pause:3.1": failed to pull and unpack image "docker.io/rancher/pause:3.1": failed to resolve reference "docker.io/rancher/pause:3.1": failed to do request: Head https://registry-1.docker.io/v2/rancher/pause/manifests/3.1: x509: certificate signed by unknown authority
  • Problem: inside the container, the certificate of the corporate proxy cannot be validated

  • Possible Solution: Mounting the CA Certificate from your host into the node containers at start time via k3d cluster create --volume /path/to/your/certs.crt:/etc/ssl/certs/yourcert.crt
  • Issue: k3d-io/k3d#535

Spurious PID entries in /proc after deleting k3d cluster with shared mounts

  • When you create and delete a cluster multiple times with the same cluster name and shared volume mounts, grep k3d /proc/*/mountinfo has been observed to show many spurious entries
  • Problem: As a result, you will at times see no space left on device: unknown when a pod is scheduled to the nodes
  • If you observe this, you can check for inaccessible file systems and unmount them with the command below (note: first remove xargs umount -l and check the diff output)
  • diff <(df -ha | grep pods | awk '{print $NF}') <(df -h | grep pods | awk '{print $NF}') | awk '{print $2}' | xargs umount -l
  • According to the conversation on k3d-io/k3d#594, this issue had not been reported or known before, so chances are high that it is not universal.

[SOLVED] Nodes fail to start or get stuck in NotReady state with log nf_conntrack_max: permission denied

Problem

  • When: This happens when running k3d on a Linux system with a kernel version >= 5.12.2 (and others like >= 5.11.19) when creating a new cluster
    • the node(s) stop or get stuck with a log line like this: <TIMESTAMP> F0516 05:05:31.782902 7 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
  • Why: The issue was introduced by a change in the Linux kernel (Changelog 5.12.2: Commit), that changed the netfilter_conntrack behavior in a way that kube-proxy is not able to set the nf_conntrack_max value anymore

Workaround

  • We can tell kube-proxy to not even try to set this value:

    k3d cluster create \
      --k3s-arg "--kube-proxy-arg=conntrack-max-per-core=0@server:*" \
      --k3s-arg "--kube-proxy-arg=conntrack-max-per-core=0@agent:*" \
      --image rancher/k3s:v1.20.6-k3s
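
To decide whether a host falls into the kernel ranges named in the problem description, something like the following could be used. It is only a sketch: it covers just the two ranges mentioned there (5.11.19 and later within 5.11.x, and 5.12.2 and later), relies on `sort -V` for version comparison, and the function names are made up for this example.

```shell
# True (exit status 0) when version $1 >= version $2. Relies on `sort -V`.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True when a kernel release string falls into one of the two named ranges.
kernel_affected() {
  release=${1%%-*}   # drop distribution suffixes such as "-91-generic"
  case "$release" in
    5.11.*) version_ge "$release" 5.11.19 ;;
    *)      version_ge "$release" 5.12.2 ;;
  esac
}

if kernel_affected "$(uname -r)"; then
  echo "kernel $(uname -r): the workaround may be needed with older k3s images"
else
  echo "kernel $(uname -r): not in the ranges named above"
fi
```

Other stable branches may have received the same kernel change, so a negative result is not a guarantee.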

Fix

  • Note: k3d v4.4.5 already uses rancher/k3s:v1.21.1-k3s1 as the new default k3s image, so no workarounds needed there!

This is going to be fixed “upstream” in k3s itself in rancher/k3s#3337 and backported to k3s versions as low as v1.18.

DockerHub Pull Rate Limit

Problem

You’re deploying something to the cluster using an image from DockerHub and the image fails to be pulled, with a 429 response code and a message saying You have reached your pull rate limit. You may increase the limit by authenticating and upgrading.

Cause

This is caused by DockerHub’s pull rate limit (see https://docs.docker.com/docker-hub/download-rate-limit/), which limits pulls from unauthenticated/anonymous users to 100 pulls per 6 hours and pulls from authenticated users (not paying customers) to 200 pulls per 6 hours (as of the time of writing).

Solution

a) use images from a private registry, e.g. configured as a pull-through cache for DockerHub
b) use a different public registry without such limitations, if the same image is stored there
c) authenticate containerd inside k3s/k3d to use your DockerHub user

(c) Authenticate Containerd against DockerHub

  1. Create a registry configuration file for containerd:

    # saved as e.g. $HOME/registries.yaml
    configs:
      "docker.io":
        auth:
          username: "$USERNAME"
          password: "$PASSWORD"
  2. Create a k3d cluster using that config:

    k3d cluster create --registry-config $HOME/registries.yaml
  3. Profit. That’s it. In our test, we pulled the same image 120 times in a row (and confirmed that the pull count went up) without being rate limited (as a normal, non-paying user)
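
The config file from step 1 can also be generated from a script, so that the credentials do not have to be typed into the file by hand. The following is a minimal sketch: the function name `write_registries_yaml` and the example values are hypothetical, and it assumes the username and password contain no characters that need escaping inside a double-quoted YAML string.

```shell
# Write a containerd registry config for docker.io.
# Arguments: target path, DockerHub username, DockerHub password or access token.
write_registries_yaml() {
  (
    umask 077   # the file holds a credential, so keep it private
    cat > "$1" <<EOF
configs:
  "docker.io":
    auth:
      username: "$2"
      password: "$3"
EOF
  )
}

# Demonstration with placeholder values, written to a temporary file.
demo_file=$(mktemp)
write_registries_yaml "$demo_file" "example-user" "example-token"
cat "$demo_file"
rm -f "$demo_file"

# Real use would look like:
#   write_registries_yaml "$HOME/registries.yaml" "$USERNAME" "$PASSWORD"
#   k3d cluster create --registry-config "$HOME/registries.yaml"
```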

Longhorn in k3d

Problem

Longhorn is not working when deployed in a K3s cluster spawned with k3d.

Cause

The container image of K3s is quite limited and doesn’t contain the necessary libraries. Also, additional volume mounts and more would be required to get Longhorn up and running properly.
In short, Longhorn relies too heavily on the host OS to work properly in the dockerized environment without substantial modifications.

Solution

There are a few ways one can build a working image to use with k3d.
See https://github.com/k3d-io/k3d/discussions/478 for more info.


Last update: February 17, 2022