Understanding ephemeral storage
Overview
In addition to persistent storage, pods and containers can require ephemeral or transient local storage for their operation. The lifetime of this ephemeral storage does not extend beyond the life of the individual pod, and it cannot be shared across pods.
Pods use ephemeral local storage for scratch space, caching, and logs. Issues related to the lack of local storage accounting and isolation include the following:
Pods do not know how much local storage is available to them.
Pods cannot request guaranteed local storage.
Local storage is a best effort resource.
Pods can be evicted due to other pods filling the local storage, after which new pods are not admitted until sufficient storage has been reclaimed.
Unlike persistent volumes, ephemeral storage is unstructured and the space is shared between all pods running on a node, in addition to other uses by the system, the container runtime, and OKD. The ephemeral storage framework allows pods to specify their transient local storage needs. It also allows OKD to schedule pods where appropriate, and to protect the node against excessive use of local storage.
While the ephemeral storage framework allows administrators and developers to better manage this local storage, it does not provide any promises related to I/O throughput and latency.
Types of ephemeral storage
Ephemeral local storage is always made available in the primary partition. There are two basic ways of creating the primary partition: root and runtime.
Root
This partition holds the kubelet root directory, /var/lib/kubelet/ by default, and the /var/log/ directory. This partition can be shared between user pods, the OS, and Kubernetes system daemons. Pods can consume this partition through emptyDir volumes, container logs, image layers, and container-writable layers. The kubelet manages shared access to and isolation of this partition. This partition is ephemeral, and applications cannot expect any performance SLAs, such as disk IOPS, from it.
Runtime
This is an optional partition that runtimes can use for overlay file systems. OKD attempts to identify and provide shared access along with isolation to this partition. Container image layers and writable layers are stored here. If the runtime partition exists, the root partition does not hold any image layers or other writable storage.
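To check whether a node has a separate runtime partition, you can inspect which file system holds the container storage directory. The following is a minimal check, assuming the default CRI-O storage location of /var/lib/containers; run it in a shell on the node:
$ findmnt --target /var/lib/containers
If findmnt reports the root file system (/) as the mount point, there is no separate runtime partition and container storage shares the root partition.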
Ephemeral storage management
Cluster administrators can manage ephemeral storage within a project by setting quotas that cap the total ephemeral storage requests and limits across all pods in a non-terminal state. Developers can also set requests and limits on this compute resource at the pod and container level.
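For example, a cluster administrator could create a quota similar to the following sketch in a project. The quota resource names requests.ephemeral-storage and limits.ephemeral-storage are standard quota fields; the quota name and the 10Gi and 20Gi values are illustrative assumptions, not recommended defaults:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota   # illustrative name
spec:
  hard:
    requests.ephemeral-storage: 10Gi   # cap on the sum of requests across all pods in the project
    limits.ephemeral-storage: 20Gi     # cap on the sum of limits across all pods in the project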
You can manage local ephemeral storage by specifying requests and limits. Each container in a pod can specify the following:
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage
Limits and requests for ephemeral storage are measured in byte quantities. You can express storage as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following quantities all represent approximately the same value: 128974848, 129e6, 129M, and 123Mi. The case of the suffixes is significant. If you specify 400m of ephemeral storage, this requests 0.4 bytes, rather than 400 mebibytes (400Mi) or 400 megabytes (400M), which is probably what was intended.
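For example, these suffixes expand as follows:
123Mi = 123 × 2^20 bytes = 128,974,848 bytes
129M  = 129 × 10^6 bytes = 129,000,000 bytes
400m  = 400 × 10^-3 bytes = 0.4 bytes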
The following example shows a pod with two containers. Each container requests 2GiB of local ephemeral storage. The first container also sets a limit of 4GiB of local ephemeral storage; the second container sets no limit. Therefore, the pod has a request of 4GiB of local ephemeral storage and an overall limit of 4GiB of local ephemeral storage.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi" (1)
      limits:
        ephemeral-storage: "4Gi" (2)
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        ephemeral-storage: "2Gi" (1)
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
  - name: ephemeral
    emptyDir: {}
1 Request for local ephemeral storage.
2 Limit for local ephemeral storage.
This setting in the pod spec affects how the scheduler makes scheduling decisions and how the kubelet evicts pods. First, the scheduler ensures that the sum of the resource requests of the scheduled containers is less than the capacity of the node. In this case, the pod can be assigned to a node only if the node's available ephemeral storage (allocatable resource) is more than 4GiB.
Second, at the container level, because the first container sets a resource limit, the kubelet eviction manager measures the disk usage of this container and evicts the pod if the storage usage of this container exceeds its limit (4GiB). At the pod level, the kubelet works out an overall pod storage limit by adding up the limits of all the containers in that pod. In this case, the total storage usage at the pod level is the sum of the disk usage from all containers plus the pod's emptyDir volumes. If this total usage exceeds the overall pod storage limit (4GiB), then the kubelet also marks the pod for eviction.
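To check how much ephemeral storage a node reports to the scheduler, you can inspect the node's resources. This is a minimal sketch; <node-name> is a placeholder for a real node:
$ oc describe node <node-name>
The Capacity and Allocatable sections of the output include an ephemeral-storage entry. The pod in this example can be scheduled only on a node whose allocatable ephemeral storage is greater than 4GiB.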
For information about defining quotas for projects, see Quota setting per project.
Monitoring ephemeral storage
You can use /bin/df as a tool to monitor ephemeral storage usage on the volumes where ephemeral container data is located, which are /var/lib/kubelet and /var/lib/containers. If the cluster administrator has placed /var/lib/containers on a separate disk, the df command shows the available space only for the file system that holds /var/lib/kubelet.
To show the human-readable values of used and available space in /var/lib, enter the following command:
$ df -h /var/lib
The output shows the ephemeral storage usage in /var/lib:
Example output
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 69G 32G 34G 49% /
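If /var/lib/containers is mounted on its own disk, you can pass both directories to df to see each file system. This assumes you run the command directly on the node:
$ df -h /var/lib/kubelet /var/lib/containers
Because df prints one line per file system, the output shows the usage of the root and runtime partitions separately.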