Manage HugePages

Configure and manage huge pages as a schedulable resource in a cluster.

FEATURE STATE: Kubernetes v1.14 [stable] (enabled by default: true)

Kubernetes supports the allocation and consumption of pre-allocated huge pages by applications in a Pod. This page describes how users can consume huge pages.

Before you begin

Kubernetes nodes must pre-allocate huge pages in order for the node to report its huge page capacity.

A node can pre-allocate huge pages of multiple sizes. For instance, the following line in /etc/default/grub allocates two 1 GiB pages (2 GiB in total) and 512 pages of 2 MiB (1 GiB in total):

    GRUB_CMDLINE_LINUX="hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=512"

The nodes will automatically discover and report all huge page resources as schedulable resources.

When you describe the Node, you should see something similar to the following in the Capacity and Allocatable sections:

    Capacity:
      cpu:                ...
      ephemeral-storage:  ...
      hugepages-1Gi:      2Gi
      hugepages-2Mi:      1Gi
      memory:             ...
      pods:               ...
    Allocatable:
      cpu:                ...
      ephemeral-storage:  ...
      hugepages-1Gi:      2Gi
      hugepages-2Mi:      1Gi
      memory:             ...
      pods:               ...

Note:

For dynamically allocated pages (after boot), the kubelet needs to be restarted for the new allocations to be reflected.

API

Huge pages can be consumed via container-level resource requirements using the resource name hugepages-<size>, where <size> is the most compact binary notation using integer values supported on a particular node. For example, if a node supports 2048KiB and 1048576KiB page sizes, it will expose the schedulable resources hugepages-2Mi and hugepages-1Gi. Unlike CPU or memory, huge pages do not support overcommit. Note that when requesting huge page resources, either memory or CPU resources must be requested as well.

A pod may consume multiple huge page sizes in a single pod spec. In this case, it must use the medium: HugePages-<hugepagesize> notation for all volume mounts.

    apiVersion: v1
    kind: Pod
    metadata:
      name: huge-pages-example
    spec:
      containers:
      - name: example
        image: fedora:latest
        command:
        - sleep
        - inf
        volumeMounts:
        - mountPath: /hugepages-2Mi
          name: hugepage-2mi
        - mountPath: /hugepages-1Gi
          name: hugepage-1gi
        resources:
          limits:
            hugepages-2Mi: 100Mi
            hugepages-1Gi: 2Gi
            memory: 100Mi
          requests:
            memory: 100Mi
      volumes:
      - name: hugepage-2mi
        emptyDir:
          medium: HugePages-2Mi
      - name: hugepage-1gi
        emptyDir:
          medium: HugePages-1Gi

A pod may use medium: HugePages only if it requests huge pages of one size.

    apiVersion: v1
    kind: Pod
    metadata:
      name: huge-pages-example
    spec:
      containers:
      - name: example
        image: fedora:latest
        command:
        - sleep
        - inf
        volumeMounts:
        - mountPath: /hugepages
          name: hugepage
        resources:
          limits:
            hugepages-2Mi: 100Mi
            memory: 100Mi
          requests:
            memory: 100Mi
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages

  • Huge page requests must equal the limits. This is the default if limits are specified, but requests are not.
  • Huge pages are isolated at container scope, so each container has its own limit on its cgroup sandbox, as requested in the container spec.
  • EmptyDir volumes backed by huge pages may not consume more huge page memory than the pod request.
  • Applications that consume huge pages via shmget() with SHM_HUGETLB must run with a supplemental group that matches /proc/sys/vm/hugetlb_shm_group.
  • Huge page usage in a namespace is controllable via ResourceQuota similar to other compute resources like cpu or memory using the hugepages-<size> token.
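
The last two points in the list above can be illustrated with a pair of manifests. This is a minimal sketch, not a definitive configuration: the object names, the example-team namespace, and the group ID 1001 are hypothetical placeholders. The ResourceQuota limits the total amount of 2 MiB huge pages that Pods in the namespace may request, and the Pod sets a supplemental group that is assumed to match the value of /proc/sys/vm/hugetlb_shm_group on the node.

```yaml
# Sketch only: names, namespace, and the group ID below are hypothetical.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: hugepages-quota          # hypothetical name
  namespace: example-team        # hypothetical namespace
spec:
  hard:
    hugepages-2Mi: 1Gi           # total 2 MiB huge pages that Pods in this namespace may request
---
apiVersion: v1
kind: Pod
metadata:
  name: huge-pages-shm-example   # hypothetical name
  namespace: example-team
spec:
  securityContext:
    supplementalGroups:
    - 1001                       # assumed to equal the node's hugetlb_shm_group value
  containers:
  - name: example
    image: fedora:latest
    command:
    - sleep
    - inf
    resources:
      limits:
        hugepages-2Mi: 100Mi     # requests default to this value
        memory: 100Mi
      requests:
        memory: 100Mi
```

With a quota like this in place, a Pod whose huge page request would push the namespace total above 1Gi should be rejected at admission. Check the group ID against the actual node setting before relying on it.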