Filesystems, Disks and Volumes

Making persistent storage in the cluster (volumes) accessible to VMs consists of three parts. First, volumes are specified in spec.volumes. Second, disks are added to the VM by specifying them in spec.domain.devices.disks. Finally, a reference to the specified volume is added to the disk specification by name.

Disks

Like all other VMI devices, a spec.domain.devices.disks element has a mandatory name, and that name must reference the name of a volume inside spec.volumes.

A disk can be made accessible via four different types: lun, disk, cdrom, and filesystems, each described in the sections below.

All possible configuration options are available in the Disk API Reference.

All types allow you to specify the bus attribute. The bus attribute determines how the disk will be presented to the guest operating system.

lun

A lun disk will expose the volume as a LUN device to the VM. This allows the VM to pass arbitrary iSCSI commands through to the device.

A minimal example which attaches a PersistentVolumeClaim named mypvc as a lun device to the VM:

metadata:
  name: testvmi-lun
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        # This makes it a lun device
        lun: {}
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc

persistent reservation

It is possible to reserve a LUN through the SCSI Persistent Reserve commands. In order to issue privileged SCSI ioctls, the VM requires activation of the persistent reservation flag:

devices:
  disks:
  - name: mypvcdisk
    lun:
      reservation: true

This feature is enabled by the feature gate PersistentReservation:

configuration:
  developerConfiguration:
    featureGates:
    - PersistentReservation

Note: The persistent reservation feature deploys an additional privileged component alongside virt-handler. Because it enables privileged SCSI operations with security implications, it is disabled by default and must be explicitly enabled by a cluster administrator.

disk

A disk disk will expose the volume as an ordinary disk to the VM.

A minimal example which attaches a PersistentVolumeClaim named mypvc as a disk device to the VM:

metadata:
  name: testvmi-disk
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        # This makes it a disk
        disk: {}
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc

You can set the disk bus type, overriding the default, which in turn depends on the chipset the VM is configured to use:

metadata:
  name: testvmi-disk
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        # This makes it a disk
        disk:
          # This makes it exposed as /dev/vda, being the only and thus first
          # disk attached to the VM
          bus: virtio
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc

cdrom

A cdrom disk will expose the volume as a cdrom drive to the VM. It is read-only by default.

A minimal example which attaches a PersistentVolumeClaim named mypvc as a cdrom device to the VM:

metadata:
  name: testvmi-cdrom
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        # This makes it a cdrom
        cdrom:
          # This makes the cdrom writeable
          readonly: false
          # This makes the cdrom be exposed as SATA device
          bus: sata
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc

filesystems

A filesystem device will expose the volume as a filesystem to the VM. filesystems rely on virtiofs to make external filesystems visible to KubeVirt VMs. Further information about virtiofs can be found at the Official Virtiofs Site.

Compared with disk, filesystems allow changes in the source to be dynamically reflected in the volumes inside the VM. For instance, if a given configMap is shared with filesystems, any change made to it will be reflected in the VMs. However, it is important to note that filesystems do not allow live migration.

Additionally, filesystem devices must be mounted inside the VM. This can be done through cloudInitNoCloud, or manually by connecting to the VM shell and running the mount command. The key is knowing the device tag used to identify the new filesystem, so it can be mounted with the mount -t virtiofs [device tag] [path] command. The tag is the name assigned to the filesystem in the VM spec, spec.domain.devices.filesystems.name. For instance, if a VM spec contains spec.domain.devices.filesystems.name: foo, the command inside the VM to mount the filesystem at /tmp/foo is mount -t virtiofs foo /tmp/foo:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-filesystems
spec:
  domain:
    devices:
      filesystems:
      - name: foo
        virtiofs: {}
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      - name: cloudinitdisk
        disk:
          bus: virtio
  volumes:
  - containerDisk:
      image: quay.io/containerdisks/fedora:latest
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        chpasswd:
          expire: false
        password: fedora
        user: fedora
        bootcmd:
        - "sudo mkdir /tmp/foo"
        - "sudo mount -t virtiofs foo /tmp/foo"
    name: cloudinitdisk
  - persistentVolumeClaim:
      claimName: mypvc
    name: foo

Note: As stated, filesystems rely on virtiofs, and virtiofs requires Linux kernel support inside the VM. To check whether the VM's Linux image has the required support, run modprobe virtiofs. If the output is modprobe: FATAL: Module virtiofs not found, the Linux image of the VM does not support virtiofs. You can also check the kernel version with uname -r: virtiofs requires at least kernel 5.4 on most Linux distributions, or at least 4.18 on CentOS/RHEL.

Refer to section Sharing Directories with VMs for usage examples of filesystems.

error policy

The error policy controls how the hypervisor should behave when an IO error occurs on a disk read or write. The default behaviour is to stop the guest and generate a Kubernetes event. However, the policy can be changed to one of the following:

  • report: the error is reported in the guest
  • ignore: the error is ignored, but the read/write failure goes undetected
  • enospace: error when there isn’t enough space on the disk

The error policy can be specified per disk or lun.

Example:

spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
        errorPolicy: "report"
      - lun:
          bus: scsi
        name: scsi-disk
        errorPolicy: "report"

Volumes

The supported volume sources are described in the following sections.

All possible configuration options are available in the Volume API Reference.

cloudInitNoCloud

Allows attaching cloudInitNoCloud data-sources to the VM. If the VM contains a proper cloud-init setup, it will pick up the disk as a user-data source.

A simple example which attaches a Secret as a cloud-init disk datasource may look like this:

metadata:
  name: testvmi-cloudinitnocloud
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mybootdisk
        lun: {}
      - name: mynoclouddisk
        disk: {}
  volumes:
  - name: mybootdisk
    persistentVolumeClaim:
      claimName: mypvc
  - name: mynoclouddisk
    cloudInitNoCloud:
      secretRef:
        name: testsecret

cloudInitConfigDrive

Allows attaching cloudInitConfigDrive data-sources to the VM. If the VM contains a proper cloud-init setup, it will pick up the disk as a user-data source.

A simple example which attaches a Secret as a cloud-init disk datasource may look like this:

metadata:
  name: testvmi-cloudinitconfigdrive
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mybootdisk
        lun: {}
      - name: myconfigdrivedisk
        disk: {}
  volumes:
  - name: mybootdisk
    persistentVolumeClaim:
      claimName: mypvc
  - name: myconfigdrivedisk
    cloudInitConfigDrive:
      secretRef:
        name: testsecret

The cloudInitConfigDrive can also be used to configure VMs with Ignition. You just need to replace the cloud-init data with the Ignition data.
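As a hedged sketch of what that replacement can look like, the volume below carries a minimal Ignition payload in userData instead of a cloud-config document; the volume name and the Ignition content are illustrative assumptions:

volumes:
- name: ignitiondisk            # illustrative volume name
  cloudInitConfigDrive:
    userData: |
      {
        "ignition": {
          "version": "3.2.0"
        }
      }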

persistentVolumeClaim

Allows connecting a PersistentVolumeClaim to a VM disk.

Use a PersistentVolumeClaim when the VirtualMachineInstance’s disk needs to persist after the VM terminates. This allows for the VM’s data to remain persistent between restarts.

A PersistentVolume can be in “filesystem” or “block” mode:

  • Filesystem: For KubeVirt to be able to consume the disk present on a PersistentVolume’s filesystem, the disk must be named disk.img and be placed in the root path of the filesystem. Currently the disk is also required to be in raw format.

    Important: The disk.img image file needs to be owned by the user-id 107 in order to avoid permission issues.

    Note: If the disk.img image file has not been created manually before starting a VM then it will be created automatically with the PersistentVolumeClaim size. Since not every storage provisioner provides volumes with the exact usable amount of space as requested (e.g. due to filesystem overhead), KubeVirt tolerates up to 10% less available space. This can be configured with the developerConfiguration.pvcTolerateLessSpaceUpToPercent value in the KubeVirt CR (kubectl edit kubevirt kubevirt -n kubevirt), as sketched below.

  • Block: Use a block volume for consuming raw block devices. Note: you need to enable the BlockVolume feature gate.
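The tolerance mentioned above is tuned in the KubeVirt CR. A minimal sketch, assuming the default value of 10 percent is being set explicitly:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      # Tolerate provisioners that deliver slightly less usable space than requested
      pvcTolerateLessSpaceUpToPercent: 10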

A simple example which attaches a PersistentVolumeClaim as a disk may look like this:

metadata:
  name: testvmi-pvc
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        lun: {}
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc

Thick and thin volume provisioning

Sparsification can make a disk thin-provisioned; in other words, it allows the space freed within the disk image to be returned to the host. The fstrim utility can be used on a mounted filesystem to discard the blocks not used by the filesystem. In order to be able to sparsify a disk inside the guest, the disk needs to be configured in the libvirt xml with the option discard=unmap. In KubeVirt, every disk is passed by default with this option enabled. You can check whether the trim configuration is supported in the guest by running lsblk -D and inspecting the discard options supported on every disk.

Example:

$ lsblk -D
NAME   DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
loop0         0        4K       4G         0
loop1         0       64K       4M         0
sr0           0        0B       0B         0
rbd0          0       64K       4M         0
vda         512      512B       2G         0
└─vda1        0      512B       2G         0

However, in certain cases, such as preallocation or when the disk is thick provisioned, the option needs to be disabled. The disk's PVC has to be marked with an annotation that contains /storage.preallocation or /storage.thick-provisioned, set to true. If the volume is preprovisioned using CDI and preallocation is enabled, then the PVC is automatically annotated with cdi.kubevirt.io/storage.preallocation: true and the discard passthrough option is disabled.

Example of a PVC definition with the annotation to disable discard passthrough:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
  annotations:
    user.custom.annotation/storage.thick-provisioned: "true"
spec:
  storageClassName: local
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi

disk expansion

For some storage methods, Kubernetes may support expanding storage in-use (allowVolumeExpansion feature). KubeVirt can respond to it by making the additional storage available for the virtual machines. This feature is currently off by default, and requires enabling a feature gate. To enable it, add the ExpandDisks feature gate in the kubevirt object:

spec:
  configuration:
    developerConfiguration:
      featureGates:
      - ExpandDisks

Enabling this feature does two things:

  • Notify the virtual machine about size changes
  • If the disk is a Filesystem PVC, the matching file is expanded to the remaining size (while reserving some space for file system overhead)
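On the Kubernetes side, expansion also requires a StorageClass that allows it. A minimal sketch; the class name and provisioner below are illustrative assumptions, not part of KubeVirt:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage           # illustrative name
provisioner: example.vendor/csi      # illustrative CSI provisioner
# Kubernetes only permits in-use expansion when this is set
allowVolumeExpansion: true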

Statically provisioned block PVCs

To use an externally managed local block device from a host (e.g. /dev/sdb, a zvol, LVM, etc.) in a VM directly, you need a provisioner that supports block devices, such as OpenEBS LocalPV.

Alternatively, local volumes can be provisioned by hand. For example, the following PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myblock
spec:
  storageClassName: local-device
  volumeMode: Block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

can claim a PersistentVolume pre-created by a cluster admin like so:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-device
provisioner: kubernetes.io/no-provisioner
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: myblock
spec:
  volumeMode: Block
  storageClassName: local-device
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - my-node
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  local:
    path: /dev/sdb

dataVolume

DataVolumes are a way to automate importing virtual machine disks onto PVCs during the virtual machine’s launch flow. Without using a DataVolume, users have to prepare a PVC with a disk image before assigning it to a VM or VMI manifest. With a DataVolume, both the PVC creation and import is automated on behalf of the user.

DataVolume VM Behavior

DataVolumes can be defined in the VM spec directly by adding the DataVolumes to the dataVolumeTemplates list. Below is an example.

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-alpine-datavolume
  name: vm-alpine-datavolume
spec:
  runStrategy: Halted
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-alpine-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumedisk1
        resources:
          requests:
            memory: 64M
      volumes:
      - dataVolume:
          name: alpine-dv
        name: datavolumedisk1
  dataVolumeTemplates:
  - metadata:
      name: alpine-dv
    spec:
      storage:
        resources:
          requests:
            storage: 2Gi
      source:
        http:
          url: http://cdi-http-import-server.kubevirt/images/alpine.iso

You can see the DataVolume defined in the dataVolumeTemplates section has two parts: the source and the pvc.

The source part declares that there is a disk image living on an http server that we want to use as a volume for this VM. The pvc part declares the spec that should be used to create the PVC that hosts the source data.

When this VM manifest is posted to the cluster, as part of the launch flow a PVC will be created using the spec provided and the source data will be automatically imported into that PVC before the VM starts. When the VM is deleted, the storage provisioned by the DataVolume will automatically be deleted as well.

DataVolume VMI Behavior

For a VMI object, DataVolumes can be referenced as a volume source for the VMI. When this is done, it is expected that the referenced DataVolume exists in the cluster. The VMI will consume the DataVolume, but the DataVolume’s life-cycle will not be tied to the VMI.

Below is an example of a DataVolume being referenced by a VMI. It is expected that the DataVolume alpine-datavolume was created prior to posting the VMI manifest to the cluster. It is okay to post the VMI manifest to the cluster while the DataVolume is still having data imported. KubeVirt knows not to start the VMI until all referenced DataVolumes have finished their clone and import phases.

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-alpine-datavolume
  name: vmi-alpine-datavolume
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: disk1
    machine:
      type: ""
    resources:
      requests:
        memory: 64M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: disk1
    dataVolume:
      name: alpine-datavolume
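For reference, a pre-created DataVolume such as the alpine-datavolume referenced above could be defined roughly as follows. This is a sketch assuming CDI's cdi.kubevirt.io/v1beta1 API, with the source URL and size reused from the earlier example purely for illustration:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: alpine-datavolume
spec:
  # Import the disk image from an HTTP source into the PVC backing this DataVolume
  source:
    http:
      url: http://cdi-http-import-server.kubevirt/images/alpine.iso
  storage:
    resources:
      requests:
        storage: 2Gi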

Enabling DataVolume support

A DataVolume is a custom resource provided by the Containerized Data Importer (CDI) project. KubeVirt integrates with CDI in order to provide users a workflow for dynamically creating PVCs and importing data into those PVCs.

In order to take advantage of the DataVolume volume source on a VM or VMI, CDI must be installed.

Installing CDI

Go to the CDI release page

Pick the latest stable release and post the corresponding cdi-controller-deployment.yaml manifest to your cluster.

ephemeral

An ephemeral volume is a local COW (copy on write) image that uses a network volume as a read-only backing store. With an ephemeral volume, the network backing store is never mutated. Instead all writes are stored on the ephemeral image which exists on local storage. KubeVirt dynamically generates the ephemeral images associated with a VM when the VM starts, and discards the ephemeral images when the VM stops.

Ephemeral volumes are useful in any scenario where disk persistence is not desired. The COW image is discarded when the VM reaches a final state (e.g., succeeded, failed).

Currently, only PersistentVolumeClaim may be used as a backing store of the ephemeral volume.

Up-to-date information on supported backing stores can be found in the KubeVirt API.

metadata:
  name: testvmi-ephemeral-pvc
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        lun: {}
  volumes:
  - name: mypvcdisk
    ephemeral:
      persistentVolumeClaim:
        claimName: mypvc

containerDisk

containerDisk was originally named registryDisk; please update your code where needed.

The containerDisk feature provides the ability to store and distribute VM disks in the container image registry. containerDisks can be assigned to VMs in the disks section of the VirtualMachineInstance spec.

No network shared storage devices are utilized by containerDisks. The disks are pulled from the container registry and reside on the local node hosting the VMs that consume the disks.

When to use a containerDisk

containerDisks are ephemeral storage devices that can be assigned to any number of active VirtualMachineInstances. This makes them an ideal tool for users who want to replicate a large number of VM workloads that do not require persistent data. containerDisks are commonly used in conjunction with VirtualMachineInstanceReplicaSets.

When Not to use a containerDisk

containerDisks are not a good solution for any workload that requires persistent root disks across VM restarts.

containerDisk Workflow Example

Users can inject a VirtualMachineInstance disk into a container image in a way that is consumable by the KubeVirt runtime. Disks must be placed into the /disk directory inside the container. Raw and qcow2 formats are supported. Qcow2 is recommended in order to reduce the container image’s size. containerdisks can and should be based on scratch. No content except the image is required.

Note: Prior to kubevirt 0.20, the containerDisk image needed to have kubevirt/container-disk-v1alpha as base image.

Note: The containerDisk needs to be readable for the user with the UID 107 (qemu).

Example: Inject a local VirtualMachineInstance disk into a container image.

cat << END > Dockerfile
FROM scratch
ADD --chown=107:107 fedora25.qcow2 /disk/
END
docker build -t vmidisks/fedora25:latest .

Example: Inject a remote VirtualMachineInstance disk into a container image.

cat << END > Dockerfile
FROM scratch
ADD --chown=107:107 https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2 /disk/
END

Example: Upload the ContainerDisk container image to a registry.

docker push vmidisks/fedora25:latest

Example: Attach the ContainerDisk as an ephemeral disk to a VM.

metadata:
  name: testvmi-containerdisk
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: containerdisk
        disk: {}
  volumes:
  - name: containerdisk
    containerDisk:
      image: vmidisks/fedora25:latest

Note that a containerDisk is file-based and therefore cannot be attached as a lun device to the VM.

Custom disk image path

ContainerDisk also allows storing disk images in any folder, when required. The process is the same as above; the main difference is that with a custom location, KubeVirt does not scan for an image. It is your responsibility to provide the full path to the disk image. Providing the image path is optional; when no path is provided, KubeVirt searches for disk images in the default location: /disk.

Example: Build container disk image:

cat << END > Dockerfile
FROM scratch
ADD fedora25.qcow2 /custom-disk-path/fedora25.qcow2
END
docker build -t vmidisks/fedora25:latest .
docker push vmidisks/fedora25:latest

Create VMI with container disk pointing to the custom location:

metadata:
  name: testvmi-containerdisk
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: containerdisk
        disk: {}
  volumes:
  - name: containerdisk
    containerDisk:
      image: vmidisks/fedora25:latest
      path: /custom-disk-path/fedora25.qcow2

emptyDisk

An emptyDisk works similarly to an emptyDir in Kubernetes. An extra sparse qcow2 disk will be allocated and it will live as long as the VM. Thus it will survive guest-side VM reboots, but not a VM re-creation. The disk capacity needs to be specified.

Example: Boot cirros with an extra emptyDisk with a size of 2GiB:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-nocloud
spec:
  terminationGracePeriodSeconds: 5
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      - name: emptydisk
        disk:
          bus: virtio
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/cirros-registry-disk-demo:latest
  - name: emptydisk
    emptyDisk:
      capacity: 2Gi

When to use an emptyDisk

Ephemeral VMs very often come with read-only root images and limited tmpfs space. In many cases this is not enough to install application dependencies and provide enough disk space for the application data. While this data is not critical and thus can be lost, it is still needed for the application to function properly during its lifetime. This is where an emptyDisk can be useful. An emptyDisk is often used and mounted somewhere in /var/lib or /var/run.

hostDisk

A hostDisk volume type provides the ability to create or use a disk image located somewhere on a node. It works similarly to a hostPath in Kubernetes and provides two usage types:

  • DiskOrCreate: if a disk image does not exist at the given location, create one

  • Disk: a disk image must exist at the given location

Note: you need to enable the HostDisk feature gate.
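Enabling the gate follows the same pattern as the other feature gates in this document; a minimal sketch of the relevant KubeVirt CR fragment:

configuration:
  developerConfiguration:
    featureGates:
    - HostDisk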

Example: Create a 1Gi disk image located at /data/disk.img and attach it to a VM.

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-host-disk
  name: vmi-host-disk
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: host-disk
    machine:
      type: ""
    resources:
      requests:
        memory: 64M
  terminationGracePeriodSeconds: 0
  volumes:
  - hostDisk:
      capacity: 1Gi
      path: /data/disk.img
      type: DiskOrCreate
    name: host-disk
status: {}

Note: This does not always work as expected. Instead, you may want to consider creating a PersistentVolume.

configMap

A configMap is a reference to a ConfigMap in Kubernetes. A configMap can be presented to the VM as a disk or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk does not support dynamic change propagation and filesystem does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.

As a disk

By using disk, an extra iso disk will be allocated which has to be mounted in the VM. To mount the configMap, users can use cloudInit and the disk's serial number. The name must reference the created Kubernetes ConfigMap.

Note: Currently, ConfigMap updates are not propagated to the VMI. If a ConfigMap is updated, only the pod will be aware of the changes, not running VMIs.

Note: Due to a Kubernetes CRD issue, you cannot control the paths within the volume where ConfigMap keys are projected.
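For reference, a ConfigMap such as the app-config object consumed in the example below could look roughly like this; the key and value are illustrative assumptions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  # illustrative key/value pair; the keys are projected into the iso disk
  app.properties: |
    debug=false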

Example: Attach the configMap to a VM and use cloudInit to mount the iso disk:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      - disk:
        name: app-config-disk
        # set serial
        serial: CVLY623300HK240D
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
        bootcmd:
        # mount the ConfigMap
        - "sudo mkdir /mnt/app-config"
        - "sudo mount /dev/$(lsblk --nodeps -no name,serial | grep CVLY623300HK240D | cut -f1 -d' ') /mnt/app-config"
    name: cloudinitdisk
  - configMap:
      name: app-config
    name: app-config-disk
status: {}

As a filesystem

By using filesystem, configMaps are shared through virtiofs. In contrast with using disk for sharing configMaps, filesystem allows you to dynamically propagate changes on configMaps to VMIs (i.e. the VM does not need to be rebooted).

Note: Currently, VMIs can not be live migrated since virtiofs does not support live migration.

To share a given configMap, the following VM definition could be used:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      filesystems:
      - name: config-fs
        virtiofs: {}
      disks:
      - disk:
          bus: virtio
        name: containerdisk
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        chpasswd:
          expire: false
        password: fedora
        user: fedora
        bootcmd:
        # mount the ConfigMap
        - "sudo mkdir /mnt/app-config"
        - "sudo mount -t virtiofs config-fs /mnt/app-config"
    name: cloudinitdisk
  - configMap:
      name: app-config
    name: config-fs

secret

A secret is a reference to a Secret in Kubernetes. A secret can be presented to the VM as a disk or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk does not support dynamic change propagation and filesystem does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.

As a disk

By using disk, an extra iso disk will be allocated which has to be mounted in the VM. To mount the secret, users can use cloudInit and the disk's serial number. The secretName must reference the created Kubernetes Secret.

Note: Currently, Secret update propagation is not supported. If a Secret is updated, only a pod will be aware of changes, not running VMIs.

Note: Due to a Kubernetes CRD issue, you cannot control the paths within the volume where Secret keys are projected.
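For reference, a Secret such as the app-secret object consumed in the example below could look roughly like this; the key and value are illustrative assumptions:

apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:
  # illustrative key/value pair; the keys are projected into the iso disk
  credentials: |
    username=admin
    password=changeme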

Example: Attach the secret to a VM and use cloudInit to mount the iso disk:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      - disk:
        name: app-secret-disk
        # set serial
        serial: D23YZ9W6WA5DJ487
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
        bootcmd:
        # mount the Secret
        - "sudo mkdir /mnt/app-secret"
        - "sudo mount /dev/$(lsblk --nodeps -no name,serial | grep D23YZ9W6WA5DJ487 | cut -f1 -d' ') /mnt/app-secret"
    name: cloudinitdisk
  - secret:
      secretName: app-secret
    name: app-secret-disk
status: {}

As a filesystem

By using filesystem, secrets are shared through virtiofs. In contrast with using disk for sharing secrets, filesystem allows you to dynamically propagate changes on secrets to VMIs (i.e. the VM does not need to be rebooted).

Note: Currently, VMIs can not be live migrated since virtiofs does not support live migration.

To share a given secret, the following VM definition could be used:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      filesystems:
      - name: app-secret-fs
        virtiofs: {}
      disks:
      - disk:
          bus: virtio
        name: containerdisk
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        chpasswd:
          expire: false
        password: fedora
        user: fedora
        bootcmd:
        # mount the Secret
        - "sudo mkdir /mnt/app-secret"
        - "sudo mount -t virtiofs app-secret-fs /mnt/app-secret"
    name: cloudinitdisk
  - secret:
      secretName: app-secret
    name: app-secret-fs

serviceAccount

A serviceAccount volume references a Kubernetes ServiceAccount. A serviceAccount can be presented to the VM as a disk or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk does not support dynamic change propagation and filesystem does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.

As a disk

By using disk, a new iso disk will be allocated with the content of the service account (namespace, token and ca.crt), which needs to be mounted in the VM. For automatic mounting, see the configMap and secret examples above.

Note: Currently, ServiceAccount update propagation is not supported. If a ServiceAccount is updated, only a pod will be aware of changes, not running VMIs.

Example:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
        name: containerdisk
      - disk:
        name: serviceaccountdisk
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
  - name: serviceaccountdisk
    serviceAccount:
      serviceAccountName: default

As a filesystem

By using filesystem, serviceAccounts are shared through virtiofs. In contrast with using disk for sharing serviceAccounts, filesystem allows you to dynamically propagate changes on serviceAccounts to VMIs (i.e. the VM does not need to be rebooted).

Note: Currently, VMIs can not be live migrated since virtiofs does not support live migration.

To share a given serviceAccount, the following VM definition could be used:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      filesystems:
      - name: serviceaccount-fs
        virtiofs: {}
      disks:
      - disk:
          bus: virtio
        name: containerdisk
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        chpasswd:
          expire: false
        password: fedora
        user: fedora
        bootcmd:
        # mount the ConfigMap
        - "sudo mkdir /mnt/serviceaccount"
        - "sudo mount -t virtiofs serviceaccount-fs /mnt/serviceaccount"
    name: cloudinitdisk
  - name: serviceaccount-fs
    serviceAccount:
      serviceAccountName: default

downwardMetrics

downwardMetrics expose a limited set of VM and host metrics to the guest. The format is compatible with vhostmd.

Getting a limited set of host and VM metrics is in some cases required to allow third parties to diagnose performance issues on their appliances. One prominent example is SAP HANA.

downwardMetrics can be exposed to VMs in two ways: as a disk or through a virtio-serial port.

Note: The DownwardMetrics feature gate must be enabled to use the metrics. Available starting with KubeVirt v0.42.0.
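A minimal sketch of enabling the gate in the KubeVirt CR, following the same pattern as the other feature gates shown in this document:

spec:
  configuration:
    developerConfiguration:
      featureGates:
      - DownwardMetrics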

Disk

A volume is created, and it is exposed to the guest as a raw block volume. KubeVirt will update it periodically (by default, every 5 seconds).

Example:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: metrics
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest
  - name: metrics
    downwardMetrics: {}

Virtio-serial port

This method uses a virtio-serial port to expose the metrics data to the VM. KubeVirt creates a port named /dev/virtio-ports/org.github.vhostmd.1 inside the VM that speaks the Virtio Transport protocol; downwardMetrics can be retrieved from this port. See the vhostmd documentation under Virtio Transport for further information.

To expose the metrics using a virtio-serial port, a downwardMetrics device must be added (i.e., spec.domain.devices.downwardMetrics: {}).

Example:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      downwardMetrics: {}
      disks:
      - disk:
          bus: virtio
        name: containerdisk
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest

Accessing Metrics Data

To access the DownwardMetrics shared with a disk or a virtio-serial port, the vm-dump-metrics tool can be used:

$ sudo dnf install -y vm-dump-metrics
$ sudo vm-dump-metrics
<metrics>
  <metric type="string" context="host">
    <name>HostName</name>
    <value>node01</value>
[...]
  <metric type="int64" context="host" unit="s">
    <name>Time</name>
    <value>1619008605</value>
  </metric>
  <metric type="string" context="host">
    <name>VirtualizationVendor</name>
    <value>kubevirt.io</value>
  </metric>
</metrics>

vm-dump-metrics is useful as a standalone tool to verify the serial port is working and to inspect the metrics. However, applications that consume metrics will usually connect to the virtio-serial port themselves.

Note: The tool vm-dump-metrics provides the option --virtio in case the virtio-serial port is used. Please, refer to vm-dump-metrics --help for further information.

High Performance Features

IOThreads

Libvirt has the ability to use IOThreads for dedicated disk access (for supported devices). These are dedicated event loop threads that perform block I/O requests and improve scalability on SMP systems. KubeVirt exposes this libvirt feature through the ioThreadsPolicy setting. Additionally, each Disk device exposes a dedicatedIOThread setting. This is a boolean that indicates the specified disk should be allocated an exclusive IOThread that will never be shared with other disks.

Currently valid policies are shared and auto. If ioThreadsPolicy is omitted entirely, use of IOThreads will be disabled. However, if any disk requests a dedicated IOThread, ioThreadsPolicy will be enabled and default to shared.

Shared

An ioThreadsPolicy of shared indicates that KubeVirt should use one thread that will be shared by all disk devices. This policy stems from the fact that large numbers of IOThreads are generally not useful, as additional context switching is incurred for each thread.

Disks with dedicatedIOThread set to true will not use the shared thread, but will instead be allocated an exclusive thread. This is generally useful if a specific Disk is expected to have heavy I/O traffic, e.g. a database spindle.

Auto

auto IOThreads indicates that KubeVirt should use a pool of IOThreads and allocate disks to IOThreads in a round-robin fashion. The pool size is generally limited to twice the number of vCPUs allocated to the VM. This essentially attempts to dedicate disks to separate IOThreads, but only up to a reasonable limit. This would come into play for systems with a large number of disks and a smaller number of CPUs, for instance.

As a caveat to the size of the IOThread pool, disks with dedicatedIOThread will always be guaranteed their own thread. This effectively diminishes the upper limit of the number of threads allocated to the rest of the disks. For example, a VM with 2 CPUs would normally use 4 IOThreads for all disks. However if one disk had dedicatedIOThread set to true, then KubeVirt would only use 3 IOThreads for the shared pool.

There is always guaranteed to be at least one thread for disks that will use the shared IOThreads pool. Thus if a sufficiently large number of disks have dedicated IOThreads assigned, auto and shared policies would essentially result in the same layout.

IOThreads with Dedicated (pinned) CPUs

When the guest's vCPUs are pinned to the host's physical CPUs, it is also best to pin the IOThreads to specific CPUs to prevent them from floating between CPUs. KubeVirt will automatically calculate and pin each IOThread to a CPU or a set of CPUs, depending on the ratio between them. If there are more IOThreads than CPUs, each IOThread will be pinned to a CPU in a round-robin fashion. Otherwise, when there are fewer IOThreads than CPUs, each IOThread will be pinned to a set of CPUs.

IOThreads with QEMU Emulator thread and Dedicated (pinned) CPUs

To further improve vCPU latency, KubeVirt can allocate an additional dedicated physical CPU, exclusively for the emulator thread, to which it will be pinned. This effectively "isolates" the emulator thread from the vCPUs of the VMI. When ioThreadsPolicy is set to auto, IOThreads will also be "isolated" from the vCPUs and placed on the same physical CPU as the QEMU emulator thread.

Examples

Shared IOThreads

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-shared
  name: vmi-shared
spec:
  domain:
    ioThreadsPolicy: shared
    cpu:
      cores: 2
    devices:
      disks:
      - disk:
          bus: virtio
        name: vmi-shared_disk
      - disk:
          bus: virtio
        name: emptydisk
        dedicatedIOThread: true
      - disk:
          bus: virtio
        name: emptydisk2
        dedicatedIOThread: true
      - disk:
          bus: virtio
        name: emptydisk3
      - disk:
          bus: virtio
        name: emptydisk4
      - disk:
          bus: virtio
        name: emptydisk5
      - disk:
          bus: virtio
        name: emptydisk6
    machine:
      type: ""
    resources:
      requests:
        memory: 64M
  volumes:
  - name: vmi-shared_disk
    persistentVolumeClaim:
      claimName: vmi-shared_pvc
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk2
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk3
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk4
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk5
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk6

In this example, emptydisk and emptydisk2 both request a dedicated IOThread. vmi-shared_disk and emptydisk3 through emptydisk6 will all share one IOThread.

mypvc: 1
emptydisk: 2
emptydisk2: 3
emptydisk3: 1
emptydisk4: 1
emptydisk5: 1
emptydisk6: 1

Auto IOThreads

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-shared
  name: vmi-shared
spec:
  domain:
    ioThreadsPolicy: auto
    cpu:
      cores: 2
    devices:
      disks:
      - disk:
          bus: virtio
        name: mydisk
      - disk:
          bus: virtio
        name: emptydisk
        dedicatedIOThread: true
      - disk:
          bus: virtio
        name: emptydisk2
        dedicatedIOThread: true
      - disk:
          bus: virtio
        name: emptydisk3
      - disk:
          bus: virtio
        name: emptydisk4
      - disk:
          bus: virtio
        name: emptydisk5
      - disk:
          bus: virtio
        name: emptydisk6
    machine:
      type: ""
    resources:
      requests:
        memory: 64M
  volumes:
  - name: mydisk
    persistentVolumeClaim:
      claimName: mypvc
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk2
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk3
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk4
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk5
  - emptyDisk:
      capacity: 1Gi
    name: emptydisk6

This VM is identical to the first, except it requests auto IOThreads. emptydisk and emptydisk2 will still be allocated individual IOThreads, but the rest of the disks will be split across 2 separate IOThreads (twice the number of CPU cores is 4).

Disks will be assigned to IOThreads like this:

mypvc: 1
emptydisk: 3
emptydisk2: 4
emptydisk3: 2
emptydisk4: 1
emptydisk5: 2
emptydisk6: 1

Virtio Block Multi-Queue

Block Multi-Queue is a framework for the Linux block layer that maps Device I/O queries to multiple queues. This splits I/O processing across multiple threads, and therefore multiple CPUs. libvirt recommends that the number of queues used should match the number of CPUs allocated for optimal performance.

This feature is enabled by the BlockMultiQueue setting under Devices:

spec:
  domain:
    devices:
      blockMultiQueue: true
      disks:
      - disk:
          bus: virtio
        name: mydisk

Note: Due to the way KubeVirt implements CPU allocation, blockMultiQueue can only be used if a specific CPU allocation is requested. If a specific number of CPUs hasn't been allocated to a VirtualMachine, KubeVirt will use all CPUs on the node on a best-effort basis. In that case the amount of CPU allocated to a VM at the host level could change over time. If blockMultiQueue were to request a number of queues matching all the CPUs on a node, that could lead to over-allocation scenarios. To avoid this, KubeVirt enforces that a specific slice of CPU resources is requested in order to take advantage of this feature.

Example

metadata:
  name: testvmi-disk
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    resources:
      requests:
        memory: 64M
        cpu: 4
    devices:
      blockMultiQueue: true
      disks:
      - name: mypvcdisk
        disk:
          bus: virtio
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc

This example will enable Block Multi-Queue for the disk mypvcdisk and allocate 4 queues (to match the number of CPUs requested).

Disk device cache

KubeVirt supports none, writeback, and writethrough KVM/QEMU cache modes.

  • none I/O from the guest is not cached on the host. Use this option for guests with large I/O requirements. This option is generally the best choice.

  • writeback I/O from the guest is cached on the host and written through to the physical media when the guest OS issues a flush.

  • writethrough I/O from the guest is cached on the host but must be written through to the physical medium before the write operation completes.

Important: none cache mode is set as default if the file system supports direct I/O, otherwise, writethrough is used.

Note: It is possible to force a specific cache mode, although if the none mode has been chosen and the file system does not support direct I/O, the VMI will fail to start with an error.

Example: force writethrough cache mode

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-pvc
  name: vmi-pvc
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: pvcdisk
        cache: writethrough
    machine:
      type: ""
    resources:
      requests:
        memory: 64M
  terminationGracePeriodSeconds: 0
  volumes:
  - name: pvcdisk
    persistentVolumeClaim:
      claimName: disk-alpine
status: {}

Disk sharing

Shareable disks allow multiple VMs to share the same underlying storage. In order to use this feature, special care is required because this could lead to data corruption and the loss of important data. Shareable disks demand either data synchronization at the application level or the use of clustered filesystems. These advanced configurations are not within the scope of this documentation and are use-case specific.

If the shareable option is set, it indicates to libvirt/QEMU that the disk is going to be accessed by multiple VMs and not to create a lock for the writes.

In this example, we use Rook Ceph to dynamically provision the PVC.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block

$ kubectl get pvc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
block-pvc   Bound    pvc-0a161bb2-57c7-4d97-be96-0a20ff0222e2   1Gi        RWO            rook-ceph-block   51s

Then, we can declare 2 VMs and set the shareable option to true for the shared disk.

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-block-1
  name: vm-block-1
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-block-1
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          - disk:
              bus: virtio
            shareable: true
            name: block-disk
        machine:
          type: ""
        resources:
          requests:
            memory: 2G
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
        name: cloudinitdisk
      - name: block-disk
        persistentVolumeClaim:
          claimName: block-pvc
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-block-2
  name: vm-block-2
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-block-2
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: kubevirt.io/vm
                operator: In
                values:
                - vm-block-1
            topologyKey: "kubernetes.io/hostname"
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          - disk:
              bus: virtio
            shareable: true
            name: block-disk
        machine:
          type: ""
        resources:
          requests:
            memory: 2G
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel
        name: containerdisk
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
        name: cloudinitdisk
      - name: block-disk
        persistentVolumeClaim:
          claimName: block-pvc

We can now attempt to write a string from the first guest and then read the string from the second guest to test that the sharing is working.

$ virtctl console vm-block-1
$ printf "Test awesome shareable disks" | sudo dd of=/dev/vdc bs=1 count=150 conv=notrunc
28+0 records in
28+0 records out
28 bytes copied, 0.0264182 s, 1.1 kB/s
# Log into the second guest
$ virtctl console vm-block-2
$ sudo dd if=/dev/vdc bs=1 count=150 conv=notrunc
Test awesome shareable disks150+0 records in
150+0 records out
150 bytes copied, 0.136753 s, 1.1 kB/s

If you are using local devices or RWO PVCs, setting the affinity on the VMs that share the storage guarantees they will be scheduled on the same node. In the example, we set the affinity on the second VM using the label used on the first VM. If you are using shared storage with RWX PVCs, then the affinity rule is not necessary as the storage can be attached simultaneously on multiple nodes.

Sharing Directories with VMs

Virtiofs makes external filesystems visible to KubeVirt VMs. Virtiofs is a shared file system that lets VMs access a directory tree on the host. Further details can be found at the Official Virtiofs Site.

Non-Privileged and Privileged Sharing Modes

KubeVirt supports two PVC sharing modes: non-privileged and privileged.

The non-privileged mode is enabled by default. This mode has the advantage of not requiring any administrative privileges for creating the VM. However, it has some limitations:

  • The virtiofsd daemon (the daemon in charge of sharing the PVC with the VM) will run with the QEMU UID/GID (107), and cannot switch between different UIDs/GIDs. Therefore, it will only have access to directories and files that UID/GID 107 has permission to. Additionally, when creating new files they will always be created with QEMU’s UID/GID regardless of the UID/GID of the process within the guest.
  • Extended attributes are not supported.

To switch to the privileged mode, the feature gate ExperimentalVirtiofsSupport has to be enabled. Take into account that this mode requires privileges to run rootful containers.
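A minimal sketch of enabling that feature gate in the KubeVirt CR, following the same pattern as the other feature gates shown in this document:

spec:
  configuration:
    developerConfiguration:
      featureGates:
      - ExperimentalVirtiofsSupport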

Sharing Persistent Volume Claims

Cluster Configuration

We need to create a VM definition that includes a spec.domain.devices.filesystems entry with virtiofs and a PVC. Example:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-fs
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      filesystems:
      - name: virtiofs-disk
        virtiofs: {}
    resources:
      requests:
        memory: 1024Mi
  volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
  - name: virtiofs-disk
    persistentVolumeClaim:
      claimName: mypvc

Configuration Inside the VM

The following configuration can be done using a startup script; see the cloudInitNoCloud section for more details. Alternatively, it can be done manually by logging in to the VM and mounting the share. Here are examples of how to mount it in Linux and Windows VMs:

  • Linux Example
    $ sudo mkdir -p /mnt/disks/virtio
    $ sudo mount -t virtiofs virtiofs-disk /mnt/disks/virtio
  • Windows Example

See this guide for details on startup steps needed for Windows VMs.

Sharing Node Directories

Sharing node directories via hostPath is also possible. The following configuration example is shown for illustrative purposes; however, the PVC method is preferred, since using hostPath is generally discouraged for security reasons.

Configuration Inside the Node

To share the directory with the VMs, we need to log in to the node, create the shared directory (if it does not already exist), and set the proper SELinux context label container_file_t on the shared directory. In this example we are going to share a new directory /tmp/data (if the desired directory already exists, you can skip the mkdir command):

$ mkdir /tmp/data
$ sudo chcon -t container_file_t /tmp/data

Note: If you are attempting to share an existing directory, you must first check the SELinux context label with the command ls -Z <directory>. In the case that the label is not present or is not container_file_t you need to label it with the chcon command.

Cluster Configuration

We need a StorageClass that uses the provisioner kubernetes.io/no-provisioner:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: no-provisioner-storage-class
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

To make the shared directory available for VMs, we need to create a PV and a PVC that could be consumed by the VMs:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: hostpath
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  hostPath:
    path: "/tmp/data"
  storageClassName: "no-provisioner-storage-class"
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node01
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hostpath-claim
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: "no-provisioner-storage-class"
  resources:
    requests:
      storage: 10Gi

Note: Change the node01 value to the name of the node where the shared directory is located.

The VM definitions have to request the PVC hostpath-claim and attach it as a virtiofs filesystem:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: hostpath-vm
  name: hostpath
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/domain: hostpath
        kubevirt.io/vm: hostpath
    spec:
      domain:
        cpu:
          cores: 1
          sockets: 1
          threads: 1
        devices:
          filesystems:
          - name: vm-hostpath
            virtiofs: {}
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
          rng: {}
        resources:
          requests:
            memory: 1Gi
      networks:
      - name: default
        pod: {}
      terminationGracePeriodSeconds: 180
      volumes:
      - containerDisk:
          image: quay.io/containerdisks/fedora:latest
        name: containerdisk
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            chpasswd:
              expire: false
            password: password
            user: fedora
        name: cloudinitdisk
      - name: vm-hostpath
        persistentVolumeClaim:
          claimName: hostpath-claim

Configuration Inside the VM

We need to log in to the VM and mount the shared directory:

$ sudo mount -t virtiofs vm-hostpath /mnt

Update volumes strategy

The updateVolumesStrategy field is used to specify the strategy for updating the volumes of a running VM. The following strategies are supported:

  • Replacement: the updated volumes replace the existing ones upon VM restart.
  • Migration: the update of the volumes triggers a storage migration of the old volumes to the new ones. More details can be found in the volume migration documentation.

The update volume migration depends on the VolumesUpdateStrategy feature gate, which in turn depends on the VMLiveUpdateFeatures feature gate and configuration.

KubeVirt CR:

apiVersion: kubevirt.io/v1
kind: KubeVirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
      - VMLiveUpdateFeatures
      - VolumesUpdateStrategy
  workloadUpdateStrategy:
    workloadUpdateMethods:
    - LiveMigrate
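On the VM side, the strategy is set in the VM spec. A hedged sketch, assuming the updateVolumesStrategy field described above sits at the spec level of the VirtualMachine; the VM and volume names are illustrative:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-with-migrating-volume     # illustrative name
spec:
  # Migrate the data of the old volume to the newly specified one
  updateVolumesStrategy: Migration
  template:
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumedisk
      volumes:
      - dataVolume:
          name: new-dv               # illustrative replacement volume
        name: datavolumedisk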