Importing virtual machine images with data volumes

You can import an existing virtual machine image into your OKD cluster storage. Using the Containerized Data Importer (CDI), you can import the image into a persistent volume claim (PVC) by using a data volume. OKD Virtualization uses one or more data volumes to automate the data import and the creation of an underlying PVC. You can attach a data volume to a virtual machine for persistent storage.

The virtual machine image can be hosted at an HTTP or HTTPS endpoint, or built into a container disk and stored in a container registry.

When you import a disk image into a PVC, the disk image is expanded to use the full storage capacity that is requested in the PVC. To use this space, you might need to expand the disk partitions and file systems in the virtual machine.

The resizing procedure varies based on the operating system installed on the virtual machine. See the operating system documentation for details.

Prerequisites

- If the endpoint requires a TLS certificate, the certificate must be included in a config map in the same namespace as the data volume and referenced in the data volume configuration.
- To import a container disk:
  - You might need to prepare a container disk from a virtual machine image and store it in your container registry before importing it.
  - If the container registry does not have TLS, you must add the registry to the insecureRegistries field of the HyperConverged custom resource before you can import a container disk from it.
- You might need to define a storage class or prepare CDI scratch space for this operation to complete successfully.

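As a minimal sketch of the insecure registry prerequisite, the following snippet builds a JSON merge patch for the `spec.storageImport.insecureRegistries` field of the HyperConverged custom resource. The registry host `registry.example.com:5000` is a placeholder, and the resource name and namespace shown in the commented `oc patch` command (`kubevirt-hyperconverged`) are assumptions about a default installation, so verify them in your cluster. A merge patch replaces the whole list, so include any registries that are already configured.

```shell
# Placeholder registry host and port; replace with your own registry.
registry="registry.example.com:5000"

# Build a JSON merge patch that sets spec.storageImport.insecureRegistries.
# A merge patch replaces the entire list, so list every registry you need.
patch=$(printf '{"spec":{"storageImport":{"insecureRegistries":["%s"]}}}' "$registry")
echo "$patch"

# To apply it on a cluster (resource name and namespace are assumptions):
#   oc patch hyperconverged kubevirt-hyperconverged \
#     -n kubevirt-hyperconverged --type merge -p "$patch"
```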
If you intend to import a virtual machine image into block storage with a data volume, you must have an available local block persistent volume.

CDI supported operations matrix

This matrix shows the supported CDI operations for content types against endpoints, and which of these operations require scratch space.

Content types	HTTP	HTTPS	HTTP basic auth	Registry	Upload
KubeVirt (QCOW2)	✓ QCOW2 ✓ GZ ✓ XZ	✓ QCOW2* ✓ GZ ✓ XZ	✓ QCOW2 ✓ GZ ✓ XZ	✓ QCOW2 □ GZ □ XZ	✓ QCOW2 ✓ GZ ✓ XZ
KubeVirt (RAW)	✓ RAW ✓ GZ ✓ XZ	✓ RAW ✓ GZ ✓ XZ	✓ RAW ✓ GZ ✓ XZ	✓ RAW □ GZ □ XZ	✓ RAW ✓ GZ ✓ XZ*

✓ Supported operation

□ Unsupported operation

* Requires scratch space

** Requires scratch space if a custom certificate authority is required

CDI now uses the OKD cluster-wide proxy configuration.

About data volumes

DataVolume objects are custom resources that are provided by the Containerized Data Importer (CDI) project. Data volumes orchestrate import, clone, and upload operations that are associated with an underlying persistent volume claim (PVC). You can create a data volume as either a standalone resource or by using the dataVolumeTemplate field in the virtual machine (VM) specification.

VM disk PVCs that are prepared by using standalone data volumes maintain an independent lifecycle from the VM. If you use the dataVolumeTemplate field in the VM specification to prepare the PVC, the PVC shares the same lifecycle as the VM.

After a PVC is populated, the data volume that you used to create the PVC is no longer needed. OKD Virtualization enables automatic garbage collection of completed data volumes by default. Standalone data volumes, and data volumes created by using the dataVolumeTemplate resource, are automatically garbage collected after completion.

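For illustration, a standalone data volume, whose PVC keeps a lifecycle independent of any VM, might look like the manifest written by the following snippet. This is a sketch, not a definitive configuration: the name `example-dv`, the file name `example-dv.yaml`, the URL, and the requested size are placeholders.

```shell
# Write a standalone DataVolume manifest to a file.
# The name, URL, and size are placeholders; replace them with your own values.
cat > example-dv.yaml <<'EOF'
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: example-dv
spec:
  source:
    http:
      url: "https://example.com/images/disk.qcow2"
  storage:
    resources:
      requests:
        storage: 10Gi
EOF

# To create the data volume on a cluster:
#   oc create -f example-dv.yaml
```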
Local block persistent volumes

If you intend to import a virtual machine image into block storage with a data volume, you must have an available local block persistent volume.

About block persistent volumes

A block persistent volume (PV) is a PV that is backed by a raw block device. These volumes do not have a file system and can provide performance benefits for virtual machines by reducing overhead.

Raw block volumes are provisioned by specifying volumeMode: Block in the PV and persistent volume claim (PVC) specification.

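As a sketch of how `volumeMode: Block` appears on the claim side, the following snippet writes a PVC manifest that requests a raw block volume. The claim name `example-block-pvc` and the file name are hypothetical, and the `local` storage class and `2Gi` size are assumptions chosen to match the PV example later in this document.

```shell
# Write a PVC manifest that requests a raw block volume.
# The claim name is a placeholder; storage class and size are assumptions.
cat > block-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-block-pvc
spec:
  volumeMode: Block
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
EOF

# To create the claim on a cluster:
#   oc create -f block-pvc.yaml
```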
Creating a local block persistent volume

Create a local block persistent volume (PV) on a node by populating a file and mounting it as a loop device. You can then reference this loop device in a PV manifest as a Block volume and use it as a block device for a virtual machine image.

Procedure

Log in as root to the node on which to create the local PV. This procedure uses node01 for its examples.
Create a file and populate it with null characters so that it can be used as a block device. The following example creates a file loop10 with a size of approximately 2 GB (20 blocks of 100 MiB each):
```
$ dd if=/dev/zero of=<loop10> bs=100M count=20
```
Mount the loop10 file as a loop device.
```
$ losetup </dev/loop10> <loop10>  (1) (2)
```
1 File path where the loop device is mounted.
2 The file created in the previous step to be mounted as the loop device.

Create a PersistentVolume manifest that references the mounted loop device.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: <local-block-pv10>
  annotations:
spec:
  local:
    path: </dev/loop10> (1)
  capacity:
    storage: <2Gi>
  volumeMode: Block (2)
  storageClassName: local (3)
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <node01> (4)

1	The path of the loop device on the node.
2	Specifies it is a block PV.
3	Optional: Set a storage class for the PV. If you omit it, the cluster default is used.
4	The node on which the block device was mounted.

Create the block PV.
```
# oc create -f <local-block-pv10.yaml> (1)
```
1 The file name of the persistent volume created in the previous step.

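The size arithmetic in the `dd` step of this procedure can be checked with a short calculation. With GNU `dd`, `bs=100M` means 100 MiB, so 20 blocks produce 2000 MiB, which is slightly less than the `2Gi` capacity declared in the example PV manifest. The sketch below is an illustration, not part of the documented procedure: it computes both sizes and a block count that would yield a file of exactly 2 GiB with `bs=1M`.

```shell
# Size of the backing file created with bs=100M count=20 (100 MiB blocks).
file_bytes=$(( 20 * 100 * 1024 * 1024 ))

# Size that a PV capacity of 2Gi represents.
size_gib=2
capacity_bytes=$(( size_gib * 1024 * 1024 * 1024 ))

# Number of 1 MiB blocks needed for a file of exactly size_gib GiB.
count=$(( size_gib * 1024 ))

echo "backing file: ${file_bytes} bytes"
echo "2Gi capacity: ${capacity_bytes} bytes"
echo "exact-size command: dd if=/dev/zero of=loop10 bs=1M count=${count}"
```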
Importing a virtual machine image into storage by using a data volume

You can import a virtual machine image into storage by using a data volume.

The virtual machine image can be hosted at an HTTP or HTTPS endpoint or the image can be built into a container disk and stored in a container registry.

You specify the data source for the image in a VirtualMachine configuration file. When the virtual machine is created, the data volume with the virtual machine image is imported into storage.

Prerequisites

- To import a virtual machine image, you must have the following:
  - A virtual machine disk image in RAW, ISO, or QCOW2 format, optionally compressed by using xz or gz.
  - An HTTP or HTTPS endpoint where the image is hosted, along with any authentication credentials needed to access the data source.
- To import a container disk, you must have a virtual machine image built into a container disk and stored in a container registry, along with any authentication credentials needed to access the data source.
- If the virtual machine must communicate with servers that use self-signed certificates or certificates not signed by the system CA bundle, you must create a config map in the same namespace as the data volume.

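One possible way to create the certificate config map from the prerequisites is sketched below. The config map name `tls-certs`, the certificate file `ca.pem`, and the namespace `my-namespace` are hypothetical. The `OC` variable is only a convenience introduced for this example so that the command can be previewed with `OC=echo` before it is run against a cluster.

```shell
# Command used to talk to the cluster; set OC=echo to preview the command.
OC="${OC:-oc}"

# Create a config map that holds a CA certificate.
#   $1: config map name, $2: path to the PEM file, $3: namespace of the data volume
create_ca_configmap() {
  "$OC" create configmap "$1" --from-file="$2" --namespace "$3"
}

# Example invocation (hypothetical names), run against a cluster:
#   create_ca_configmap tls-certs ca.pem my-namespace
# Then reference the config map in the data volume source:
#   certConfigMap: tls-certs
```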
Procedure

If your data source requires authentication, create a Secret manifest, specifying the data source credentials, and save it as endpoint-secret.yaml:
```
apiVersion: v1
kind: Secret
metadata:
  name: endpoint-secret (1)
  labels:
    app: containerized-data-importer
type: Opaque
data:
  accessKeyId: "" (2)
  secretKey:   "" (3)
```
1 Specify the name of the Secret.
2 Specify the Base64-encoded key ID or user name.
3 Specify the Base64-encoded secret key or password.
Apply the Secret manifest:
```
$ oc apply -f endpoint-secret.yaml
```

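The values in the `data` field of the Secret must be Base64-encoded. A minimal sketch of producing them follows; the helper name `encode_b64` and the user name `admin` are hypothetical. It uses `printf` instead of `echo` so that no trailing newline ends up inside the encoded credential.

```shell
# Encode a credential for the "data" field of a Secret.
# printf avoids the trailing newline that echo would add to the value;
# tr removes the line wrapping that some base64 implementations apply to long input.
encode_b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder user name; paste the result into accessKeyId.
encode_b64 "admin"
echo
```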
Edit the VirtualMachine manifest, specifying the data source for the virtual machine image you want to import, and save it as vm-fedora-datavolume.yaml:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  creationTimestamp: null
  labels:
    kubevirt.io/vm: vm-fedora-datavolume
  name: vm-fedora-datavolume (1)
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: fedora-dv (2)
    spec:
      storage:
        volumeMode: Block (3)
        resources:
          requests:
            storage: 10Gi
        storageClassName: local
      source: (4)
        http:
          url: "https://mirror.arizona.edu/fedora/linux/releases/35/Cloud/x86_64/images/Fedora-Cloud-Base-35-1.2.x86_64.qcow2" (4)
          secretRef: endpoint-secret (5)
          certConfigMap: "" (6)
        # To use a registry source, uncomment the following lines and delete the preceding HTTP source block
        # registry:
          # url: "docker://kubevirt/fedora-cloud-container-disk-demo:latest"
          # secretRef: registry-secret (5)
          # certConfigMap: "" (6)
    status: {}
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-fedora-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumedisk1
        machine:
          type: ""
        resources:
          requests:
            memory: 1.5Gi
      terminationGracePeriodSeconds: 180
      volumes:
      - dataVolume:
          name: fedora-dv
        name: datavolumedisk1
status: {}

1	Specify the name of the virtual machine.
2	Specify the name of the data volume.
3	The volume and access mode are detected automatically for known storage provisioners. Alternatively, you can specify `Block`.
4	Specify either the URL or the registry endpoint of the virtual machine image that you want to import. To use a registry source, delete or comment out the HTTP or HTTPS source block and uncomment the registry block. Replace the example values shown here with your own values.
5	Specify the `Secret` name if you created a `Secret` for the data source.
6	Optional: Specify a CA certificate config map.

Create the virtual machine:

$ oc create -f vm-fedora-datavolume.yaml

The oc create command creates the data volume and the virtual machine. The CDI controller creates an underlying PVC with the correct annotation and the import process begins. When the import is complete, the data volume status changes to Succeeded. You can start the virtual machine.

Data volume provisioning happens in the background, so there is no need to monitor the process.

Verification

The importer pod downloads the virtual machine image or container disk from the specified URL and stores it on the provisioned PV. View the status of the importer pod by running the following command:
```
$ oc get pods
```
Monitor the data volume until its status is Succeeded by running the following command:
```
$ oc describe dv fedora-dv (1)
```
1 Specify the data volume name that you defined in the VirtualMachine manifest.
Verify that provisioning is complete and that the virtual machine has started by accessing its serial console:
```
$ virtctl console vm-fedora-datavolume
```

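The monitoring step above can also be scripted. The following is a sketch, not a supported tool: the function name `wait_for_dv`, the retry count, and the delay are assumptions, and the `OC` variable exists only so that the command can be substituted for testing. It polls the `status.phase` field of the data volume and stops when the phase is `Succeeded` or `Failed`. If completed data volumes are garbage collected in your cluster, the data volume might already be deleted, in which case the phase is reported as `unknown`.

```shell
# Command used to talk to the cluster; can be overridden for testing.
OC="${OC:-oc}"

# Poll a data volume until it reaches a final phase.
#   $1: data volume name, $2: maximum attempts (default 60), $3: delay in seconds (default 10)
# Returns 0 on Succeeded, 1 on Failed or when the attempts run out.
wait_for_dv() {
  dv_name="$1"
  dv_tries="${2:-60}"
  dv_delay="${3:-10}"
  dv_attempt=0
  while [ "$dv_attempt" -lt "$dv_tries" ]; do
    dv_phase=$("$OC" get dv "$dv_name" -o jsonpath='{.status.phase}' 2>/dev/null)
    echo "phase: ${dv_phase:-unknown}"
    case "$dv_phase" in
      Succeeded) return 0 ;;
      Failed) return 1 ;;
    esac
    sleep "$dv_delay"
    dv_attempt=$(( dv_attempt + 1 ))
  done
  return 1
}

# Example invocation against a cluster:
#   wait_for_dv fedora-dv && virtctl console vm-fedora-datavolume
```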
Additional resources

Configure preallocation mode to improve write performance for data volume operations.
