Configuring mediated devices

Configuring mediated devices

OKD Virtualization automatically creates mediated devices, such as virtual GPUs (vGPUs), if you provide a list of devices in the HyperConverged custom resource (CR).

About using the NVIDIA GPU Operator

The NVIDIA GPU Operator manages NVIDIA GPU resources in an OKD cluster and automates tasks related to bootstrapping GPU nodes. Since the GPU is a special resource in the cluster, you must install some components before deploying application workloads onto the GPU. These components include the NVIDIA drivers which enables compute unified device architecture (CUDA), Kubernetes device plugin, container runtime and others such as automatic node labelling, monitoring and more.

The NVIDIA GPU Operator is supported only by NVIDIA. For more information about obtaining support from NVIDIA, see Obtaining Support from NVIDIA.

There are two ways to enable GPUs with OKD OKD Virtualization: the OKD-native way described here and by using the NVIDIA GPU Operator.

The NVIDIA GPU Operator is a Kubernetes Operator that enables OKD OKD Virtualization to expose GPUs to virtualized workloads running on OKD. It allows users to easily provision and manage GPU-enabled virtual machines, providing them with the ability to run complex artificial intelligence/machine learning (AI/ML) workloads on the same platform as their other workloads. It also provides an easy way to scale the GPU capacity of their infrastructure, allowing for rapid growth of GPU-based workloads.

For more information about using the NVIDIA GPU Operator to provision worker nodes for running GPU-accelerated VMs, see NVIDIA GPU Operator with OpenShift Virtualization.

About using virtual GPUs with OKD Virtualization

Some graphics processing unit (GPU) cards support the creation of virtual GPUs (vGPUs). OKD Virtualization can automatically create vGPUs and other mediated devices if an administrator provides configuration details in the HyperConverged custom resource (CR). This automation is especially useful for large clusters.

Refer to your hardware vendor’s documentation for functionality and support details.

Mediated device

A physical device that is divided into one or more virtual devices. A vGPU is a type of mediated device (mdev); the performance of the physical GPU is divided among the virtual devices. You can assign mediated devices to one or more virtual machines (VMs), but the number of guests must be compatible with your GPU. Some GPUs do not support multiple guests.

Prerequisites

If your hardware vendor provides drivers, you installed them on the nodes where you want to create mediated devices.
- If you use NVIDIA cards, you installed the NVIDIA GRID driver.

Configuration overview

When configuring mediated devices, an administrator must complete the following tasks:

Create the mediated devices.
Expose the mediated devices to the cluster.

The HyperConverged CR includes APIs that accomplish both tasks.

Creating mediated devices

# ...
spec:
  mediatedDevicesConfiguration:
    mediatedDevicesTypes: (1)
    - <device_type>
    nodeMediatedDeviceTypes: (2)
    - mediatedDevicesTypes: (3)
      - <device_type>
      nodeSelector: (4)
        <node_selector_key>: <node_selector_value>
# ...

1	Required: Configures global settings for the cluster.
2	Optional: Overrides the global configuration for a specific node or group of nodes. Must be used with the global `mediatedDevicesTypes` configuration.
3	Required if you use `nodeMediatedDeviceTypes`. Overrides the global `mediatedDevicesTypes` configuration for the specified nodes.
4	Required if you use `nodeMediatedDeviceTypes`. Must include a `key:value` pair.

Exposing mediated devices to the cluster

# ...
  permittedHostDevices:
    mediatedDevices:
    - mdevNameSelector: GRID T4-2Q (1)
      resourceName: nvidia.com/GRID_T4-2Q (2)
# ...

Exposes the mediated devices that map to this value on the host.

You can see the mediated device types that your device supports by viewing the contents of /sys/bus/pci/devices/<slot>:<bus>:<domain>.<function>/mdev_supported_types/<type>/name, substituting the correct values for your system.

For example, the name file for the nvidia-231 type contains the selector string GRID T4-2Q. Using GRID T4-2Q as the mdevNameSelector value allows nodes to use the nvidia-231 type.

The resourceName should match that allocated on the node. Find the resourceName by using the following command:

$ oc get $NODE -o json  \
| jq ‘.status.allocatable | \
with_entries(select(.key | startswith(“nvidia.com/“))) | \
with_entries(select(.value != “0”))’

How vGPUs are assigned to nodes

For each physical device, OKD Virtualization configures the following values:

A single mdev type.
The maximum number of instances of the selected mdev type.

The cluster architecture affects how devices are created and assigned to nodes.

Large cluster with multiple cards per node

On nodes with multiple cards that can support similar vGPU types, the relevant device types are created in a round-robin manner. For example:

# ...
mediatedDevicesConfiguration:
  mediatedDevicesTypes:
  - nvidia-222
  - nvidia-228
  - nvidia-105
  - nvidia-108
# ...

In this scenario, each node has two cards, both of which support the following vGPU types:

nvidia-105
# ...
nvidia-108
nvidia-217
nvidia-299
# ...

On each node, OKD Virtualization creates the following vGPUs:

16 vGPUs of type nvidia-105 on the first card.
2 vGPUs of type nvidia-108 on the second card.

One node has a single card that supports more than one requested vGPU type

OKD Virtualization uses the supported type that comes first on the mediatedDevicesTypes list.

For example, the card on a node card supports nvidia-223 and nvidia-224. The following mediatedDevicesTypes list is configured:

# ...
mediatedDevicesConfiguration:
  mediatedDevicesTypes:
  - nvidia-22
  - nvidia-223
  - nvidia-224
# ...

In this example, OKD Virtualization uses the nvidia-223 type.

About changing and removing mediated devices

The cluster’s mediated device configuration can be updated with OKD Virtualization by:

Editing the HyperConverged CR and change the contents of the mediatedDevicesTypes stanza.
Changing the node labels that match the nodeMediatedDeviceTypes node selector.

Removing the device information from the spec.mediatedDevicesConfiguration and spec.permittedHostDevices stanzas of the HyperConverged CR.

If you remove the device information from the spec.permittedHostDevices stanza without also removing it from the spec.mediatedDevicesConfiguration stanza, you cannot create a new mediated device type on the same node. To properly remove mediated devices, remove the device information from both stanzas.

Depending on the specific changes, these actions cause OKD Virtualization to reconfigure mediated devices or remove them from the cluster nodes.

Preparing hosts for mediated devices

You must enable the Input-Output Memory Management Unit (IOMMU) driver before you can configure mediated devices.

Adding kernel arguments to enable the IOMMU driver

To enable the IOMMU (Input-Output Memory Management Unit) driver in the kernel, create the MachineConfig object and add the kernel arguments.

Prerequisites

Administrative privilege to a working OKD cluster.
Intel or AMD CPU hardware.
Intel Virtualization Technology for Directed I/O extensions or AMD IOMMU in the BIOS (Basic Input/Output System) is enabled.

Procedure

Create a MachineConfig object that identifies the kernel argument. The following example shows a kernel argument for an Intel CPU.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker (1)
  name: 100-worker-iommu (2)
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
      - intel_iommu=on (3)
# ...

1	Applies the new kernel argument only to worker nodes.
2	The `name` indicates the ranking of this kernel argument (100) among the machine configs and its purpose. If you have an AMD CPU, specify the kernel argument as `amd_iommu=on`.
3	Identifies the kernel argument as `intel_iommu` for an Intel CPU.

Create the new MachineConfig object:

$ oc create -f 100-worker-kernel-arg-iommu.yaml

Verification

Verify that the new MachineConfig object was added.
```
$ oc get MachineConfig
```

Adding and removing mediated devices

You can add or remove mediated devices.

Creating and exposing mediated devices

You can expose and create mediated devices such as virtual GPUs (vGPUs) by editing the HyperConverged custom resource (CR).

Prerequisites

You enabled the IOMMU (Input-Output Memory Management Unit) driver.

Procedure

Edit the HyperConverged CR in your default editor by running the following command:
```
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
```

Add the mediated device information to the HyperConverged CR spec, ensuring that you include the mediatedDevicesConfiguration and permittedHostDevices stanzas. For example:

Example configuration file

apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  mediatedDevicesConfiguration: (1)
    mediatedDevicesTypes: (2)
    - nvidia-231
    nodeMediatedDeviceTypes: (3)
    - mediatedDevicesTypes: (4)
      - nvidia-233
      nodeSelector:
        kubernetes.io/hostname: node-11.redhat.com
  permittedHostDevices: (5)
    mediatedDevices:
    - mdevNameSelector: GRID T4-2Q
      resourceName: nvidia.com/GRID_T4-2Q
    - mdevNameSelector: GRID T4-8Q
      resourceName: nvidia.com/GRID_T4-8Q
# ...

1	Creates mediated devices.
2	Required: Global `mediatedDevicesTypes` configuration.
3	Optional: Overrides the global configuration for specific nodes.
4	Required if you use `nodeMediatedDeviceTypes`.
5	Exposes mediated devices to the cluster.

Save your changes and exit the editor.

Verification

You can verify that a device was added to a specific node by running the following command:
```
$ oc describe node <node_name>
```

Removing mediated devices from the cluster using the CLI

To remove a mediated device from the cluster, delete the information for that device from the HyperConverged custom resource (CR).

Procedure

Edit the HyperConverged CR in your default editor by running the following command:
```
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
```
Remove the device information from the spec.mediatedDevicesConfiguration and spec.permittedHostDevices stanzas of the HyperConverged CR. Removing both entries ensures that you can later create a new mediated device type on the same node. For example:

Example configuration file
```
apiVersion: hco.kubevirt.io/v1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  mediatedDevicesConfiguration:
    mediatedDevicesTypes: (1)
      - nvidia-231
  permittedHostDevices:
    mediatedDevices: (2)
    - mdevNameSelector: GRID T4-2Q
      resourceName: nvidia.com/GRID_T4-2Q
```
1 To remove the nvidia-231 device type, delete it from the mediatedDevicesTypes array.
2 To remove the GRID T4-2Q device, delete the mdevNameSelector field and its corresponding resourceName field.
Save your changes and exit the editor.

Using mediated devices

A vGPU is a type of mediated device; the performance of the physical GPU is divided among the virtual devices. You can assign mediated devices to one or more virtual machines.

Assigning a mediated device to a virtual machine

Assign mediated devices such as virtual GPUs (vGPUs) to virtual machines.

Prerequisites

The mediated device is configured in the HyperConverged custom resource.

Procedure

Assign the mediated device to a virtual machine (VM) by editing the spec.domain.devices.gpus stanza of the VirtualMachine manifest:

Example virtual machine manifest

apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  domain:
    devices:
      gpus:
      - deviceName: nvidia.com/TU104GL_Tesla_T4 (1)
        name: gpu1 (2)
      - deviceName: nvidia.com/GRID_T4-1Q
        name: gpu2

1	The resource name associated with the mediated device.
2	A name to identify the device on the VM.

Verification

To verify that the device is available from the virtual machine, run the following command, substituting <device_name> with the deviceName value from the VirtualMachine manifest:
```
$ lspci -nnk | grep <device_name>
```

Additional resources

Enabling Intel VT-X and AMD-V Virtualization Hardware Extensions in BIOS

1	To remove the `nvidia-231` device type, delete it from the `mediatedDevicesTypes` array.
2	To remove the `GRID T4-2Q` device, delete the `mdevNameSelector` field and its corresponding `resourceName` field.