Architecture

KubeVirt is built using a service-oriented architecture and a choreography pattern.

Stack

      +---------------------+
      | KubeVirt            |
    ~~+---------------------+~~
      | Orchestration (K8s) |
      +---------------------+
      | Scheduling (K8s)    |
      +---------------------+
      | Container Runtime   |
    ~~+---------------------+~~
      | Operating System    |
      +---------------------+
      | Virtual(kvm)        |
    ~~+---------------------+~~
      | Physical            |
      +---------------------+

Users requiring virtualization services talk to the Virtualization API (see below), which in turn talks to the Kubernetes cluster to schedule the requested Virtual Machine Instances (VMIs). Scheduling, networking, and storage are all delegated to Kubernetes, while KubeVirt provides the virtualization functionality.

Additional Services

KubeVirt provides additional functionality to your Kubernetes cluster to perform virtual machine management.

Recall how Kubernetes handles Pods: a Pod is created by posting a Pod specification to the Kubernetes API Server. The specification is then transformed into an object inside the API Server. This object is of a specific type, or kind, as it is called in the specification; a Pod is of the kind Pod. Controllers within Kubernetes know how to handle these Pod objects, so once a new Pod object is seen, those controllers perform the necessary actions to bring the Pod alive and to match the required state.

This same mechanism is used by KubeVirt. Thus KubeVirt delivers three things to provide the new functionality:

  1. Additional types - so-called Custom Resource Definitions (CRDs) - are added to the Kubernetes API
  2. Additional controllers for cluster-wide logic associated with these new types
  3. Additional daemons for node-specific logic associated with these new types

Once all three are in place, you are able to

  • create new objects of these new types in Kubernetes (VMIs in our case),
  • have the new controllers take care of getting the VMIs scheduled on some host,
  • and have a daemon - the virt-handler - take care of a host, alongside the kubelet, to launch the VMI and configure it until it matches the required state.

One final note: both controllers and daemons run as Pods (or similar) on top of the Kubernetes cluster, and are not installed alongside it. The new types are, as said before, defined inside the Kubernetes API server. This allows users to talk to Kubernetes in order to modify VMIs.
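The three additions described above can be inspected on a running cluster. The following is a minimal sketch, not an official tool: the helper names are hypothetical, and it assumes that kubectl is configured for the cluster and that the KubeVirt components were installed into a namespace called kubevirt (adjust KUBEVIRT_NAMESPACE if your installation differs).

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a KubeVirt installation.
# Assumption: the KubeVirt components run in the "kubevirt" namespace.
KUBEVIRT_NAMESPACE="${KUBEVIRT_NAMESPACE:-kubevirt}"

# Additional types: list the CRDs whose API group ends in kubevirt.io.
list_kubevirt_crds() {
    kubectl get crds -o name | grep 'kubevirt\.io$'
}

# Additional controllers and daemons: they run as ordinary Pods.
list_kubevirt_pods() {
    kubectl get pods --namespace "$KUBEVIRT_NAMESPACE" -o wide
}
```

On a cluster with KubeVirt installed, list_kubevirt_crds should include an entry for virtualmachineinstances.kubevirt.io, and list_kubevirt_pods should show the virt-controller and virt-handler Pods.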

The following diagram illustrates how the additional controllers and daemons communicate with Kubernetes and where the additional types are stored:

Architecture diagram

And a simplified version:

Simplified architecture diagram

Application Layout

  • Cluster
  • KubeVirt Components
    • virt-controller
    • virt-handler
    • libvirtd
  • KubeVirt Managed Pods
    • VMI Foo
    • VMI Bar
  • KubeVirt Custom Resources
    • VirtualMachine (VM) Foo -> VirtualMachineInstance (VMI) Foo
    • VirtualMachineInstanceReplicaSet (VMIRS) Bar -> VirtualMachineInstance (VMI) Bar

VirtualMachineInstance (VMI) is the custom resource that represents the basic ephemeral building block of an instance. In many cases this object won’t be created directly by the user but by a higher-level resource. Higher-level resources for a VMI can be:

  • VirtualMachine (VM) - Stateful VM that can be stopped and started while keeping the VM data and state.
  • VirtualMachineInstanceReplicaSet (VMIRS) - Similar to a Pod ReplicaSet: a group of ephemeral VMIs with similar configuration defined in a template.

Native Workloads

KubeVirt is deployed on top of a Kubernetes cluster. This means that you can continue to run your Kubernetes-native workloads next to the VMIs managed through KubeVirt.

Furthermore: if you can run native workloads, and you have KubeVirt installed, you should be able to run VM-based workloads, too. For example, Application Operators should not require additional permissions to use cluster features for VMs, compared to using that feature with a plain Pod.

Security-wise, installing and using KubeVirt must not grant users any permission they do not already have regarding native workloads. For example, a non-privileged Application Operator must never gain access to a privileged Pod by using a KubeVirt feature.

The Razor

We love virtual machines, think that they are very important and work hard to make them easy to use in Kubernetes. But even more than VMs, we love good design and modular, reusable components. Quite frequently, we face a dilemma: should we solve a problem in KubeVirt in a way that is best optimized for VMs, or should we take a longer path and introduce the solution to Pod-based workloads too?

To decide these dilemmas we came up with the KubeVirt Razor: “If something is useful for Pods, we should not implement it only for VMs”.

For example, we debated how we should connect VMs to external network resources. The quickest way seems to introduce KubeVirt-specific code, attaching a VM to a host bridge. However, we chose the longer path of integrating with Multus and CNI and improving them.

VirtualMachine

A VirtualMachine provides additional management capabilities to a VirtualMachineInstance inside the cluster. That includes:

  • API stability

  • Start/stop/restart capabilities on the controller level

  • Offline configuration change with propagation on VirtualMachineInstance recreation

  • Ensure that the VirtualMachineInstance is running if it should be running

It focuses on a 1:1 relationship between the controller instance and a virtual machine instance. In many ways it is very similar to a StatefulSet with spec.replicas set to 1.

How to use a VirtualMachine

A VirtualMachine will make sure that a VirtualMachineInstance object with an identical name is present in the cluster if spec.running is set to true. Furthermore, it will make sure that the VirtualMachineInstance is removed from the cluster if spec.running is set to false.

The field spec.runStrategy can also be used to control the state of the associated VirtualMachineInstance object. To avoid confusing and contradictory states, these two fields are mutually exclusive.

An extended explanation of spec.runStrategy vs spec.running can be found in Run Strategies.
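Because spec.running and spec.runStrategy are mutually exclusive, switching an existing VirtualMachine from one field to the other has to clear the old field in the same update. The following sketch builds a JSON merge patch for that purpose (in a merge patch, null removes a field). The helper name is hypothetical, and the accepted strategy names (Always, RerunOnFailure, Manual, Halted) are an assumption based on the Run Strategies documentation rather than on this page; check them against your KubeVirt version.

```shell
#!/bin/sh
# Hypothetical helper: print a merge patch that sets spec.runStrategy and
# clears spec.running, since the two fields are mutually exclusive.
# Assumption: these strategy names are valid for your KubeVirt version.
run_strategy_patch() {
    case "$1" in
        Always|RerunOnFailure|Manual|Halted)
            printf '{"spec":{"running":null,"runStrategy":"%s"}}\n' "$1"
            ;;
        *)
            echo "unknown run strategy: $1" >&2
            return 1
            ;;
    esac
}

# Usage (requires cluster access):
#   kubectl patch virtualmachine vm --type merge -p "$(run_strategy_patch Always)"
```

Validating the strategy name locally is a design choice for this sketch; it only catches typos before the request is sent and does not replace the validation done by the cluster.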

Starting and stopping

After creating a VirtualMachine it can be switched on or off like this:

    # Start the virtual machine:
    virtctl start vm
    # Stop the virtual machine:
    virtctl stop vm

kubectl can be used too:

    # Start the virtual machine:
    kubectl patch virtualmachine vm --type merge -p \
        '{"spec":{"running":true}}'
    # Stop the virtual machine:
    kubectl patch virtualmachine vm --type merge -p \
        '{"spec":{"running":false}}'

Find more details about a VM’s life-cycle in the relevant section.

Controller status

Once a VirtualMachineInstance is created, its state will be tracked via status.created and status.ready fields of the VirtualMachine. If a VirtualMachineInstance exists in the cluster, status.created will equal true. If the VirtualMachineInstance is also ready, status.ready will equal true too.

If a VirtualMachineInstance reaches a final state while spec.running equals true, the VirtualMachine controller will set status.ready to false and re-create the VirtualMachineInstance.
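The status fields described above can be read with a JSONPath query. The following sketch polls status.ready until it reports true or a timeout expires. The helper names, the one-second polling interval, and the default timeout of 120 seconds are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the status.ready field of a VirtualMachine.
# The output is empty if the field has not been set yet.
vm_ready() {
    kubectl get virtualmachine "$1" -o jsonpath='{.status.ready}'
}

# Hypothetical helper: poll once per second until status.ready is true.
# Returns 0 when ready, 1 if the timeout (in seconds, default 120) expires.
wait_for_vm_ready() {
    name="$1"
    remaining="${2:-120}"
    while [ "$remaining" -gt 0 ]; do
        if [ "$(vm_ready "$name")" = "true" ]; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Usage (requires cluster access):
#   wait_for_vm_ready vm 60 && echo "vm is ready"
```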

Additionally, the status.printableStatus field provides high-level summary information about the state of the VirtualMachine. This information is also displayed when listing VirtualMachines using the CLI:

    $ kubectl get virtualmachines
    NAME   AGE   STATUS    VOLUME
    vm1    4m    Running
    vm2    11s   Stopped

The following states are currently supported. Note that states may be added or removed in future releases, so automated programs that consume them should be prepared for unexpected values.

  • Stopped: The virtual machine is currently stopped and isn’t expected to start.
  • Provisioning: Cluster resources associated with the virtual machine (e.g., DataVolumes) are being provisioned and prepared.
  • Starting: The virtual machine is being prepared for running.
  • Running: The virtual machine is running.
  • Paused: The virtual machine is paused.
  • Migrating: The virtual machine is in the process of being migrated to another host.
  • Stopping: The virtual machine is in the process of being stopped.
  • Terminating: The virtual machine is in the process of deletion, as well as its associated resources (VirtualMachineInstance, DataVolumes, …).
  • Unknown: The state of the virtual machine could not be obtained, typically due to an error in communicating with the host on which it’s running.
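Since the set of states may change between releases, a script that consumes status.printableStatus is safer with an explicit fallback for values it does not recognize. The following is a minimal sketch; the helper name and the category names it prints are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a printableStatus value to a coarse category.
# The final branch keeps the script working if states are added or removed.
classify_vm_status() {
    case "$1" in
        Running|Paused|Migrating)
            echo "active" ;;
        Provisioning|Starting)
            echo "starting" ;;
        Stopping|Terminating)
            echo "shutting-down" ;;
        Stopped)
            echo "stopped" ;;
        Unknown)
            echo "unknown" ;;
        *)
            echo "unrecognized" ;;
    esac
}

# Usage (requires cluster access):
#   status="$(kubectl get virtualmachine vm1 -o jsonpath='{.status.printableStatus}')"
#   classify_vm_status "$status"
```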

Restarting

A VirtualMachineInstance restart can be triggered by deleting the VirtualMachineInstance. This will also propagate configuration changes from the template in the VirtualMachine:

    # Restart the virtual machine (you delete the instance!):
    kubectl delete virtualmachineinstance vm

To restart a VirtualMachine named vm using virtctl:

    $ virtctl restart vm

This performs a normal restart of the VirtualMachineInstance and reschedules it on a new virt-launcher Pod.

To force restart a VirtualMachine named vm using virtctl:

    $ virtctl restart vm --force --grace-period=0

This tries to perform a normal restart, and also deletes the virt-launcher Pod of the VirtualMachineInstance, setting GracePeriodSeconds to the number of seconds passed in the command.

Currently, only setting grace-period=0 is supported.

Note

Force restart can cause data corruption, and should be reserved for cases such as a kernel panic or a VirtualMachine that is unresponsive to normal restarts.

Fencing considerations

A VirtualMachine will never restart or re-create a VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.

Exposing as a Service

A VirtualMachine can be exposed as a service. Once the VirtualMachineInstance starts, the actual service becomes available without additional interaction.

For example, exposing the SSH port (22) as a ClusterIP service on port 27017 using virtctl, after the VirtualMachine was created but before it started:

    $ virtctl expose virtualmachine vmi-ephemeral --name vmiservice --port 27017 --target-port 22

All service exposure options that apply to a VirtualMachineInstance apply to a VirtualMachine.

See Service Objects for more details.

When to use a VirtualMachine

When API stability is required between restarts

A VirtualMachine makes sure that VirtualMachineInstance API configurations are consistent between restarts. A classic example is a license that is bound to the firmware UUID of a virtual machine. The VirtualMachine makes sure that the UUID always stays the same without the user having to take care of it.

One of the main benefits is that a user can still make use of defaulting logic, even though a stable API is needed.
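One way to observe the stable firmware UUID is to record it before and after a restart and compare the two values. The following sketch assumes that the UUID is exposed on the VirtualMachineInstance at spec.domain.firmware.uuid; both that field path and the helper names are assumptions that should be verified against your KubeVirt version.

```shell
#!/bin/sh
# Hypothetical helper: print the firmware UUID of a VirtualMachineInstance.
# Assumption: the UUID is exposed at spec.domain.firmware.uuid.
vmi_firmware_uuid() {
    kubectl get virtualmachineinstance "$1" \
        -o jsonpath='{.spec.domain.firmware.uuid}'
}

# Hypothetical helper: succeed only if both UUIDs are non-empty and equal.
same_firmware_uuid() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (requires cluster access):
#   before="$(vmi_firmware_uuid vm)"
#   virtctl restart vm    # then wait until the new instance is running
#   after="$(vmi_firmware_uuid vm)"
#   same_firmware_uuid "$before" "$after" && echo "firmware UUID preserved"
```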

When config updates should be picked up on the next restart

Use a VirtualMachine if the VirtualMachineInstance configuration should be modifiable inside the cluster and these changes should be picked up on the next VirtualMachineInstance restart. This means that no hotplug is involved.

When you want to let the cluster manage your individual VirtualMachineInstance

Kubernetes as a declarative system can help you manage the VirtualMachineInstance. You tell it that you want this VirtualMachineInstance with your application running, and the VirtualMachine will try to make sure it stays running.

Note

The current belief is that if it is defined that the VirtualMachineInstance should be running, it should be running. This is different from many classical virtualization platforms, where VMs stay down if they were switched off. Restart policies may be added if needed. Please provide your use-case if you need this!

Example

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
      name: vm-cirros
    spec:
      running: false
      template:
        metadata:
          labels:
            kubevirt.io/vm: vm-cirros
        spec:
          domain:
            devices:
              disks:
              - disk:
                  bus: virtio
                name: containerdisk
              - disk:
                  bus: virtio
                name: cloudinitdisk
            machine:
              type: ""
            resources:
              requests:
                memory: 64M
          terminationGracePeriodSeconds: 0
          volumes:
          - name: containerdisk
            containerDisk:
              image: kubevirt/cirros-container-disk-demo:latest
          - cloudInitNoCloud:
              userDataBase64: IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK
            name: cloudinitdisk

Saving this manifest into vm.yaml and submitting it to Kubernetes will create the controller instance:

    $ kubectl create -f vm.yaml
    virtualmachine "vm-cirros" created

Since spec.running is set to false, no VMI will be created:

    $ kubectl get vmis
    No resources found.

Let’s start the VirtualMachine:

    $ virtctl start vm-cirros

As expected, a VirtualMachineInstance called vm-cirros was created:

    $ kubectl describe vm vm-cirros
    Name:         vm-cirros
    Namespace:    default
    Labels:       kubevirt.io/vm=vm-cirros
    Annotations:  <none>
    API Version:  kubevirt.io/v1
    Kind:         VirtualMachine
    Metadata:
      Cluster Name:
      Creation Timestamp:  2018-04-30T09:25:08Z
      Generation:          0
      Resource Version:    6418
      Self Link:           /apis/kubevirt.io/v1/namespaces/default/virtualmachines/vm-cirros
      UID:                 60043358-4c58-11e8-8653-525500d15501
    Spec:
      Running:  true
      Template:
        Metadata:
          Creation Timestamp:  <nil>
          Labels:
            Kubevirt . Io / Ovmi:  vm-cirros
        Spec:
          Domain:
            Devices:
              Disks:
                Disk:
                  Bus:        virtio
                Name:         containerdisk
                Volume Name:  containerdisk
                Disk:
                  Bus:        virtio
                Name:         cloudinitdisk
                Volume Name:  cloudinitdisk
            Machine:
              Type:
            Resources:
              Requests:
                Memory:                      64M
          Termination Grace Period Seconds:  0
          Volumes:
            Name:  containerdisk
            Registry Disk:
              Image:  kubevirt/cirros-registry-disk-demo:latest
            Cloud Init No Cloud:
              User Data Base 64:  IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK
            Name:                 cloudinitdisk
    Status:
      Created:  true
      Ready:    true
    Events:
      Type    Reason            Age   From                       Message
      ----    ------            ----  ----                       -------
      Normal  SuccessfulCreate  15s   virtualmachine-controller  Created virtual machine: vm-cirros
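The userDataBase64 value in the example manifest is an opaque string until it is decoded. The following sketch decodes it locally to show what cloud-init will receive. It assumes a base64 utility that accepts the --decode option, as GNU coreutils and macOS do; the function name is a hypothetical helper.

```shell
#!/bin/sh
# Decode the cloud-init user data taken from the example manifest.
# Assumption: the local base64 utility accepts --decode.
USER_DATA_B64='IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK'

# Hypothetical helper: decode a base64 string to standard output.
decode_user_data() {
    printf '%s' "$1" | base64 --decode
}

decode_user_data "$USER_DATA_B64"
```

The decoded value is a short shell script: a #!/bin/sh line followed by an echo command that prints the text "printed from cloud-init userdata".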

kubectl command-line interactions

Whenever you want to manipulate the VirtualMachine through the command line, you can use the kubectl command. The following examples demonstrate how to do it.

    # Define a virtual machine:
    kubectl create -f vm.yaml
    # Start the virtual machine:
    kubectl patch virtualmachine vm --type merge -p \
        '{"spec":{"running":true}}'
    # Look at virtual machine status and associated events:
    kubectl describe virtualmachine vm
    # Look at the now created virtual machine instance status and associated events:
    kubectl describe virtualmachineinstance vm
    # Stop the virtual machine instance:
    kubectl patch virtualmachine vm --type merge -p \
        '{"spec":{"running":false}}'
    # Restart the virtual machine (you delete the instance!):
    kubectl delete virtualmachineinstance vm
    # Implicit cascade delete (first deletes the virtual machine and then the virtual machine instance)
    kubectl delete virtualmachine vm
    # Explicit cascade delete (first deletes the virtual machine and then the virtual machine instance)
    # (kubectl releases before 1.20 spell this --cascade=true)
    kubectl delete virtualmachine vm --cascade=background
    # Orphan delete (The running virtual machine is only detached, not deleted)
    # Recreating the virtual machine would lead to the adoption of the virtual machine instance
    # (kubectl releases before 1.20 spell this --cascade=false)
    kubectl delete virtualmachine vm --cascade=orphan