Node assignment

You can constrain a VM to run only on specific nodes, or to prefer running on specific nodes, using:

  • nodeSelector
  • Affinity and anti-affinity
  • Taints and Tolerations

nodeSelector

Setting spec.nodeSelector constrains the scheduler to schedule VMs only on nodes that carry all of the specified labels. In the following example, the VMI's nodeSelector requires the labels cpu: slow and storage: fast:

  metadata:
    name: testvmi-ephemeral
  apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstance
  spec:
    nodeSelector:
      cpu: slow
      storage: fast
    domain:
      resources:
        requests:
          memory: 64M
      devices:
        disks:
        - name: mypvcdisk
          lun: {}
    volumes:
    - name: mypvcdisk
      persistentVolumeClaim:
        claimName: mypvc

The scheduler will therefore place the VMI only on nodes that have both labels in their metadata. This works exactly like the Pod nodeSelector; see the Pod nodeSelector documentation for more examples.
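For the example above to be scheduled, at least one node must carry both labels. As a sketch, the relevant part of such a Node object could look like this (node1 is a hypothetical node name, and all other Node fields are omitted). In practice the labels are usually added with kubectl label nodes node1 cpu=slow storage=fast rather than by editing the Node directly.

```yaml
# Relevant part of a node that satisfies the nodeSelector above
apiVersion: v1
kind: Node
metadata:
  name: node1          # hypothetical node name
  labels:
    cpu: slow          # matches nodeSelector cpu: slow
    storage: fast      # matches nodeSelector storage: fast
```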

Affinity and anti-affinity

The spec.affinity field allows specifying hard and soft affinity for VMs. Matching rules can be written against workloads (VMs and Pods) and against Nodes. Since VMs are a workload type based on Pods, Pod affinity affects VMs as well.

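An example with podAffinity and podAntiAffinity might look like this. Here the VMI must run in the same zone as at least one pod labeled security: S1, and should preferably not run on the same node as a pod labeled security: S2: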

  metadata:
    name: testvmi-ephemeral
  apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstance
  spec:
    nodeSelector:
      cpu: slow
      storage: fast
    domain:
      resources:
        requests:
          memory: 64M
      devices:
        disks:
        - name: mypvcdisk
          lun: {}
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S1
          topologyKey: failure-domain.beta.kubernetes.io/zone
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: security
                operator: In
                values:
                - S2
            topologyKey: kubernetes.io/hostname
    volumes:
    - name: mypvcdisk
      persistentVolumeClaim:
        claimName: mypvc

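Affinity and anti-affinity work exactly like Pod affinity. This includes podAffinity, podAntiAffinity and nodeAffinity; there is no separate nodeAntiAffinity field, so node anti-affinity is expressed with nodeAffinity using operators such as NotIn or DoesNotExist. See the Pod affinity and anti-affinity documentation for more examples and details.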
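The example above only uses pod (anti-)affinity. As a sketch of node affinity, the following spec fragment, which would be merged into the spec of a VMI such as the ones above, combines a hard requirement, a node "anti-affinity" rule via NotIn, and a soft preference. The label keys and values are illustrative and reuse the cpu and storage labels from the nodeSelector example.

```yaml
# Spec fragment for a VMI; label keys and values are illustrative
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Hard requirement: the node must be labeled cpu=slow
          - key: cpu
            operator: In
            values:
            - slow
          # Node "anti-affinity": exclude nodes labeled storage=slow
          # (nodes without a storage label still match NotIn)
          - key: storage
            operator: NotIn
            values:
            - slow
      preferredDuringSchedulingIgnoredDuringExecution:
      # Soft preference: favor nodes labeled storage=fast
      - weight: 50
        preference:
          matchExpressions:
          - key: storage
            operator: In
            values:
            - fast
```

All expressions inside one nodeSelectorTerms entry must match together, while separate entries in the nodeSelectorTerms list are alternatives.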

Taints and Tolerations

Affinity, as described above, is a property of VMs that attracts them to a set of nodes (either as a preference or as a hard requirement). Taints are the opposite: they allow a node to repel a set of VMs.

Taints and tolerations work together to ensure that VMs are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any VMs that do not tolerate the taints. Tolerations are applied to VMs, and allow (but do not require) the VMs to schedule onto nodes with matching taints.

You add a taint to a node using kubectl taint. For example,

  kubectl taint nodes node1 key=value:NoSchedule
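kubectl taint records the taint in the node's spec.taints list. As a sketch, the relevant part of node1 after running the command above looks roughly like this (other fields omitted). Running the same command with a trailing dash, kubectl taint nodes node1 key=value:NoSchedule-, removes the taint again.

```yaml
# Node fragment corresponding to: kubectl taint nodes node1 key=value:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
  - key: key
    value: value
    effect: NoSchedule   # new VMs without a matching toleration are not scheduled here
```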

A VMI that tolerates this taint might look like this:

  metadata:
    name: testvmi-ephemeral
  apiVersion: kubevirt.io/v1
  kind: VirtualMachineInstance
  spec:
    nodeSelector:
      cpu: slow
      storage: fast
    domain:
      resources:
        requests:
          memory: 64M
      devices:
        disks:
        - name: mypvcdisk
          lun: {}
    tolerations:
    - key: "key"
      operator: "Equal"
      value: "value"
      effect: "NoSchedule"
    # every disk needs a volume with the same name
    volumes:
    - name: mypvcdisk
      persistentVolumeClaim:
        claimName: mypvc
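The toleration above uses operator Equal, so it only matches a taint with exactly that key and value. As a sketch of the other standard variants, the fragment below shows operator Exists, which matches any value of the key, and a NoExecute toleration with tolerationSeconds. The keys and values are illustrative.

```yaml
# Fragment of a VMI spec; keys and values are illustrative
spec:
  tolerations:
  # Exists matches any value of "key", so no value field is given
  - key: "key"
    operator: "Exists"
    effect: "NoSchedule"
  # NoExecute taints also affect workloads that are already running;
  # tolerationSeconds limits how long this VMI may stay on the node
  # after such a taint is added
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoExecute"
    tolerationSeconds: 3600
```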

Node balancing with Descheduler

In some cases, the cluster may need to be rebalanced according to the current scheduling policy and load conditions. The Descheduler can find pods that, for example, violate scheduling policies, and evict them according to descheduler policies. KubeVirt VMs are handled as pods with local storage, so by default the descheduler will not evict them. This can be overridden by adding a special annotation to the VMI template in the VM:

  spec:
    template:
      metadata:
        annotations:
          # annotation values must be strings, so "true" is quoted
          descheduler.alpha.kubernetes.io/evict: "true"

With this annotation, the descheduler is able to evict the VM’s pod, which the scheduler can then place on a different node. A VirtualMachine will never restart or re-create a VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.
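For context, here is a sketch of a complete VirtualMachine that carries the annotation in its VMI template. The VM name testvm and runStrategy: Always are assumptions for illustration; the disk and volume are reused from the earlier examples.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm                 # hypothetical name
spec:
  runStrategy: Always          # assumed; keeps a VMI running for this VM
  template:
    metadata:
      annotations:
        # allows the descheduler to evict the VM's pod
        descheduler.alpha.kubernetes.io/evict: "true"
    spec:
      domain:
        resources:
          requests:
            memory: 64M
        devices:
          disks:
          - name: mypvcdisk
            lun: {}
      volumes:
      - name: mypvcdisk
        persistentVolumeClaim:
          claimName: mypvc
```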

Live update

When the VM rollout strategy is set to LiveUpdate, changes to a VM’s node selector or affinities will dynamically propagate to the VMI (unless the RestartRequired condition is set). Changes to tolerations will not dynamically propagate, and will trigger a RestartRequired condition if changed on a running VM.
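The rollout strategy is a cluster-wide setting. A minimal sketch, assuming it is configured through spec.configuration.vmRolloutStrategy of the KubeVirt custom resource; the resource name and namespace below are common defaults and may differ in your installation, and depending on the KubeVirt version, live updates may additionally require a feature gate.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt        # common default, may differ
  namespace: kubevirt   # common default, may differ
spec:
  configuration:
    # assumed field: propagate supported VM spec changes to the running VMI
    vmRolloutStrategy: LiveUpdate
```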

Changes to the node selector or affinities take effect only on the next migration; the change alone does not trigger one.
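Because the change alone does not start a migration, one can be requested explicitly. A sketch of a VirtualMachineInstanceMigration object, assuming a running, live-migratable VMI named testvm (hypothetical):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: testvm-migration   # hypothetical name
spec:
  # name of the running VMI to migrate (hypothetical)
  vmiName: testvm
```

Creating this object with kubectl create -f asks KubeVirt to live-migrate the VMI, and the target node is then chosen with the updated node selector and affinities. virtctl migrate testvm is a command-line shortcut for the same request.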