Virtual machine health checks
You can configure virtual machine (VM) health checks by defining readiness and liveness probes in the VirtualMachine resource.
About readiness and liveness probes
Use readiness and liveness probes to detect and handle unhealthy virtual machines (VMs). You can include one or more probes in the specification of the VM to ensure that traffic does not reach a VM that is not ready for it and that a new VM is created when a VM becomes unresponsive.
A readiness probe determines whether a VM is ready to accept service requests. If the probe fails, the VM is removed from the list of available endpoints until the VM is ready.
A liveness probe determines whether a VM is responsive. If the probe fails, the VM is deleted and a new VM is created to restore responsiveness.
You can configure readiness and liveness probes by setting the spec.readinessProbe and the spec.livenessProbe fields of the VirtualMachine object. These fields support the following tests:
HTTP GET
The probe determines the health of the VM by using a web hook. The test is successful if the HTTP response code is between 200 and 399. You can use an HTTP GET test with applications that return HTTP status codes when they are completely initialized.
TCP socket
The probe attempts to open a socket to the VM. The VM is considered healthy only if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until initialization is complete.
Guest agent ping
The probe uses the guest-ping command to determine if the QEMU guest agent is running on the virtual machine.
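The success rules for the HTTP GET and TCP socket tests can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the probe implementation; the function names http_status_is_healthy and tcp_port_is_open are hypothetical.

```python
import socket


def http_status_is_healthy(status_code: int) -> bool:
    """Return True when the status code falls in the 200-399 success range."""
    return 200 <= status_code <= 399


def tcp_port_is_open(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port can be established."""
    try:
        # Opening and immediately closing the connection is enough for the test.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused connections, timeouts, and unreachable hosts all count as failure.
        return False
```

An application that binds its port only after initialization completes fails the TCP check until it is ready, which is why the TCP socket test suits such applications.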
Defining an HTTP readiness probe
Define an HTTP readiness probe by setting the spec.readinessProbe.httpGet field of the virtual machine (VM) configuration.
Procedure
Include details of the readiness probe in the VM configuration file.
Sample readiness probe with an HTTP GET test
# ...
spec:
  readinessProbe:
    httpGet: # (1)
      port: 1500 # (2)
      path: /healthz # (3)
      httpHeaders:
      - name: Custom-Header
        value: Awesome
    initialDelaySeconds: 120 # (4)
    periodSeconds: 20 # (5)
    timeoutSeconds: 10 # (6)
    failureThreshold: 3 # (7)
    successThreshold: 3 # (8)
# ...
1 The HTTP GET request to perform to connect to the VM.
2 The port of the VM that the probe queries. In the above example, the probe queries port 1500.
3 The path to access on the HTTP server. In the above example, if the handler for the server’s /healthz path returns a success code, the VM is considered to be healthy. If the handler returns a failure code, the VM is removed from the list of available endpoints.
4 The time, in seconds, after the VM starts before the readiness probe is initiated.
5 The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
6 The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
7 The number of times that the probe is allowed to fail. The default is 3. After the specified number of attempts, the pod is marked Unready.
8 The number of times that the probe must report success, after a failure, to be considered successful. The default is 1.
Create the VM by running the following command:
$ oc create -f <file_name>.yaml
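The interaction between failureThreshold and successThreshold is easier to follow as a small state tracker. The sketch below is an assumption modeled on consecutive-result counting and is not taken from the product source; the ProbeTracker class and its starting state (not ready until the first successful run) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Track consecutive probe results against the two thresholds."""

    failure_threshold: int = 3  # default stated in the callouts
    success_threshold: int = 1  # default stated in the callouts
    ready: bool = False         # assumption: not ready until the probe succeeds
    _successes: int = 0
    _failures: int = 0

    def record(self, succeeded: bool) -> bool:
        """Record one probe result and return the current ready state."""
        if succeeded:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_threshold:
                self.ready = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failure_threshold:
                self.ready = False
        return self.ready
```

With the sample values (both thresholds set to 3), a ready VM stays in the list of endpoints through two consecutive failed probes, is marked unready on the third, and is marked ready again only after three consecutive successes.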
Defining a TCP readiness probe
Define a TCP readiness probe by setting the spec.readinessProbe.tcpSocket field of the virtual machine (VM) configuration.
Procedure
Include details of the TCP readiness probe in the VM configuration file.
Sample readiness probe with a TCP socket test
# ...
spec:
  readinessProbe:
    initialDelaySeconds: 120 # (1)
    periodSeconds: 20 # (2)
    tcpSocket: # (3)
      port: 1500 # (4)
    timeoutSeconds: 10 # (5)
# ...
1 The time, in seconds, after the VM starts before the readiness probe is initiated.
2 The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
3 The TCP action to perform.
4 The port of the VM that the probe queries.
5 The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
Create the VM by running the following command:
$ oc create -f <file_name>.yaml
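The callouts state one constraint twice: periodSeconds must be greater than timeoutSeconds. A pre-flight check for that rule might look like the following sketch. The function name validate_probe_timing is hypothetical, and the defaults are the ones given in the callouts (10 and 1).

```python
def validate_probe_timing(period_seconds: int = 10, timeout_seconds: int = 1) -> None:
    """Raise ValueError unless periodSeconds is greater than timeoutSeconds."""
    if period_seconds <= timeout_seconds:
        raise ValueError(
            f"periodSeconds ({period_seconds}) must be greater than "
            f"timeoutSeconds ({timeout_seconds})"
        )
```

The sample values (a period of 20 and a timeout of 10) pass this check, while equal values are rejected.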
Defining an HTTP liveness probe
Define an HTTP liveness probe by setting the spec.livenessProbe.httpGet field of the virtual machine (VM) configuration. You can define both HTTP and TCP tests for liveness probes in the same way as readiness probes. This procedure configures a sample liveness probe with an HTTP GET test.
Procedure
Include details of the HTTP liveness probe in the VM configuration file.
Sample liveness probe with an HTTP GET test
# ...
spec:
  livenessProbe:
    initialDelaySeconds: 120 # (1)
    periodSeconds: 20 # (2)
    httpGet: # (3)
      port: 1500 # (4)
      path: /healthz # (5)
      httpHeaders:
      - name: Custom-Header
        value: Awesome
    timeoutSeconds: 10 # (6)
# ...
1 The time, in seconds, after the VM starts before the liveness probe is initiated.
2 The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
3 The HTTP GET request to perform to connect to the VM.
4 The port of the VM that the probe queries. In the above example, the probe queries port 1500. The VM installs and runs a minimal HTTP server on port 1500 via cloud-init.
5 The path to access on the HTTP server. In the above example, if the handler for the server’s /healthz path returns a success code, the VM is considered to be healthy. If the handler returns a failure code, the VM is deleted and a new VM is created.
6 The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
Create the VM by running the following command:
$ oc create -f <file_name>.yaml
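The sample assumes that something inside the guest answers GET /healthz on port 1500. The document does not say which server the cloud-init configuration installs, so the following is only a sketch of what such a minimal endpoint could look like, using the Python standard library. HealthHandler and make_health_server are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with 200 and every other path with 404."""

    def do_GET(self) -> None:
        if self.path == "/healthz":
            body = b"ok\n"
            self.send_response(200)
        else:
            body = b"not found\n"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        """Silence per-request logging so probe traffic does not fill the logs."""


def make_health_server(port: int = 1500) -> HTTPServer:
    """Create, but do not start, a server bound to all interfaces on the port."""
    return HTTPServer(("", port), HealthHandler)
```

Calling make_health_server().serve_forever() inside the guest would then return a success code for the probe path. A real health handler would normally check that the application is working before it answers 200.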
Defining a watchdog
You can define a watchdog to monitor the health of the guest operating system by performing the following steps:
Configure a watchdog device for the virtual machine (VM).
Install the watchdog agent on the guest.
The watchdog device monitors the agent and performs one of the following actions if the guest operating system is unresponsive:
poweroff: The VM powers down immediately. If spec.running is set to true or spec.runStrategy is not set to manual, then the VM reboots.
reset: The VM reboots in place and the guest operating system cannot react. The reboot time might cause liveness probes to time out. If cluster-level protections detect a failed liveness probe, the VM might be forcibly rescheduled, increasing the reboot time.
shutdown: The VM gracefully powers down by stopping all services.
Watchdog is not available for Windows VMs.
Configuring a watchdog device for the virtual machine
You configure a watchdog device for the virtual machine (VM).
Prerequisites
- The VM must have kernel support for an i6300esb watchdog device. Fedora images support i6300esb.
Procedure
Create a YAML file with the following contents:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm2-rhel84-watchdog
  name: <vm-name>
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm2-rhel84-watchdog
    spec:
      domain:
        devices:
          watchdog:
            name: <watchdog>
            i6300esb:
              action: "poweroff" # (1)
# ...
1 Specify poweroff, reset, or shutdown.
The example above configures the i6300esb watchdog device on a RHEL8 VM with the poweroff action and exposes the device as /dev/watchdog.
This device can now be used by the watchdog binary.
Apply the YAML file to your cluster by running the following command:
$ oc apply -f <file_name>.yaml
Verification
This procedure is provided for testing watchdog functionality only and must not be run on production machines.
Run the following command to verify that the VM is connected to the watchdog device:
$ lspci | grep watchdog -i
Run one of the following commands to confirm the watchdog is active:
Trigger a kernel panic:
# echo c > /proc/sysrq-trigger
Stop the watchdog service:
# pkill -9 watchdog
Installing the watchdog agent on the guest
You install the watchdog agent on the guest and start the watchdog service.
Procedure
Log in to the virtual machine as root user.
Install the watchdog package and its dependencies:
# yum install watchdog
Uncomment the following line in the /etc/watchdog.conf file and save the changes:
#watchdog-device = /dev/watchdog
Enable the watchdog service to start on boot:
# systemctl enable --now watchdog.service
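Conceptually, the agent keeps the watchdog device from firing by writing to it at regular intervals. If the guest hangs and the writes stop, the device performs the configured action. The sketch below illustrates that keepalive loop under the assumption of standard Linux watchdog device behavior, where any write resets the countdown. It is not the code of the watchdog package, and pet_watchdog and its parameters are hypothetical.

```python
import time
from typing import BinaryIO, Callable


def pet_watchdog(
    device: BinaryIO,
    beats: int,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write one keepalive byte per interval, `beats` times in total."""
    for _ in range(beats):
        device.write(b"\0")  # any write resets the watchdog countdown
        device.flush()
        sleep(interval_seconds)
```

The function accepts any binary file-like object, so it can be tried against an in-memory buffer. Opening the real /dev/watchdog device arms the timer, so run it against the device only on a disposable test VM.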
Defining a guest agent ping probe
Define a guest agent ping probe by setting the spec.readinessProbe.guestAgentPing field of the virtual machine (VM) configuration.
The guest agent ping probe is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites
- The QEMU guest agent must be installed and enabled on the virtual machine.
Procedure
Include details of the guest agent ping probe in the VM configuration file. For example:
Sample guest agent ping probe
# ...
spec:
  readinessProbe:
    guestAgentPing: {} # (1)
    initialDelaySeconds: 120 # (2)
    periodSeconds: 20 # (3)
    timeoutSeconds: 10 # (4)
    failureThreshold: 3 # (5)
    successThreshold: 3 # (6)
# ...
1 The guest agent ping probe to connect to the VM.
2 Optional: The time, in seconds, after the VM starts before the guest agent probe is initiated.
3 Optional: The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
4 Optional: The number of seconds of inactivity after which the probe times out and the VM is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
5 Optional: The number of times that the probe is allowed to fail. The default is 3. After the specified number of attempts, the pod is marked Unready.
6 Optional: The number of times that the probe must report success, after a failure, to be considered successful. The default is 1.
Create the VM by running the following command:
$ oc create -f <file_name>.yaml
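In the QEMU guest agent protocol, guest-ping is a JSON command that an agent answers with an empty return object. The sketch below shows only that message shape. How the probe delivers the command to the guest is not described here, and the helper names build_guest_ping_request and guest_ping_succeeded are hypothetical.

```python
import json


def build_guest_ping_request() -> str:
    """Return the JSON command that asks the guest agent to respond."""
    return json.dumps({"execute": "guest-ping"})


def guest_ping_succeeded(raw_response: str) -> bool:
    """Return True when the agent answered with a return object and no error."""
    try:
        reply = json.loads(raw_response)
    except json.JSONDecodeError:
        # A garbled or empty answer is treated as a failed ping.
        return False
    return isinstance(reply, dict) and "return" in reply and "error" not in reply
```

A guest without a running agent produces no valid reply, which this sketch treats as a failed probe.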