Troubleshooting

OKD Virtualization provides tools and logs for troubleshooting virtual machines and virtualization components.

You can troubleshoot OKD Virtualization components by using the tools provided in the web console or by using the oc CLI tool.

Events

OKD events are records of important life-cycle information and are useful for monitoring and troubleshooting virtual machine, namespace, and resource issues.

  • VM events: Navigate to the Events tab of the VirtualMachine details page in the web console.

  • Namespace events: You can view namespace events by running the following command:

    $ oc get events -n <namespace>

    See the list of events for details about specific events.

  • Resource events: You can view resource events by running the following command:

    $ oc describe <resource> <resource_name>
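
Namespaces with many events can be hard to scan by eye. The following is a minimal sketch, not part of the product tooling, that keeps only Warning events from the JSON form of the namespace events listing (oc get events -n <namespace> -o json). The function name warning_events and the sample data are hypothetical; the type, reason, message, and involvedObject.name fields are the standard Kubernetes Event fields.

```python
import json


def warning_events(events_json: str) -> list:
    """Summarize Warning events from the JSON printed by `oc get events -o json`."""
    summary = []
    for event in json.loads(events_json).get("items", []):
        if event.get("type") != "Warning":
            continue  # skip Normal events
        summary.append({
            "object": event.get("involvedObject", {}).get("name", ""),
            "reason": event.get("reason", ""),
            "message": event.get("message", ""),
        })
    return summary


# Hypothetical sample that mimics the structure of `oc get events -o json`.
sample = json.dumps({"items": [
    {"type": "Normal", "reason": "Started", "message": "started",
     "involvedObject": {"name": "example-vm"}},
    {"type": "Warning", "reason": "FailedScheduling", "message": "no nodes available",
     "involvedObject": {"name": "example-vm"}},
]})
for row in warning_events(sample):
    print(row["object"], row["reason"], row["message"], sep="\t")
```

In practice, you would pass the real command output to warning_events instead of the sample string.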

Logs

You can review the following logs for troubleshooting:

  • Virtual machine logs

  • OKD Virtualization pod logs

  • Aggregated OKD Virtualization logs

Viewing virtual machine logs with the web console

You can view virtual machine logs with the OKD web console.

Procedure

  1. Navigate to Virtualization → VirtualMachines.

  2. Select a virtual machine to open the VirtualMachine details page.

  3. On the Details tab, click the pod name to open the Pod details page.

  4. Click the Logs tab to view the logs.

Viewing OKD Virtualization pod logs

You can view logs for OKD Virtualization pods by using the oc CLI tool.

You can configure the verbosity level of the logs by editing the HyperConverged custom resource (CR).

Viewing OKD Virtualization pod logs with the CLI

You can view logs for the OKD Virtualization pods by using the oc CLI tool.

Procedure

  1. View a list of pods in the OKD Virtualization namespace by running the following command:

    $ oc get pods -n openshift-cnv

    Example output

    NAME                               READY   STATUS    RESTARTS   AGE
    disks-images-provider-7gqbc        1/1     Running   0          32m
    disks-images-provider-vg4kx        1/1     Running   0          32m
    virt-api-57fcc4497b-7qfmc          1/1     Running   0          31m
    virt-api-57fcc4497b-tx9nc          1/1     Running   0          31m
    virt-controller-76c784655f-7fp6m   1/1     Running   0          30m
    virt-controller-76c784655f-f4pbd   1/1     Running   0          30m
    virt-handler-2m86x                 1/1     Running   0          30m
    virt-handler-9qs6z                 1/1     Running   0          30m
    virt-operator-7ccfdbf65f-q5snk     1/1     Running   0          32m
    virt-operator-7ccfdbf65f-vllz8     1/1     Running   0          32m

  2. View the pod log by running the following command:

    $ oc logs -n openshift-cnv <pod_name>

    If a pod fails to start, you can use the --previous option to view logs from the last attempt.

    To monitor log output in real time, use the -f option.

    Example output

    {"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:453","timestamp":"2022-04-17T08:58:37.373695Z"}
    {"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:453","timestamp":"2022-04-17T08:58:37.373726Z"}
    {"component":"virt-handler","level":"info","msg":"setting rate limiter to 5 QPS and 10 Burst","pos":"virt-handler.go:462","timestamp":"2022-04-17T08:58:37.373782Z"}
    {"component":"virt-handler","level":"info","msg":"CPU features of a minimum baseline CPU model: map[apic:true clflush:true cmov:true cx16:true cx8:true de:true fpu:true fxsr:true lahf_lm:true lm:true mca:true mce:true mmx:true msr:true mtrr:true nx:true pae:true pat:true pge:true pni:true pse:true pse36:true sep:true sse:true sse2:true sse4.1:true ssse3:true syscall:true tsc:true]","pos":"cpu_plugin.go:96","timestamp":"2022-04-17T08:58:37.390221Z"}
    {"component":"virt-handler","level":"warning","msg":"host model mode is expected to contain only one model","pos":"cpu_plugin.go:103","timestamp":"2022-04-17T08:58:37.390263Z"}
    {"component":"virt-handler","level":"info","msg":"node-labeller is running","pos":"node_labeller.go:94","timestamp":"2022-04-17T08:58:37.391011Z"}
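
Because each log line in the example output is a JSON object, you can filter the output by severity with a short script. The following is a minimal sketch under stated assumptions: the function name filter_log_lines is hypothetical, and the ordering of levels (info, warning, error) is an assumption based on the level values visible in the example output. Lines that are not JSON are skipped, and records with an unrecognized level are kept so that nothing is hidden by accident.

```python
import json

# Assumed severity order; the example output only shows "info" and "warning".
LEVELS = ("info", "warning", "error")


def filter_log_lines(lines, min_level="warning"):
    """Yield parsed log records whose level is at or above min_level."""
    threshold = LEVELS.index(min_level)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text lines such as stack traces
        if not isinstance(record, dict):
            continue
        level = record.get("level", "info")
        # Unknown levels rank highest so that they are never filtered out.
        rank = LEVELS.index(level) if level in LEVELS else len(LEVELS)
        if rank >= threshold:
            yield record


# Two lines taken from the example output, plus one non-JSON line.
sample = [
    '{"component":"virt-handler","level":"info","msg":"node-labeller is running","pos":"node_labeller.go:94","timestamp":"2022-04-17T08:58:37.391011Z"}',
    '{"component":"virt-handler","level":"warning","msg":"host model mode is expected to contain only one model","pos":"cpu_plugin.go:103","timestamp":"2022-04-17T08:58:37.390263Z"}',
    "not a JSON line",
]
for record in filter_log_lines(sample):
    print(record["timestamp"], record["level"], record["msg"])
```

To use the sketch on real data, pass the lines of the oc logs output to filter_log_lines in place of the sample list.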

Configuring OKD Virtualization pod log verbosity

You can configure the verbosity level of OKD Virtualization pod logs by editing the HyperConverged custom resource (CR).

Procedure

  1. To set log verbosity for specific components, open the HyperConverged CR in your default text editor by running the following command:

    $ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv

  2. Set the log level for one or more components by editing the spec.logVerbosityConfig stanza. For example:

    apiVersion: hco.kubevirt.io/v1beta1
    kind: HyperConverged
    metadata:
      name: kubevirt-hyperconverged
    spec:
      logVerbosityConfig:
        kubevirt:
          virtAPI: 5 # (1)
          virtController: 4
          virtHandler: 3
          virtLauncher: 2
          virtOperator: 6

    (1) The log verbosity value must be an integer in the range 1–9, where a higher number indicates a more detailed log. In this example, the virtAPI component logs are exposed if their priority level is 5 or higher.

  3. Apply your changes by saving and exiting the editor.
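
If you prefer to script the change instead of editing the CR interactively, you can generate a merge patch for the same spec.logVerbosityConfig stanza. The following is a minimal sketch, not a documented procedure: the function name build_verbosity_patch is hypothetical, and applying the patch with oc patch --type merge instead of oc edit is an assumption that you should verify on a test cluster. The component names and the 1–9 range come from the example above.

```python
import json

# Component keys shown in the spec.logVerbosityConfig example.
COMPONENTS = {"virtAPI", "virtController", "virtHandler", "virtLauncher", "virtOperator"}


def build_verbosity_patch(levels: dict) -> str:
    """Return a JSON merge patch that sets log verbosity for the given components."""
    for name, level in levels.items():
        if name not in COMPONENTS:
            raise ValueError(f"unknown component: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 9:
            raise ValueError(f"verbosity for {name} must be an integer from 1 to 9")
    patch = {"spec": {"logVerbosityConfig": {"kubevirt": dict(levels)}}}
    return json.dumps(patch, sort_keys=True)


print(build_verbosity_patch({"virtAPI": 5, "virtHandler": 3}))
```

Under the assumption above, the printed JSON could be passed to a command such as oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type merge -p '<patch>'.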

Common error messages

The following error messages might appear in OKD Virtualization logs:

ErrImagePull or ImagePullBackOff

Indicates an incorrect deployment configuration or problems with the images that are referenced.
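
To find which containers are affected, you can inspect the waiting reason in the pod status. The following is a minimal sketch that scans the JSON form of the pod listing (for example, oc get pods -n openshift-cnv -o json) for these two reasons. The function name image_pull_failures and the sample data are hypothetical; the status.containerStatuses[].state.waiting.reason path is the standard Kubernetes pod status field.

```python
import json

IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_failures(pods_json: str) -> list:
    """Return (pod, container, image, reason) tuples for image pull failures."""
    failures = []
    for pod in json.loads(pods_json).get("items", []):
        status = pod.get("status", {})
        statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for container in statuses:
            waiting = container.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in IMAGE_PULL_REASONS:
                failures.append((
                    pod.get("metadata", {}).get("name", ""),
                    container.get("name", ""),
                    container.get("image", ""),
                    waiting["reason"],
                ))
    return failures


# Hypothetical sample that mimics the structure of `oc get pods -o json`.
sample = json.dumps({"items": [{
    "metadata": {"name": "example-pod"},
    "status": {"containerStatuses": [{
        "name": "example-container",
        "image": "registry.example.com/example:latest",
        "state": {"waiting": {"reason": "ImagePullBackOff"}},
    }]},
}]})
for failure in image_pull_failures(sample):
    print(*failure, sep="\t")
```

The image column in the output shows the reference that could not be pulled, which is usually the first thing to check against the deployment configuration.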

Viewing aggregated OKD Virtualization logs with the LokiStack

You can view aggregated logs for OKD Virtualization pods and containers by using the LokiStack in the web console.

Prerequisites

  • You deployed the LokiStack.

Procedure

  1. Navigate to Observe → Logs in the web console.

  2. Select application, for virt-launcher pod logs, or infrastructure, for OKD Virtualization control plane pods and containers, from the log type list.

  3. Click Show Query to display the query field.

  4. Enter the LogQL query in the query field and click Run Query to display the filtered logs.

OKD Virtualization LogQL queries

You can view and filter aggregated logs for OKD Virtualization components by running Loki Query Language (LogQL) queries on the Observe → Logs page in the web console.

The default log type is infrastructure. The virt-launcher log type is application.

Optional: You can include or exclude strings or regular expressions by using line filter expressions.

If the query matches a large number of logs, the query might time out.

Table 1. OKD Virtualization LogQL example queries

Component: All

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"

Component: cdi-apiserver, cdi-deployment, cdi-operator

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
    |kubernetes_labels_app_kubernetes_io_component="storage"

Component: hco-operator

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
    |kubernetes_labels_app_kubernetes_io_component="deployment"

Component: kubemacpool

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
    |kubernetes_labels_app_kubernetes_io_component="network"

Component: virt-api, virt-controller, virt-handler, virt-operator

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
    |kubernetes_labels_app_kubernetes_io_component="compute"

Component: ssp-operator

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
    |kubernetes_labels_app_kubernetes_io_component="schedule"

Component: Container

    {log_type=~".+",kubernetes_container_name=~"<container>|<container>"} (1)
    |json|kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"

    (1) Specify one or more containers separated by a pipe (|).

Component: virt-launcher

    You must select application from the log type list before running this query.

    {log_type=~".+", kubernetes_container_name="compute"}|json
    |!= "custom-ga-command" (1)

    (1) |!= "custom-ga-command" excludes libvirt logs that contain the string custom-ga-command. (BZ#2177684)
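
The queries in Table 1 follow one pattern: a stream selector, the json parser, a filter on the part-of label, and an optional filter on the component label. The following is a minimal sketch that assembles such queries from a component name. The function name build_query and the COMPONENT_LABELS mapping are hypothetical helpers; the label names and values are taken from the table. Note that the resulting query selects by component label, so it returns logs for every component that shares that label.

```python
# Component label value used in Table 1 for each component.
COMPONENT_LABELS = {
    "cdi-apiserver": "storage",
    "cdi-deployment": "storage",
    "cdi-operator": "storage",
    "hco-operator": "deployment",
    "kubemacpool": "network",
    "virt-api": "compute",
    "virt-controller": "compute",
    "virt-handler": "compute",
    "virt-operator": "compute",
    "ssp-operator": "schedule",
}


def build_query(component=None, containers=()):
    """Build a LogQL query for all components, one component, or named containers."""
    selector = '{log_type=~".+"'
    if containers:
        # Several container names are joined with a pipe, as in the Container row.
        selector += ',kubernetes_container_name=~"' + "|".join(containers) + '"'
    selector += "}"
    stages = ["json", 'kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"']
    if component is not None:
        label = COMPONENT_LABELS[component]  # raises KeyError for unknown components
        stages.append(f'kubernetes_labels_app_kubernetes_io_component="{label}"')
    return selector + "".join("|" + stage for stage in stages)


print(build_query())
print(build_query("kubemacpool"))
print(build_query(containers=("virt-api", "virt-handler")))
```

You can paste the printed query into the query field on the Observe → Logs page.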

You can filter log lines to include or exclude strings or regular expressions by using line filter expressions.

Table 2. Line filter expressions
Line filter expression    Description
|= "<string>"             Log line contains string
!= "<string>"             Log line does not contain string
|~ "<regex>"              Log line contains regular expression
!~ "<regex>"              Log line does not contain regular expression

Example line filter expression

    {log_type=~".+"}|json
    |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
    |= "error" != "timeout"
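
To get a feel for how chained line filters behave, you can emulate them locally on a few sample lines. The following is a rough sketch, not how Loki evaluates queries: the function name apply_line_filters is hypothetical, and it uses the Python re module, whose syntax differs in places from the RE2 syntax that LogQL uses. Each filter is an (operator, pattern) pair, and a line is kept only if it passes every filter, as in the example expression above.

```python
import re

# One check per line filter operator from Table 2.
CHECKS = {
    "|=": lambda line, pattern: pattern in line,
    "!=": lambda line, pattern: pattern not in line,
    "|~": lambda line, pattern: re.search(pattern, line) is not None,
    "!~": lambda line, pattern: re.search(pattern, line) is None,
}


def apply_line_filters(lines, filters):
    """Keep the lines that satisfy every (operator, pattern) pair in filters."""
    return [
        line for line in lines
        if all(CHECKS[operator](line, pattern) for operator, pattern in filters)
    ]


# Hypothetical log lines for illustration.
sample = [
    "error: disk is full",
    "error: timeout while contacting the API",
    "info: started",
]
# Equivalent of: |= "error" != "timeout"
print(apply_line_filters(sample, [("|=", "error"), ("!=", "timeout")]))
```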


Troubleshooting data volumes

You can check the Conditions and Events sections of the DataVolume object to analyze and resolve issues.

About data volume conditions and events

You can diagnose data volume issues by examining the output of the Conditions and Events sections generated by the command:

    $ oc describe dv <DataVolume>

The Conditions section displays the following Types:

  • Bound

  • Running

  • Ready

The Events section provides the following additional information:

  • Type of event

  • Reason for logging

  • Source of the event

  • Message containing additional diagnostic information

The output from oc describe does not always contain Events.

An event is generated when the Status, Reason, or Message changes. Both conditions and events react to changes in the state of the data volume.

For example, if you misspell the URL during an import operation, the import generates a 404 message. That message change generates an event with a reason. The output in the Conditions section is updated as well.

Analyzing data volume conditions and events

By inspecting the Conditions and Events sections generated by the describe command, you can determine the state of the data volume in relation to persistent volume claims (PVCs), and whether an operation is actively running or completed. You might also receive messages that offer specific details about the status of the data volume and how it came to be in its current state.

There are many different combinations of conditions. Each must be evaluated in its unique context.

Examples of various combinations follow.

  • Bound - This example shows a successfully bound PVC.

    The Type is Bound and the Status is True, which means that the PVC is bound. If the PVC is not bound, the Status is False.

    When the PVC is bound, an event is generated stating that the PVC is bound. In this case, the Reason is Bound and Status is True. The Message indicates which PVC owns the data volume.

    The Events section provides further details, including how long the PVC has been bound (Age) and by what resource (From), in this case datavolume-controller:

    Example output

    Status:
      Conditions:
        Last Heart Beat Time:  2020-07-15T03:58:24Z
        Last Transition Time:  2020-07-15T03:58:24Z
        Message:               PVC win10-rootdisk Bound
        Reason:                Bound
        Status:                True
        Type:                  Bound
    ...
    Events:
      Type    Reason  Age  From                   Message
      ----    ------  ---- ----                   -------
      Normal  Bound   24s  datavolume-controller  PVC example-dv Bound

  • Running - In this example, Type is Running and Status is False, which indicates that an event caused an attempted operation to fail and changed the Status from True to False.

    However, Reason is Completed and the Message field indicates Import Complete.

    In the Events section, the Reason and Message fields contain additional troubleshooting information about the failed operation. In this example, the Message reports a failure to connect because of a 404 error, listed in the first Warning of the Events section.

    From this information, you can conclude that an import operation was running, creating contention for other operations that are attempting to access the data volume:

    Example output

    Status:
      Conditions:
        Last Heart Beat Time:  2020-07-15T04:31:39Z
        Last Transition Time:  2020-07-15T04:31:39Z
        Message:               Import Complete
        Reason:                Completed
        Status:                False
        Type:                  Running
    ...
    Events:
      Type     Reason  Age                From                   Message
      ----     ------  ----               ----                   -------
      Warning  Error   12s (x2 over 14s)  datavolume-controller  Unable to connect to http data source: expected status code 200, got 404. Status: 404 Not Found

  • Ready - If Type is Ready and Status is True, the data volume is ready to be used, as in the following example. If the data volume is not ready to be used, the Status is False:

    Example output

    Status:
      Conditions:
        Last Heart Beat Time:  2020-07-15T04:31:39Z
        Last Transition Time:  2020-07-15T04:31:39Z
        Status:                True
        Type:                  Ready
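
When you check many data volumes, reading the describe output for each one is slow. The following is a minimal sketch that summarizes the Bound, Running, and Ready conditions from the JSON form of a DataVolume object (for example, oc get dv <DataVolume> -o json). The function names summarize_conditions and is_ready and the sample data are hypothetical. The sketch assumes that conditions appear under status.conditions with type, status, reason, and message fields, and that status is the string "True" or "False", as is conventional for Kubernetes conditions.

```python
import json


def summarize_conditions(dv_json: str) -> dict:
    """Map each condition type of a DataVolume to its status, reason, and message."""
    conditions = json.loads(dv_json).get("status", {}).get("conditions", [])
    return {
        condition["type"]: {
            "status": condition.get("status", "Unknown"),
            "reason": condition.get("reason", ""),
            "message": condition.get("message", ""),
        }
        for condition in conditions
        if "type" in condition
    }


def is_ready(summary: dict) -> bool:
    """Return True only if the Ready condition is present with status "True"."""
    return summary.get("Ready", {}).get("status") == "True"


# Hypothetical sample modeled on the example outputs above.
sample = json.dumps({"status": {"conditions": [
    {"type": "Bound", "status": "True", "reason": "Bound",
     "message": "PVC win10-rootdisk Bound"},
    {"type": "Running", "status": "False", "reason": "Completed",
     "message": "Import Complete"},
    {"type": "Ready", "status": "True"},
]}})
summary = summarize_conditions(sample)
for condition_type, details in summary.items():
    print(condition_type, details["status"], details["reason"], details["message"])
print("ready:", is_ready(summary))
```

As the section notes, each combination of conditions must be evaluated in context, so treat the summary as a starting point and check the Events section before drawing conclusions.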