Prometheus queries for virtual resources
OKD Virtualization provides metrics for monitoring how infrastructure resources are consumed in the cluster. The metrics cover the following resources:
vCPU
Network
Storage
Guest memory swapping
Use the OKD monitoring dashboard to query virtualization metrics.
Prerequisites
To use the vCPU metric, the
schedstats=enable
kernel argument must be applied to theMachineConfig
object. This kernel argument enables scheduler statistics used for debugging and performance tuning and adds a minor additional load to the scheduler. See the OKD machine configuration tasks documentation for more information on applying a kernel argument.For guest memory swapping queries to return data, memory swapping must be enabled on the virtual guests.
Querying metrics
The OKD monitoring dashboard enables you to run Prometheus Query Language (PromQL) queries to examine metrics visualized on a plot. This functionality provides information about the state of a cluster and any user-defined workloads that you are monitoring.
As a cluster administrator, you can query metrics for all core OKD and user-defined projects.
As a developer, you must specify a project name when querying metrics. You must have the required privileges to view metrics for the selected project.
Querying metrics for all projects as a cluster administrator
As a cluster administrator or as a user with view permissions for all projects, you can access metrics for all default OKD and user-defined projects in the Metrics UI.
Only cluster administrators have access to the third-party UIs provided with OKD Monitoring. |
Prerequisites
You have access to the cluster as a user with the
cluster-admin
role or with view permissions for all projects.You have installed the OpenShift CLI (
oc
).
Procedure
In the Administrator perspective within the OKD web console, select Observe → Metrics.
Select Insert Metric at Cursor to view a list of predefined queries.
To create a custom query, add your Prometheus Query Language (PromQL) query to the Expression field.
To add multiple queries, select Add Query.
To delete a query, select next to the query, then choose Delete query.
To disable a query from being run, select next to the query and choose Disable query.
Select Run Queries to run the queries that you have created. The metrics from the queries are visualized on the plot. If a query is invalid, the UI shows an error message.
Queries that operate on large amounts of data might time out or overload the browser when drawing time series graphs. To avoid this, select Hide graph and calibrate your query using only the metrics table. Then, after finding a feasible query, enable the plot to draw the graphs.
Optional: The page URL now contains the queries you ran. To use this set of queries again in the future, save this URL.
Additional resources
- See the Prometheus query documentation for more information about creating PromQL queries.
Querying metrics for user-defined projects as a developer
You can access metrics for a user-defined project as a developer or as a user with view permissions for the project.
In the Developer perspective, the Metrics UI includes some predefined CPU, memory, bandwidth, and network packet queries for the selected project. You can also run custom Prometheus Query Language (PromQL) queries for CPU, memory, bandwidth, network packet and application metrics for the project.
Developers can only use the Developer perspective and not the Administrator perspective. As a developer, you can only query metrics for one project at a time. Developers cannot access the third-party UIs provided with OKD monitoring that are for core platform components. Instead, use the Metrics UI for your user-defined project. |
Prerequisites
You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.
You have enabled monitoring for user-defined projects.
You have deployed a service in a user-defined project.
You have created a
ServiceMonitor
custom resource definition (CRD) for the service to define how the service is monitored.
Procedure
From the Developer perspective in the OKD web console, select Observe → Metrics.
Select the project that you want to view metrics for in the Project: list.
Choose a query from the Select Query list, or run a custom PromQL query by selecting Show PromQL.
In the Developer perspective, you can only run one query at a time.
Additional resources
- See the Prometheus query documentation for more information about creating PromQL queries.
Virtualization metrics
The following metric descriptions include example Prometheus Query Language (PromQL) queries. These metrics are not an API and might change between versions.
The following examples use |
vCPU metrics
The following query can identify virtual machines that are waiting for Input/Output (I/O):
kubevirt_vmi_vcpu_wait_seconds
Returns the wait time (in seconds) for a virtual machine’s vCPU.
A value above ‘0’ means that the vCPU wants to run, but the host scheduler cannot run it yet. This inability to run indicates that there is an issue with I/O.
To query the vCPU metric, the |
Example vCPU wait time query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_vcpu_wait_seconds[6m]))) > 0 (1)
1 | This query returns the top 3 VMs waiting for I/O at every given moment over a six-minute time period. |
Network metrics
The following queries can identify virtual machines that are saturating the network:
kubevirt_vmi_network_receive_bytes_total
Returns the total amount of traffic received (in bytes) on the virtual machine’s network.
kubevirt_vmi_network_transmit_bytes_total
Returns the total amount of traffic transmitted (in bytes) on the virtual machine’s network.
Example network traffic query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_network_receive_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_network_transmit_bytes_total[6m]))) > 0 (1)
1 | This query returns the top 3 VMs transmitting the most network traffic at every given moment over a six-minute time period. |
Storage metrics
Storage-related traffic
The following queries can identify VMs that are writing large amounts of data:
kubevirt_vmi_storage_read_traffic_bytes_total
Returns the total amount (in bytes) of the virtual machine’s storage-related traffic.
kubevirt_vmi_storage_write_traffic_bytes_total
Returns the total amount of storage writes (in bytes) of the virtual machine’s storage-related traffic.
Example storage-related traffic query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_read_traffic_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_write_traffic_bytes_total[6m]))) > 0 (1)
1 | This query returns the top 3 VMs performing the most storage traffic at every given moment over a six-minute time period. |
I/O performance
The following queries can determine the I/O performance of storage devices:
kubevirt_vmi_storage_iops_read_total
Returns the amount of write I/O operations the virtual machine is performing per second.
kubevirt_vmi_storage_iops_write_total
Returns the amount of read I/O operations the virtual machine is performing per second.
Example I/O performance query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_read_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_write_total[6m]))) > 0 (1)
1 | This query returns the top 3 VMs performing the most I/O operations per second at every given moment over a six-minute time period. |
Guest memory swapping metrics
The following queries can identify which swap-enabled guests are performing the most memory swapping:
kubevirt_vmi_memory_swap_in_traffic_bytes_total
Returns the total amount (in bytes) of memory the virtual guest is swapping in.
kubevirt_vmi_memory_swap_out_traffic_bytes_total
Returns the total amount (in bytes) of memory the virtual guest is swapping out.
Example memory swapping query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_in_traffic_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_out_traffic_bytes_total[6m]))) > 0 (1)
1 | This query returns the top 3 VMs where the guest is performing the most memory swapping at every given moment over a six-minute time period. |
Memory swapping indicates that the virtual machine is under memory pressure. Increasing the memory allocation of the virtual machine can mitigate this issue. |