Exposing custom metrics for virtual machines
OKD includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.
In addition to using the OKD monitoring stack, you can enable monitoring for user-defined projects by using the CLI and query custom metrics that are exposed for virtual machines through the node-exporter
service.
Configuring the node exporter service
The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.
Prerequisites
Install the OKD CLI
oc
.Log in to the cluster as a user with
cluster-admin
privileges.Create the
cluster-monitoring-config
ConfigMap
object in theopenshift-monitoring
project.Configure the
user-workload-monitoring-config
ConfigMap
object in theopenshift-user-workload-monitoring
project by settingenableUserWorkload
totrue
.
Procedure
Create the
Service
YAML file. In the following example, the file is callednode-exporter-service.yaml
.kind: Service
apiVersion: v1
metadata:
name: node-exporter-service (1)
namespace: dynamation (2)
labels:
servicetype: metrics (3)
spec:
ports:
- name: exmet (4)
protocol: TCP
port: 9100 (5)
targetPort: 9100 (6)
type: ClusterIP
selector:
monitor: metrics (7)
1 The node-exporter service that exposes the metrics from the virtual machines. 2 The namespace where the service is created. 3 The label for the service. The ServiceMonitor
uses this label to match this service.4 The name given to the port that exposes metrics on port 9100 for the ClusterIP
service.5 The target port used by node-exporter-service
to listen for requests.6 The TCP port number of the virtual machine that is configured with the monitor
label.7 The label used to match the virtual machine’s pods. In this example, any virtual machine’s pod with the label monitor
and a value ofmetrics
will be matched.Create the node-exporter service:
$ oc create -f node-exporter-service.yaml
Configuring a virtual machine with the node exporter service
Download the node-exporter
file on to the virtual machine. Then, create a systemd
service that runs the node-exporter service when the virtual machine boots.
Prerequisites
The pods for the component are running in the
openshift-user-workload-monitoring
project.Grant the
monitoring-edit
role to users who need to monitor this user-defined project.
Procedure
Log on to the virtual machine.
Download the
node-exporter
file on to the virtual machine by using the directory path that applies to the version ofnode-exporter
file.$ wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
Extract the executable and place it in the
/usr/bin
directory.$ sudo tar xvf node_exporter-1.3.1.linux-amd64.tar.gz \
--directory /usr/bin --strip 1 "*/node_exporter"
Create a
node_exporter.service
file in this directory path:/etc/systemd/system
. Thissystemd
service file runs the node-exporter service when the virtual machine reboots.[Unit]
Description=Prometheus Metrics Exporter
After=network.target
StartLimitIntervalSec=0
[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=/usr/bin/node_exporter
[Install]
WantedBy=multi-user.target
Enable and start the
systemd
service.$ sudo systemctl enable node_exporter.service
$ sudo systemctl start node_exporter.service
Verification
Verify that the node-exporter agent is reporting metrics from the virtual machine.
$ curl http://localhost:9100/metrics
Example output
go_gc_duration_seconds{quantile="0"} 1.5244e-05
go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
go_gc_duration_seconds{quantile="0.5"} 3.7913e-05
Creating a custom monitoring label for virtual machines
To enable queries to multiple virtual machines from a single service, add a custom label in the virtual machine’s YAML file.
Prerequisites
Install the OKD CLI
oc
.Log in as a user with
cluster-admin
privileges.Access to the web console for stop and restart a virtual machine.
Procedure
Edit the
template
spec of your virtual machine configuration file. In this example, the labelmonitor
has the valuemetrics
.spec:
template:
metadata:
labels:
monitor: metrics
Stop and restart the virtual machine to create a new pod with the label name given to the
monitor
label.
Querying the node-exporter service for metrics
Metrics are exposed for virtual machines through an HTTP service endpoint under the /metrics
canonical name. When you query for metrics, Prometheus directly scrapes the metrics from the metrics endpoint exposed by the virtual machines and presents these metrics for viewing.
Prerequisites
You have access to the cluster as a user with
cluster-admin
privileges or themonitoring-edit
role.You have enabled monitoring for the user-defined project by configuring the node-exporter service.
Procedure
Obtain the HTTP service endpoint by specifying the namespace for the service:
$ oc get service -n <namespace> <node-exporter-service>
To list all available metrics for the node-exporter service, query the
metrics
resource.$ curl http://<172.30.226.162:9100>/metrics | grep -vE "^#|^$"
Example output
node_arp_entries{device="eth0"} 1
node_boot_time_seconds 1.643153218e+09
node_context_switches_total 4.4938158e+07
node_cooling_device_cur_state{name="0",type="Processor"} 0
node_cooling_device_max_state{name="0",type="Processor"} 0
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
node_cpu_seconds_total{cpu="0",mode="system"} 464.15
node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
node_disk_discard_time_seconds_total{device="vda"} 0
node_disk_discard_time_seconds_total{device="vdb"} 0
node_disk_discarded_sectors_total{device="vda"} 0
node_disk_discarded_sectors_total{device="vdb"} 0
node_disk_discards_completed_total{device="vda"} 0
node_disk_discards_completed_total{device="vdb"} 0
node_disk_discards_merged_total{device="vda"} 0
node_disk_discards_merged_total{device="vdb"} 0
node_disk_info{device="vda",major="252",minor="0"} 1
node_disk_info{device="vdb",major="252",minor="16"} 1
node_disk_io_now{device="vda"} 0
node_disk_io_now{device="vdb"} 0
node_disk_io_time_seconds_total{device="vda"} 174
node_disk_io_time_seconds_total{device="vdb"} 0.054
node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
node_disk_read_bytes_total{device="vda"} 3.71867136e+08
node_disk_read_bytes_total{device="vdb"} 366592
node_disk_read_time_seconds_total{device="vda"} 19.128
node_disk_read_time_seconds_total{device="vdb"} 0.039
node_disk_reads_completed_total{device="vda"} 5619
node_disk_reads_completed_total{device="vdb"} 96
node_disk_reads_merged_total{device="vda"} 5
node_disk_reads_merged_total{device="vdb"} 0
node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
node_disk_write_time_seconds_total{device="vdb"} 0
node_disk_writes_completed_total{device="vda"} 71584
node_disk_writes_completed_total{device="vdb"} 0
node_disk_writes_merged_total{device="vda"} 19761
node_disk_writes_merged_total{device="vdb"} 0
node_disk_written_bytes_total{device="vda"} 2.007924224e+09
node_disk_written_bytes_total{device="vdb"} 0
Creating a ServiceMonitor resource for the node exporter service
You can use a Prometheus client library and scrape metrics from the /metrics
endpoint to access and view the metrics exposed by the node-exporter service. Use a ServiceMonitor
custom resource definition (CRD) to monitor the node exporter service.
Prerequisites
You have access to the cluster as a user with
cluster-admin
privileges or themonitoring-edit
role.You have enabled monitoring for the user-defined project by configuring the node-exporter service.
Procedure
Create a YAML file for the
ServiceMonitor
resource configuration. In this example, the service monitor matches any service with the labelmetrics
and queries theexmet
port every 30 seconds.apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
k8s-app: node-exporter-metrics-monitor
name: node-exporter-metrics-monitor (1)
namespace: dynamation (2)
spec:
endpoints:
- interval: 30s (3)
port: exmet (4)
scheme: http
selector:
matchLabels:
servicetype: metrics
1 The name of the ServiceMonitor
.2 The namespace where the ServiceMonitor
is created.3 The interval at which the port will be queried. 4 The name of the port that is queried every 30 seconds Create the
ServiceMonitor
configuration for the node-exporter service.$ oc create -f node-exporter-metrics-monitor.yaml
Accessing the node exporter service outside the cluster
You can access the node-exporter service outside the cluster and view the exposed metrics.
Prerequisites
You have access to the cluster as a user with
cluster-admin
privileges or themonitoring-edit
role.You have enabled monitoring for the user-defined project by configuring the node-exporter service.
Procedure
Expose the node-exporter service.
$ oc expose service -n <namespace> <node_exporter_service_name>
Obtain the FQDN (Fully Qualified Domain Name) for the route.
$ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host
Example output
NAME DNS
node-exporter-service node-exporter-service-dynamation.apps.cluster.example.org
Use the
curl
command to display metrics for the node-exporter service.$ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics
Example output
go_gc_duration_seconds{quantile="0"} 1.5382e-05
go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
go_gc_duration_seconds{quantile="1"} 0.000189423