Configuring built-in monitoring with Prometheus
This guide describes the built-in monitoring support provided by the Operator SDK using the Prometheus Operator and details usage for authors of Go-based and Ansible-based Operators.
Prometheus Operator support
Prometheus is an open-source systems monitoring and alerting toolkit. The Prometheus Operator creates, configures, and manages Prometheus clusters running on Kubernetes-based clusters, such as OKD.
Helper functions exist in the Operator SDK by default to automatically set up metrics in any generated Go-based Operator for use on clusters where the Prometheus Operator is deployed.
Exposing custom metrics for Go-based Operators
As an Operator author, you can publish custom metrics by using the global Prometheus registry from the controller-runtime/pkg/metrics library.
Prerequisites
Go-based Operator generated using the Operator SDK
Prometheus Operator, which is deployed by default on OKD clusters
Procedure
In your Operator SDK project, uncomment the following line in the config/default/kustomization.yaml file:
../prometheus
Create a custom controller class to publish additional metrics from the Operator. The following example declares the widgets and widgetFailures collectors as global variables, and then registers them with the init() function in the controller’s package:
controllers/memcached_controller_test_metrics.go file
package controllers
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// widgets counts every widget that the Operator processes.
	widgets = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "widgets_total",
			Help: "Number of widgets processed",
		},
	)
	// widgetFailures counts widgets that failed processing.
	widgetFailures = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "widget_failures_total",
			Help: "Number of failed widgets",
		},
	)
)

func init() {
	// Register custom metrics with the global prometheus registry
	metrics.Registry.MustRegister(widgets, widgetFailures)
}
Record to these collectors from any part of the reconcile loop in the main controller class, which determines the business logic for the metric:
controllers/memcached_controller.go file
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	...
	...
	// Add metrics
	widgets.Inc()
	widgetFailures.Inc()
	return ctrl.Result{}, nil
}
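The example above increments both counters on every reconcile. In practice, the business logic usually decides when each counter moves. The following is a minimal sketch of one such policy: count every attempt, and count a failure only when processing returns an error. It assumes a hypothetical counter type that mirrors only the Inc() method, so that it runs without the Prometheus client library, and a hypothetical processWidget helper. In a real Operator, the widgets and widgetFailures collectors would take the place of the stand-in counters.

```go
package main

import (
	"errors"
	"fmt"
)

// counter is a hypothetical stand-in for a Prometheus counter. It mirrors
// only the Inc() method so that this sketch has no external dependencies.
type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// processWidget is a hypothetical unit of reconcile work that fails for
// widgets without a name.
func processWidget(name string) error {
	if name == "" {
		return errors.New("widget has no name")
	}
	return nil
}

// reconcileOnce counts every attempt in processed and counts a failure in
// failed only when processWidget returns an error.
func reconcileOnce(name string, processed, failed *counter) error {
	processed.Inc()
	if err := processWidget(name); err != nil {
		failed.Inc()
		return err
	}
	return nil
}

func main() {
	processed, failed := &counter{}, &counter{}
	for _, name := range []string{"a", "", "b"} {
		_ = reconcileOnce(name, processed, failed)
	}
	fmt.Println("processed:", processed.n, "failed:", failed.n)
}
```

Passing the counters as arguments, rather than reading package-level variables, is only a convenience for this sketch: it keeps the counting logic easy to exercise in isolation.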
Build and push the Operator:
$ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
Deploy the Operator:
$ make deploy IMG=<registry>/<user>/<image_name>:<tag>
Create role and role binding definitions to allow the service monitor of the Operator to be scraped by the Prometheus instance of the OKD cluster.
Roles must be assigned so that service accounts have the permissions to scrape the metrics of the namespace:
config/prometheus/role.yaml role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s-role
  namespace: <operator_namespace>
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - pods
      - services
      - nodes
      - secrets
    verbs:
      - get
      - list
      - watch
config/prometheus/rolebinding.yaml role binding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-rolebinding
  namespace: memcached-operator-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-k8s-role
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: openshift-monitoring
Apply the roles and role bindings for the deployed Operator:
$ oc apply -f config/prometheus/role.yaml
$ oc apply -f config/prometheus/rolebinding.yaml
Set the labels for the namespace that you want to scrape, which enables OpenShift cluster monitoring for that namespace:
$ oc label namespace <operator_namespace> openshift.io/cluster-monitoring="true"
Verification
- Query and view the metrics in the OKD web console. You can use the names that were set in the custom controller class, for example widgets_total and widget_failures_total.
Exposing custom metrics for Ansible-based Operators
As an Operator author creating Ansible-based Operators, you can use the Operator SDK’s osdk_metric module to expose custom Operator and Operand metrics, emit events, and support logging.
Prerequisites
Ansible-based Operator generated using the Operator SDK
Prometheus Operator, which is deployed by default on OKD clusters
Procedure
Generate an Ansible-based Operator. This example uses a testmetrics.com domain:
$ operator-sdk init \
    --plugins=ansible \
    --domain=testmetrics.com
Create a metrics API. This example uses a kind named Testmetrics:
$ operator-sdk create api \
    --group metrics \
    --version v1 \
    --kind Testmetrics \
    --generate-role
Edit the roles/testmetrics/tasks/main.yml file and use the osdk_metric module to create custom metrics for your Operator project:
Example roles/testmetrics/tasks/main.yml file
---
# tasks file for Memcached
- name: start k8sstatus
  k8s:
    definition:
      kind: Deployment
      apiVersion: apps/v1
      metadata:
        name: '{{ ansible_operator_meta.name }}-memcached'
        namespace: '{{ ansible_operator_meta.namespace }}'
      spec:
        replicas: "{{size}}"
        selector:
          matchLabels:
            app: memcached
        template:
          metadata:
            labels:
              app: memcached
          spec:
            containers:
            - name: memcached
              command:
              - memcached
              - -m=64
              - -o
              - modern
              - -v
              image: "docker.io/memcached:1.4.36-alpine"
              ports:
                - containerPort: 11211

- osdk_metric:
    name: my_thing_counter
    description: This metric counts things
    counter: {}

- osdk_metric:
    name: my_counter_metric
    description: Add 3.14 to the counter
    counter:
      increment: yes

- osdk_metric:
    name: my_gauge_metric
    description: Create my gauge and set it to 2.
    gauge:
      set: 2

- osdk_metric:
    name: my_histogram_metric
    description: Observe my histogram
    histogram:
      observe: 2

- osdk_metric:
    name: my_summary_metric
    description: Observe my summary
    summary:
      observe: 2
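To make the histogram task above more concrete, the following Go sketch models what a single observation does to a Prometheus-style histogram: buckets are cumulative, so an observed value increments every bucket whose upper bound is greater than or equal to it, along with the running sum and count. The histogram type and the bucket bounds here are hypothetical and chosen only for illustration; the bounds that the osdk_metric module actually configures are not stated in this guide.

```go
package main

import "fmt"

// histogram is a hypothetical, simplified model of a Prometheus histogram.
type histogram struct {
	bounds []float64 // bucket upper bounds ("le"), in ascending order
	counts []uint64  // cumulative observation count for each bound
	sum    float64   // sum of all observed values
	count  uint64    // total number of observations
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe records v in every bucket whose upper bound is >= v, and updates
// the running sum and count.
func (h *histogram) observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	h := newHistogram([]float64{1, 2.5, 5}) // illustrative bounds only
	h.observe(2)
	for i, b := range h.bounds {
		fmt.Printf("le=%g count=%d\n", b, h.counts[i])
	}
	fmt.Printf("sum=%g count=%d\n", h.sum, h.count)
}
```

With these illustrative bounds, observing 2 leaves the bucket bounded by 1 untouched and increments the buckets bounded by 2.5 and 5.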
Verification
Run your Operator on a cluster. For example, to use the “run as a deployment” method:
Build the Operator image and push it to a registry:
$ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
Install the Operator on a cluster:
$ make install
Deploy the Operator:
$ make deploy IMG=<registry>/<user>/<image_name>:<tag>
Create a Testmetrics custom resource (CR):
Define the CR spec:
Example config/samples/metrics_v1_testmetrics.yaml file
apiVersion: metrics.testmetrics.com/v1
kind: Testmetrics
metadata:
  name: testmetrics-sample
spec:
  size: 1
Create the object:
$ oc create -f config/samples/metrics_v1_testmetrics.yaml
Get the pod details:
$ oc get pods
Example output
NAME                                      READY   STATUS    RESTARTS   AGE
ansiblemetrics-controller-manager-<id>    2/2     Running   0          149m
testmetrics-sample-memcached-<id>         1/1     Running   0          147m
Get the endpoint details:
$ oc get ep
Example output
NAME                                                ENDPOINTS          AGE
ansiblemetrics-controller-manager-metrics-service   10.129.2.70:8443   150m
Request a custom metrics token:
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
Check the metrics values:
Check the my_counter_metric value:
$ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authorization: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep my_counter
Example output
# HELP my_counter_metric Add 3.14 to the counter
# TYPE my_counter_metric counter
my_counter_metric 2
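If you want to check a value programmatically rather than by reading the grep output, the following Go sketch extracts a sample value from scraped text. It assumes the endpoint returns the standard Prometheus text exposition format, in which comment lines such as HELP and TYPE begin with # and each sample line is a metric name followed by a value. The sampleValue function is a hypothetical helper that handles only unlabeled samples.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleValue returns the value of the first unlabeled sample named metric
// in Prometheus text exposition output. Blank lines and comment lines
// (for example "# HELP" and "# TYPE") are skipped.
func sampleValue(exposition, metric string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == metric {
			v, err := strconv.ParseFloat(fields[1], 64)
			if err != nil {
				return 0, false
			}
			return v, true
		}
	}
	return 0, false
}

func main() {
	// Sample text modeled on the example output above.
	out := "# HELP my_counter_metric Add 3.14 to the counter\n" +
		"# TYPE my_counter_metric counter\n" +
		"my_counter_metric 2\n"
	if v, ok := sampleValue(out, "my_counter_metric"); ok {
		fmt.Println("my_counter_metric =", v)
	}
}
```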
Check the my_gauge_metric value:
$ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authorization: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep gauge
Example output
# HELP my_gauge_metric Create my gauge and set it to 2.
Check the my_histogram_metric and my_summary_metric values:
$ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authorization: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep Observe
Example output
# HELP my_histogram_metric Observe my histogram
# HELP my_summary_metric Observe my summary
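The curl and grep checks in this section follow one pattern: send a GET request with an Authorization: Bearer header, then keep the response lines that contain a search string. The following Go sketch shows that pattern under stated assumptions. The scrapeMatching and newFakeMetricsServer functions are hypothetical, and the sketch runs against a local plain-HTTP test server instead of a cluster endpoint, so it does not reproduce the TLS handling of curl -k.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// scrapeMatching sends a GET request to url with a bearer token and returns
// the response lines that contain substr, similar to piping curl into grep.
func scrapeMatching(client *http.Client, url, token, substr string) ([]string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var lines []string
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		if strings.Contains(sc.Text(), substr) {
			lines = append(lines, sc.Text())
		}
	}
	return lines, sc.Err()
}

// newFakeMetricsServer is a hypothetical local stand-in for the Operator's
// metrics endpoint. It rejects requests that lack the expected token.
func newFakeMetricsServer(token string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "# HELP my_gauge_metric Create my gauge and set it to 2.\n"+
			"# TYPE my_gauge_metric gauge\nmy_gauge_metric 2\n"+
			"# TYPE my_counter_metric counter\nmy_counter_metric 2\n")
	}))
}

func main() {
	srv := newFakeMetricsServer("example-token")
	defer srv.Close()
	lines, err := scrapeMatching(srv.Client(), srv.URL+"/metrics", "example-token", "gauge")
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Treating any non-200 response as an error makes a missing or expired token show up as a failure, rather than as an empty list of matching lines.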