Understanding the Node Observability Operator

The Node Observability Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

The Node Observability Operator collects and stores the CRI-O and Kubelet profiling data of worker nodes. You can use the profiling data to analyze CRI-O and Kubelet performance trends and to debug performance-related issues.

High-level workflow of the Node Observability Operator

After you install the Node Observability Operator in the OKD cluster, you must create a NodeObservability custom resource, which creates a DaemonSet to deploy a Node Observability agent on each worker node.

To request a profiling query, you must create a NodeObservabilityRun resource, which requests the deployed Node Observability agent to trigger the CRI-O and Kubelet profiling. After the profiling is completed, the Node Observability agent stores the profiling data in the /run/node-observability directory of the container file system, where it is available for query.

Installing the Node Observability Operator

The Node Observability Operator is not installed in OKD by default. You can install the Node Observability Operator by using the OKD CLI or the web console.

Installing the Node Observability Operator using the CLI

You can install the Node Observability Operator by using the OpenShift CLI (oc).

Prerequisites

  • You have installed the OpenShift CLI (oc).

  • You have access to the cluster with cluster-admin privileges.

Procedure

  1. Confirm that the Node Observability Operator is available by running the following command:

    $ oc get packagemanifests -n openshift-marketplace node-observability-operator

    Example output

    NAME                          CATALOG             AGE
    node-observability-operator   Red Hat Operators   9h
  2. Create the node-observability-operator namespace by running the following command:

    $ oc new-project node-observability-operator
  3. Create an OperatorGroup object by running the following command:

    $ cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: node-observability-operator
      namespace: node-observability-operator
    spec:
      targetNamespaces:
      - node-observability-operator
    EOF
  4. Create a Subscription object to subscribe the namespace to the Operator by running the following command:

    $ cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: node-observability-operator
      namespace: node-observability-operator
    spec:
      channel: alpha
      name: node-observability-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF

Verification

  1. View the install plan name by running the following command:

    $ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'

    Example output

    install-dt54w
  2. Verify the install plan status by running the following command:

    $ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'

    <install_plan_name> is the install plan name that you obtained from the output of the previous command.

    Example output

    COMPLETE
  3. Verify that the Node Observability Operator is up and running:

    $ oc get deploy -n node-observability-operator

    Example output

    NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
    node-observability-operator-controller-manager   1/1     1            1           40h
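
The first two verification steps can be chained so that the install plan name does not have to be copied by hand. The following is a minimal sketch, not part of the Operator: the function name noo_install_phase is hypothetical, and it reads the same fields as the yq queries above through the jsonpath output of oc, on the assumption that yq might not be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the phase of the install plan that belongs to
# the node-observability-operator Subscription.
noo_install_phase() {
  local ns="node-observability-operator"
  local plan

  # Same field that the yq query reads: .status.installplan.name
  plan="$(oc -n "${ns}" get sub node-observability-operator \
    -o jsonpath='{.status.installplan.name}')"

  if [ -z "${plan}" ]; then
    echo "no install plan found yet" >&2
    return 1
  fi

  # Same field that the second yq query reads: .status.phase
  oc -n "${ns}" get ip "${plan}" -o jsonpath='{.status.phase}'
}
```

Calling noo_install_phase after you create the Subscription prints the install plan phase, or returns a non-zero status if the Subscription does not reference an install plan yet.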

Installing the Node Observability Operator using the web console

You can install the Node Observability Operator from the OKD web console.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.

  • You have access to the OKD web console.

Procedure

  1. Log in to the OKD web console.

  2. In the Administrator’s navigation panel, expand Operators → OperatorHub.

  3. In the All items field, enter Node Observability Operator and select the Node Observability Operator tile.

  4. Click Install.

  5. On the Install Operator page, configure the following settings:

    1. In the Update channel area, click alpha.

    2. In the Installation mode area, click A specific namespace on the cluster.

    3. From the Installed Namespace list, select node-observability-operator.

    4. In the Update approval area, select Automatic.

    5. Click Install.

Verification

  1. In the Administrator’s navigation panel, expand Operators → Installed Operators.

  2. Verify that the Node Observability Operator is listed in the Operators list.

Creating the Node Observability custom resource

Before you run profiling queries, you must create a NodeObservability custom resource (CR).

Creating a NodeObservability CR reboots all the worker nodes. It might take 10 or more minutes to complete.

When you apply the NodeObservability CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes.

Kubelet profiling is enabled by default.

The CRI-O UNIX socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the kubelet-serving-ca certificate chain is mounted on the agent pod, which allows secure communication between the agent and the node’s kubelet endpoint.

Prerequisites

  • You have installed the Node Observability Operator.

  • You have installed the OpenShift CLI (oc).

  • You have access to the cluster with cluster-admin privileges.

Procedure

  1. Log in to the OKD CLI as a user with the cluster-admin role by running the following command:

    $ oc login -u kubeadmin https://<HOSTNAME>:6443
  2. Switch to the node-observability-operator namespace by running the following command:

    $ oc project node-observability-operator
  3. Create a CR file named nodeobservability.yaml that contains the following text:

    apiVersion: nodeobservability.olm.openshift.io/v1alpha1
    kind: NodeObservability
    metadata:
      name: cluster # (1)
    spec:
      labels:
        node-role.kubernetes.io/worker: ""
      type: crio-kubelet

    (1) You must specify the name as cluster because there should be only one NodeObservability CR per cluster.
  4. Apply the NodeObservability CR by running the following command:

    $ oc apply -f nodeobservability.yaml

    Example output

    nodeobservability.olm.openshift.io/cluster created
  5. Review the status of the NodeObservability CR by running the following command:

    $ oc get nob/cluster -o yaml | yq '.status.conditions'

    Example output

    conditions:
      conditions:
      - lastTransitionTime: "2022-07-05T07:33:54Z"
        message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
          ready: true'
        reason: Ready
        status: "True"
        type: Ready

    The NodeObservability CR run is complete when the reason is Ready and the status is True.
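
Because the worker nodes reboot, the Ready condition can take several minutes to appear, so polling the status command can be more convenient than rerunning it by hand. The following is a minimal sketch under stated assumptions: the function names condition_is_true and wait_for_nob_ready are hypothetical, the default of 60 attempts at 30-second intervals is arbitrary, and the parser only understands the flat status and type keys shown in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the YAML printed by yq '.status.conditions' on
# stdin and succeed if the entry whose type equals $1 has status "True".
condition_is_true() {
  awk -v want="$1" '
    function check() { if (ctype == want && cstatus == "True") found = 1 }
    /^[ \t]*- / { check(); ctype = ""; cstatus = "" }  # a new list entry starts
    {
      line = $0
      sub(/^[ \t]*(- )?/, "", line)                    # drop indent and list dash
      if (line ~ /^status:/) { v = line; sub(/^status:[ \t]*/, "", v); gsub(/"/, "", v); cstatus = v }
      if (line ~ /^type:/)   { v = line; sub(/^type:[ \t]*/, "", v);   gsub(/"/, "", v); ctype = v }
    }
    END { check(); exit !found }
  '
}

# Hypothetical helper: poll the NodeObservability CR until it reports Ready.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts.
wait_for_nob_ready() {
  local tries="${1:-60}"
  local delay="${2:-30}"
  local i=0
  while [ "${i}" -lt "${tries}" ]; do
    if oc get nob/cluster -o yaml | yq '.status.conditions' | condition_is_true Ready; then
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for_nob_ready 40 30 checks the CR every 30 seconds for up to 20 minutes and returns a non-zero status if the Ready condition never becomes True.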

Running a profiling query

A profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. The Node Observability Operator stores the profiling data in the /run/node-observability directory of the container file system. To request a profiling query, you must create a NodeObservabilityRun resource.

You can request only one profiling query at a time.
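
If you plan to run several queries one after another, generating the resource file can help keep the runs apart. The following is a minimal sketch that assumes each query is submitted as its own NodeObservabilityRun resource with a distinct name. The function name render_run_manifest and the timestamp-based default name are hypothetical. The manifest fields are the same ones used in the procedure that follows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a NodeObservabilityRun manifest.
# $1 = resource name (defaults to a timestamped name).
render_run_manifest() {
  local name="${1:-nodeobservabilityrun-$(date +%Y%m%d%H%M%S)}"
  cat <<EOF
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
kind: NodeObservabilityRun
metadata:
  name: ${name}
spec:
  nodeObservabilityRef:
    name: cluster
EOF
}
```

For example, render_run_manifest | oc apply -f - submits a run with a generated name. Wait until the previous run reports the Finished condition before you submit the next one.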

Prerequisites

  • You have installed the Node Observability Operator.

  • You have created the NodeObservability custom resource (CR).

  • You have access to the cluster with cluster-admin privileges.

Procedure

  1. Create a NodeObservabilityRun resource file named nodeobservabilityrun.yaml that contains the following text:

    apiVersion: nodeobservability.olm.openshift.io/v1alpha1
    kind: NodeObservabilityRun
    metadata:
      name: nodeobservabilityrun
    spec:
      nodeObservabilityRef:
        name: cluster
  2. Run the NodeObservabilityRun to trigger the profiling:

    $ oc apply -f nodeobservabilityrun.yaml
  3. Review the status of the NodeObservabilityRun by running the following command:

    $ oc get nodeobservabilityrun -o yaml | yq '.status.conditions'

    Example output

    conditions:
    - lastTransitionTime: "2022-07-07T14:57:34Z"
      message: Ready to start profiling
      reason: Ready
      status: "True"
      type: Ready
    - lastTransitionTime: "2022-07-07T14:58:10Z"
      message: Profiling query done
      reason: Finished
      status: "True"
      type: Finished

    The profiling query is complete when the status is True and the type is Finished.

  4. Run the following bash script to retrieve the profiling data from the container’s /run/node-observability path:

    # Loop over every agent pod that took part in the run.
    for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.agents[].name'); do
      echo "agent ${a}"
      mkdir -p "/tmp/${a}"
      # List the profile files inside the agent container and copy each one locally.
      for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
        f="$(basename "${p}")"
        echo "copying ${f} to /tmp/${a}/${f}"
        oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
      done
    done
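
After the script finishes, it can be worth confirming that every profile was actually copied before you start the analysis. The following is a minimal sketch: the function name check_profiles is hypothetical, and it assumes the /tmp/<agent>/<file>.pprof layout that the script above produces. An empty file may indicate that the oc exec call for that profile failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the retrieved profiles under a base directory
# and flag empty files. $1 = base directory (defaults to /tmp).
check_profiles() {
  local base="${1:-/tmp}"
  local f
  local rc=0
  local found=0

  for f in "${base}"/*/*.pprof; do
    [ -e "${f}" ] || continue   # the glob matched nothing
    found=1
    if [ -s "${f}" ]; then
      echo "ok    ${f}"
    else
      echo "EMPTY ${f}"
      rc=1
    fi
  done

  if [ "${found}" -ne 1 ]; then
    echo "no .pprof files under ${base}" >&2
    return 2
  fi
  return "${rc}"
}
```

If a Go toolchain is installed on your workstation and the files are in the standard pprof format, you can then summarize a single profile with go tool pprof -top <file>.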