Metrics reference

Performance Analyzer provides a number of metrics to help you evaluate performance. The following tables describe the available metrics, grouped by the dimensions that are most relevant for that metric. All metrics support the avg, sum, min, and max aggregations, although for certain metrics, the measured value is the same regardless of aggregation type.

For information about each dimension, see the Dimensions reference later in this topic.
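As an illustration of how metrics, aggregations, and dimensions combine in a request, the following sketch builds a metrics query URL. The base URL (`http://localhost:9600`), the `/_plugins/_performanceanalyzer/metrics` path, and the parameter names `metrics`, `agg`, `dim`, and `nodes` are not stated in this section; they are assumptions based on the Performance Analyzer API and should be checked against your deployment. The helper name `build_metrics_url` is hypothetical.

```python
from urllib.parse import urlencode

# Aggregations listed in this reference as supported by all metrics.
VALID_AGGREGATIONS = {"avg", "sum", "min", "max"}


def build_metrics_url(metrics, aggregations, dimensions,
                      base_url="http://localhost:9600"):
    """Build a metrics query URL (endpoint and parameter names are assumptions)."""
    # Assumption: the API expects one aggregation per requested metric.
    if len(metrics) != len(aggregations):
        raise ValueError("provide exactly one aggregation per metric")
    unknown = set(aggregations) - VALID_AGGREGATIONS
    if unknown:
        raise ValueError(f"unsupported aggregation(s): {sorted(unknown)}")
    params = {
        "metrics": ",".join(metrics),
        "agg": ",".join(aggregations),
        "dim": ",".join(dimensions),
        "nodes": "all",
    }
    # safe="," keeps the comma-separated lists readable in the query string.
    query = urlencode(params, safe=",")
    return f"{base_url}/_plugins/_performanceanalyzer/metrics?{query}"


url = build_metrics_url(["Latency", "CPU_Utilization"], ["avg", "max"], ["ShardID"])
print(url)
```

The sketch only constructs the URL. Sending the request and parsing the response are left out because the response format is not described in this topic.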

This list is extensive. We recommend using Ctrl/Cmd + F to find what you’re looking for.

Relevant dimensions: ShardID, IndexName, Operation, ShardRole

Metric | Description
CPU_Utilization | CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds.
Paging_MajfltRate | The number of major faults per second in the past five seconds. A major fault requires the process to load a memory page from disk.
Paging_MinfltRate | The number of minor faults per second in the past five seconds. A minor fault does not require the process to load a memory page from disk.
Paging_RSS | The number of pages the process has in real memory, that is, the pages that count toward text, data, or stack space. This number does not include pages that have not been demand-loaded in or that are swapped out.
Sched_Runtime | Time (seconds) spent executing on the CPU per context switch.
Sched_Waittime | Time (seconds) spent waiting on a run queue per context switch.
Sched_CtxRate | Number of times the thread ran on the CPU per second in the past five seconds.
Heap_AllocRate | An approximation, in bytes, of the heap memory allocated per second in the last five seconds.
IO_ReadThroughput | Number of bytes read per second in the last five seconds.
IO_WriteThroughput | Number of bytes written per second in the last five seconds.
IO_TotThroughput | Number of bytes read or written per second in the last five seconds.
IO_ReadSyscallRate | Read system calls per second in the last five seconds.
IO_WriteSyscallRate | Write system calls per second in the last five seconds.
IO_TotalSyscallRate | Read and write system calls per second in the last five seconds.
Thread_Blocked_Time | The average amount of time, in seconds, that the associated thread has been blocked from entering or reentering a monitor.
Thread_Blocked_Event | The total number of times that the associated thread has been blocked from entering or reentering a monitor (that is, the number of times a thread has been in the blocked state).
Thread_Waited_Time | The average amount of time, in seconds, that the associated thread has waited to enter or reenter a monitor (that is, the amount of time a thread has been in the WAITING or TIMED_WAITING state).
Thread_Waited_Event | The total number of times that the associated thread has waited to enter or reenter a monitor (that is, the number of times a thread has been in the WAITING or TIMED_WAITING state).
ShardEvents | The total number of events executed on a shard in the past five seconds.
ShardBulkDocs | The total number of documents indexed in the past five seconds.
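The ratio and per-second definitions in the preceding table can be sketched as simple arithmetic. This is a minimal illustration of the stated formulas, not the plugin's implementation; the function names and the input values are hypothetical.

```python
# Sampling window stated in the metric descriptions: five seconds.
SAMPLING_WINDOW_MS = 5000
SAMPLING_WINDOW_S = 5.0


def cpu_utilization(cpu_time_ms: float) -> float:
    """CPU time used in the window divided by 5000 milliseconds."""
    return cpu_time_ms / SAMPLING_WINDOW_MS


def per_second_rate(count_in_window: float) -> float:
    """Convert a count observed over the window to a per-second rate."""
    return count_in_window / SAMPLING_WINDOW_S


# Hypothetical inputs: 1250 ms of CPU time and 40 major faults in one window.
print(cpu_utilization(1250))   # → 0.25
print(per_second_rate(40))     # → 8.0
```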

Relevant dimensions: ShardID, IndexName

Metric | Description
Indexing_ThrottleTime | Time (milliseconds) that the index has been under merge throttling control in the past five seconds.
Cache_Query_Hit | The number of successful lookups in the query cache in the past five seconds.
Cache_Query_Miss | The number of lookups in the query cache that failed to retrieve a DocIdSet in the past five seconds. DocIdSet is a set of document IDs in Lucene.
Cache_Query_Size | Query cache memory size in bytes.
Cache_FieldData_Eviction | The number of times OpenSearch has evicted data from the fielddata heap space (occurs when the heap space is full) in the past five seconds.
Cache_FieldData_Size | Fielddata memory size in bytes.
Cache_Request_Hit | The number of successful lookups in the shard request cache in the past five seconds.
Cache_Request_Miss | The number of lookups in the request cache that failed to retrieve the results of search requests in the past five seconds.
Cache_Request_Eviction | The number of times OpenSearch has evicted data from the shard request cache (occurs when the request cache is full) in the past five seconds.
Cache_Request_Size | Shard request cache memory size in bytes.
Refresh_Event | The total number of refreshes executed in the past five seconds.
Refresh_Time | The total time (milliseconds) spent executing refreshes in the past five seconds.
Flush_Event | The total number of flushes executed in the past five seconds.
Flush_Time | The total time (milliseconds) spent executing flushes in the past five seconds.
Merge_Event | The total number of merges executed in the past five seconds.
Merge_Time | The total time (milliseconds) spent executing merges in the past five seconds.
Merge_CurrentEvent | The current number of merges executing.
Indexing_Buffer | Index buffer memory size in bytes.
Segments_Total | The number of segments.
IndexWriter_Memory | Estimated memory usage by the index writer in bytes.
Bitset_Memory | Estimated memory usage for the cached bit sets in bytes.
VersionMap_Memory | Estimated memory usage of the version map in bytes.
Shard_Size_In_Bytes | Estimated disk usage of the shard in bytes.

Relevant dimensions: ShardID, IndexName, IndexingStage

Metric | Description
Indexing_Pressure_Current_Limits | The total heap size, in bytes, that is available for use by an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
Indexing_Pressure_Current_Bytes | The total heap size, in bytes, occupied by an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
Indexing_Pressure_Last_Successful_Timestamp | The timestamp of a successful request for an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
Indexing_Pressure_Rejection_Count | The total number of rejections performed by OpenSearch for an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
Indexing_Pressure_Average_Window_Throughput | The average throughput of the last n requests for an index shard in a particular indexing stage (Coordinating, Primary, or Replica). The value of n is determined by the shard_indexing_pressure.secondary_parameter.throughput.request_size_window setting.

Relevant dimensions: Operation, Exception, Indices, HTTPRespCode, ShardID, IndexName, ShardRole

Metric | Description
Latency | Latency (milliseconds) of a request.

Relevant dimension: MemType

Metric | Description
GC_Collection_Event | The number of garbage collections that have occurred in the past five seconds.
GC_Collection_Time | The approximate accumulated time (milliseconds) of all garbage collections that have occurred in the past five seconds.
Heap_Committed | The amount of memory (bytes) that is committed for the JVM to use.
Heap_Init | The amount of memory (bytes) that the JVM initially requests from the operating system for memory management.
Heap_Max | The maximum amount of memory (bytes) that can be used for memory management.
Heap_Used | The amount of used memory in bytes.

Relevant dimension: DiskName

Metric | Description
Disk_Utilization | Disk utilization rate: percentage of disk time spent reading and writing by the OpenSearch process in the past five seconds.
Disk_WaitTime | Average duration (milliseconds) of read and write operations in the past five seconds.
Disk_ServiceRate | Service rate: MB read or written per second in the past five seconds. This metric assumes that each disk sector stores 512 bytes.

Relevant dimension: DestAddr

Metric | Description
Net_TCP_NumFlows | The number of samples collected. Performance Analyzer collects one sample every five seconds.
Net_TCP_TxQ | The average number of TCP packets in the send buffer.
Net_TCP_RxQ | The average number of TCP packets in the receive buffer.
Net_TCP_Lost | The average number of unrecovered recurring timeouts. This number is reset when the recovery finishes or SND.UNA is advanced. SND.UNA is the sequence number of the first byte of data that has been sent but not yet acknowledged.
Net_TCP_SendCWND | The average size, in bytes, of the sending congestion window.
Net_TCP_SSThresh | The average size, in bytes, of the slow start size threshold.

Relevant dimension: Direction

Metric | Description
Net_PacketRate4 | The total number of IPv4 datagrams transmitted or received from or by interfaces per second, including those transmitted or received in error.
Net_PacketDropRate4 | The total number of IPv4 datagrams transmitted or received in error per second.
Net_PacketRate6 | The total number of IPv6 datagrams transmitted or received from or by interfaces per second, including those transmitted or received in error.
Net_PacketDropRate6 | The total number of IPv6 datagrams transmitted or received in error per second.
Net_Throughput | The number of bits transmitted or received per second by all network interfaces.

Relevant dimension: ThreadPoolType

Metric | Description
ThreadPool_QueueSize | The size of the task queue.
ThreadPool_RejectedReqs | The number of rejected executions.
ThreadPool_TotalThreads | The current number of threads in the pool.
ThreadPool_ActiveThreads | The approximate number of threads that are actively executing tasks.
ThreadPool_QueueLatency | The latency of the task queue.
ThreadPool_QueueCapacity | The current capacity of the task queue.

Relevant dimension: ClusterManager_PendingTaskType

Metric | Description
ClusterManager_PendingQueueSize | The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks, such as create index, update mapping, allocate shard, and fail shard.

Relevant dimensions: Operation, Exception, Indices, HTTPRespCode

Metric | Description
HTTP_RequestDocs | The number of items in the request (only for the _bulk request type).
HTTP_TotalRequests | The number of requests completed in the last five seconds.

Relevant dimension: CBType

Metric | Description
CB_EstimatedSize | The current number of estimated bytes.
CB_TrippedEvents | The number of times that the circuit breaker has tripped.
CB_ConfiguredSize | The limit, in bytes, on the amount of memory that operations can use.

Relevant dimensions: ClusterManagerTaskInsertOrder, ClusterManagerTaskPriority, ClusterManagerTaskType, ClusterManagerTaskMetadata

Metric | Description
ClusterManager_Task_Queue_Time | The amount of time, in milliseconds, that a cluster manager task spent in the queue.
ClusterManager_Task_Run_Time | The amount of time, in milliseconds, that a cluster manager task has been running.

Relevant dimension: CacheType

Metric | Description
Cache_MaxSize | The maximum size of the cache, in bytes.

Relevant dimension: ControllerName

Metric | Description
AdmissionControl_RejectionCount | The total number of rejections performed by an Admission Control controller.
AdmissionControl_CurrentValue | The current value for an Admission Control controller.
AdmissionControl_ThresholdValue | The threshold value for an Admission Control controller.

Relevant dimension: NodeID

Metric | Description
Data_RetryingPendingTasksCount | The number of throttled pending tasks on which the data node is actively performing retries. This is an absolute metric that reflects the value at that point in time.
ClusterManager_ThrottledPendingTasksCount | The sum of the total pending tasks that were throttled by the cluster manager node. This is a cumulative metric, so make sure to check the max aggregation.

Relevant dimensions: N/A

The following metrics are relevant to the cluster as a whole and do not require specific dimensions.

Metric | Description
Election_Term | A number that increases monotonically with every cluster manager election.
PublishClusterState_Latency | The amount of time taken by the quorum of nodes to publish the new cluster state. This metric is available for the current cluster manager.
PublishClusterState_Failure | The number of times the new cluster state failed to publish on the cluster manager node.
ClusterApplierService_Latency | The amount of time taken by each node to apply the cluster state sent by the cluster manager.
ClusterApplierService_Failure | The number of times that the apply cluster state action failed on each node.

Relevant dimensions: IndexName, NodeName, ShardType, ShardID

Metric | Description
Shard_State | The state of each shard, for example, STARTED, UNASSIGNED, or RELOCATING.

Dimensions reference

Dimension | Return values
ShardID | The ID of the shard, for example, 1.
IndexName | The name of the index, for example, my-index.
Operation | The type of operation, for example, shardbulk.
ShardRole | The shard role, for example, primary or replica.
Exception | OpenSearch exceptions, for example, org.opensearch.index_not_found_exception.
Indices | The list of indexes in the request URL.
HTTPRespCode | The response code from OpenSearch, for example, 200.
MemType | The memory type, for example, totYoungGC, totFullGC, Survivor, PermGen, OldGen, Eden, NonHeap, or Heap.
DiskName | The name of the disk, for example, sda1.
DestAddr | The destination address, for example, 010015AC.
Direction | The direction, for example, in or out.
ThreadPoolType | The OpenSearch thread pools, for example, index, search, or snapshot.
CBType | The circuit breaker type, for example, accounting, fielddata, in_flight_requests, parent, or request.
ClusterManagerTaskInsertOrder | The order in which the task was inserted, for example, 3691.
ClusterManagerTaskPriority | The priority of the task, for example, URGENT. OpenSearch executes higher-priority tasks before lower-priority ones, regardless of insert_order.
ClusterManagerTaskType | The task type, for example, shard-started, create-index, delete-index, refresh-mapping, put-mapping, CleanupSnapshotRestoreState, or Update snapshot state.
ClusterManagerTaskMetadata | The metadata for the task (if any).
CacheType | The cache type, for example, Field_Data_Cache, Shard_Request_Cache, or Node_Query_Cache.
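The DestAddr example value, 010015AC, looks like an IPv4 address written as eight hexadecimal digits. The following sketch decodes it under the assumption that the value uses the same little-endian hexadecimal layout as Linux `/proc/net/tcp`; this topic does not confirm that encoding, and the function name `decode_dest_addr` is hypothetical.

```python
import ipaddress


def decode_dest_addr(hex_addr: str) -> str:
    """Decode an 8-digit hex address, assuming little-endian byte order."""
    raw = bytes.fromhex(hex_addr)
    if len(raw) != 4:
        raise ValueError("expected exactly 8 hexadecimal digits")
    # Assumption: bytes are stored least significant first, so reverse them
    # to obtain the conventional most-significant-first dotted form.
    return str(ipaddress.IPv4Address(raw[::-1]))


print(decode_dest_addr("010015AC"))  # → 172.21.0.1
```

If the value turns out to be stored in network byte order instead, drop the `[::-1]` reversal.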