Performance Analyzer

Performance Analyzer is a plugin that includes an agent and a REST API for querying numerous cluster performance metrics, including aggregations of those metrics.

The Performance Analyzer plugin is installed by default in OpenSearch versions 2.0 and later. If you want to use OpenSearch 2.0 or later with Performance Analyzer disabled, see Disable Performance Analyzer.

Prerequisites

Before using Performance Analyzer with OpenSearch, review the following prerequisites.

Storage

Performance Analyzer uses /dev/shm for temporary storage. During heavy cluster workloads, Performance Analyzer can use up to 1 GB of space.

Docker, however, has a default /dev/shm size of 64 MB. To change this value, pass the --shm-size 1gb flag to docker run or use the shm_size setting in Docker Compose.

If you’re not using Docker, you can check the size of /dev/shm using df -h. The default value should be adequate, but if you need to change its size, add the following line to /etc/fstab:

    tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0

Then remount the file system:

    mount -o remount /dev/shm
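
As a rough pre-flight check, the following sketch compares the total capacity of a mount point against the 1 GB figure mentioned above. It relies only on Python's standard shutil.disk_usage; the function name shm_is_large_enough and its default threshold are illustrative assumptions, not part of Performance Analyzer.

```python
import shutil

ONE_GB = 1024 ** 3  # upper bound on Performance Analyzer usage cited above


def shm_is_large_enough(path="/dev/shm", required_bytes=ONE_GB):
    """Return True if the file system holding `path` has at least
    `required_bytes` of total capacity."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.total >= required_bytes
```

On a Linux host or inside a container, calling shm_is_large_enough() with no arguments checks /dev/shm against 1 GB. A False result suggests raising the size as described above.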

Security

Performance Analyzer supports encryption in transit for requests. It does not currently support client or server authentication for requests. To enable encryption in transit, edit the performance-analyzer.properties file under your $OPENSEARCH_HOME directory:

    vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties

Change the following lines to configure encryption in transit. Note that certificate-file-path must be a certificate for the server and not a root certificate authority (CA).

    https-enabled = true
    #Setup the correct path for certificates
    certificate-file-path = specify_path
    private-key-file-path = specify_path
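
Before restarting the agent, it can help to confirm that the placeholders were actually replaced. The following is a minimal sketch that parses key = value lines and reports missing certificate settings when https-enabled is true. The helper names parse_properties and tls_problems are hypothetical, and the parser handles only the simple syntax shown above, not the full Java properties format.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and `#` comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no `=`
            props[key.strip()] = value.strip()
    return props


def tls_problems(props):
    """Return a list of problems found in the encryption settings."""
    problems = []
    if props.get("https-enabled", "false").lower() != "true":
        return problems  # encryption is off, so there is nothing to check
    for key in ("certificate-file-path", "private-key-file-path"):
        value = props.get(key, "")
        # `specify_path` is the placeholder used in the example above
        if not value or value == "specify_path":
            problems.append(f"{key} is not set")
    return problems
```

An empty list from tls_problems means both paths are filled in. It does not verify that the files exist or that the certificate is a server certificate rather than a root CA.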

Install Performance Analyzer

The Performance Analyzer plugin is included in the installations for Docker and tarball, but you can also install the plugin manually.

To install the Performance Analyzer plugin manually, download the plugin from Maven and install it using the standard plugin installation process. Performance Analyzer runs on each node in a cluster.

To start the Performance Analyzer root cause analysis (RCA) agent on a tarball installation, run the following command:

    OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli

The following command enables the Performance Analyzer plugin.

    curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
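
The same request can be issued from a script. The following sketch builds an equivalent POST with Python's standard urllib.request; the helper name build_config_request and its default base URL are assumptions for illustration, and the example presumes an unsecured cluster reachable over plain HTTP, as in the curl command above.

```python
import json
import urllib.request


def build_config_request(enabled,
                         base_url="http://localhost:9200",
                         path="/_plugins/_performanceanalyzer/cluster/config"):
    """Build (but do not send) the POST request that toggles the plugin."""
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request against a running cluster, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_config_request(True)). Passing a different path, such as /_plugins/_performanceanalyzer/rca/cluster/config, reuses the helper for the RCA setting.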

Disable Performance Analyzer

If you prefer to save memory and run your local instance of OpenSearch with the Performance Analyzer plugin disabled, perform the following steps:

  1. Before disabling Performance Analyzer, stop any currently running RCA agent action by using the following command:
       curl -XPOST localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
  2. Shut down the Performance Analyzer RCA agent by running the following command:
       kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}')
  3. Disable the Performance Analyzer plugin by running the following command:
       curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
  4. Uninstall the Performance Analyzer plugin by running the following command:
       bin/opensearch-plugin remove opensearch-performance-analyzer
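
The kill pipeline in the steps above filters ps aux output for the agent process and extracts the second column (the PID). To inspect which processes it would match before killing anything, the same filtering can be sketched in Python. The function name find_agent_pids is hypothetical, and the sketch operates on ps output passed in as text rather than invoking ps itself.

```python
def find_agent_pids(ps_output, pattern="PerformanceAnalyzerApp"):
    """Return the PIDs (second column of `ps aux`) of lines matching
    `pattern` case-insensitively, skipping grep's own process."""
    pids = []
    for line in ps_output.splitlines():
        if pattern.lower() not in line.lower():  # mirrors `grep -i`
            continue
        if "grep" in line:  # mirrors `grep -v grep`
            continue
        fields = line.split()
        if len(fields) > 1 and fields[1].isdigit():  # mirrors awk '{print $2}'
            pids.append(int(fields[1]))
    return pids
```

One way to feed it real data is subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout. An empty result means no agent process is running, so the kill step can be skipped.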

Configure Performance Analyzer

To configure the Performance Analyzer plugin, edit the performance-analyzer.properties configuration file in the config/opensearch-performance-analyzer/ directory. Make sure to uncomment the line #webservice-bind-host and set it to 0.0.0.0. You can reference the following example configuration.

    # ======================== OpenSearch Performance Analyzer plugin config =========================
    # NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
    # WebService bind host; default to all interfaces
    webservice-bind-host = 0.0.0.0
    # Metrics data location
    metrics-location = /dev/shm/performanceanalyzer/
    # Metrics deletion interval (minutes) for metrics data.
    # Interval should be between 1 to 60.
    metrics-deletion-interval = 1
    # If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
    # metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
    # the files and wouldn't like for them to be cleaned up.
    cleanup-metrics-db-files = true
    # WebService exposed by App's port
    webservice-listener-port = 9600
    # Metric DB File Prefix Path location
    metrics-db-file-prefix-path = /tmp/metricsdb_
    https-enabled = false
    #Setup the correct path for certificates
    #certificate-file-path = specify_path
    #private-key-file-path = specify_path
    # Plugin Stats Metadata file name, expected to be in the same location
    plugin-stats-metadata = plugin-stats-metadata
    # Agent Stats Metadata file name, expected to be in the same location
    agent-stats-metadata = agent-stats-metadata
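
The comments in the example configuration state that metrics-deletion-interval should be between 1 and 60 minutes. As a minimal sketch, that constraint can be checked with Python's standard configparser by prepending a dummy section header, since the properties file has none. The names load_properties and check_interval are hypothetical, and the sketch assumes the file contents are passed in unindented, as they appear on disk.

```python
import configparser


def load_properties(text):
    """Load `key = value` pairs from properties-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser requires a section, so wrap the text in a dummy one
    parser.read_string("[root]\n" + text)
    return dict(parser["root"])


def check_interval(props):
    """Return a list of errors for metrics-deletion-interval, if present."""
    errors = []
    if "metrics-deletion-interval" in props:
        interval = int(props["metrics-deletion-interval"])
        if not 1 <= interval <= 60:
            errors.append("metrics-deletion-interval must be between 1 and 60")
    return errors
```

Reading the file with open(path).read() and passing the text to load_properties yields a plain dictionary, so other settings such as webservice-listener-port can be inspected the same way.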

To start the Performance Analyzer RCA agent, run the following command:

    OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli

Enable Performance Analyzer for RPM/YUM installations

If you installed OpenSearch from an RPM distribution, you can start and stop Performance Analyzer with systemctl:

    # Start OpenSearch Performance Analyzer
    sudo systemctl start opensearch-performance-analyzer.service
    # Stop OpenSearch Performance Analyzer
    sudo systemctl stop opensearch-performance-analyzer.service

Example API query and response

The following is an example Performance Analyzer API query. The query returns the unit of measurement for each performance metric that Performance Analyzer collects for your OpenSearch cluster:

    GET localhost:9600/_plugins/_performanceanalyzer/metrics/units

The following is an example response:

    {"Disk_Utilization":"%","Cache_Request_Hit":"count",
    "Refresh_Time":"ms","ThreadPool_QueueLatency":"count",
    "Merge_Time":"ms","ClusterApplierService_Latency":"ms",
    "PublishClusterState_Latency":"ms",
    "Cache_Request_Size":"B","LeaderCheck_Failure":"count",
    "ThreadPool_QueueSize":"count","Sched_Runtime":"s/ctxswitch","Disk_ServiceRate":"MB/s","Heap_AllocRate":"B/s","Indexing_Pressure_Current_Limits":"B",
    "Sched_Waittime":"s/ctxswitch","ShardBulkDocs":"count",
    "Thread_Blocked_Time":"s/event","VersionMap_Memory":"B",
    "Master_Task_Queue_Time":"ms","IO_TotThroughput":"B/s",
    "Indexing_Pressure_Current_Bytes":"B",
    "Indexing_Pressure_Last_Successful_Timestamp":"ms",
    "Net_PacketRate6":"packets/s","Cache_Query_Hit":"count",
    "IO_ReadSyscallRate":"count/s","Net_PacketRate4":"packets/s","Cache_Request_Miss":"count",
    "ThreadPool_RejectedReqs":"count","Net_TCP_TxQ":"segments/flow","Master_Task_Run_Time":"ms",
    "IO_WriteSyscallRate":"count/s","IO_WriteThroughput":"B/s",
    "Refresh_Event":"count","Flush_Time":"ms","Heap_Init":"B",
    "Indexing_Pressure_Rejection_Count":"count",
    "CPU_Utilization":"cores","Cache_Query_Size":"B",
    "Merge_Event":"count","Cache_FieldData_Eviction":"count",
    "IO_TotalSyscallRate":"count/s","Net_Throughput":"B/s",
    "Paging_RSS":"pages",
    "AdmissionControl_ThresholdValue":"count",
    "Indexing_Pressure_Average_Window_Throughput":"count/s",
    "Cache_MaxSize":"B","IndexWriter_Memory":"B",
    "Net_TCP_SSThresh":"B/flow","IO_ReadThroughput":"B/s",
    "LeaderCheck_Latency":"ms","FollowerCheck_Failure":"count",
    "HTTP_RequestDocs":"count","Net_TCP_Lost":"segments/flow",
    "GC_Collection_Event":"count","Sched_CtxRate":"count/s",
    "AdmissionControl_RejectionCount":"count","Heap_Max":"B",
    "ClusterApplierService_Failure":"count",
    "PublishClusterState_Failure":"count",
    "Merge_CurrentEvent":"count","Indexing_Buffer":"B",
    "Bitset_Memory":"B","Net_PacketDropRate4":"packets/s",
    "Heap_Committed":"B","Net_PacketDropRate6":"packets/s",
    "Thread_Blocked_Event":"count","GC_Collection_Time":"ms",
    "Cache_Query_Miss":"count","Latency":"ms",
    "Shard_State":"count","Thread_Waited_Event":"count",
    "CB_ConfiguredSize":"B","ThreadPool_QueueCapacity":"count",
    "CB_TrippedEvents":"count","Disk_WaitTime":"ms",
    "Data_RetryingPendingTasksCount":"count",
    "AdmissionControl_CurrentValue":"count",
    "Flush_Event":"count","Net_TCP_RxQ":"segments/flow",
    "Shard_Size_In_Bytes":"B","Thread_Waited_Time":"s/event",
    "HTTP_TotalRequests":"count",
    "ThreadPool_ActiveThreads":"count",
    "Paging_MinfltRate":"count/s","Net_TCP_SendCWND":"B/flow",
    "Cache_Request_Eviction":"count","Segments_Total":"count",
    "FollowerCheck_Latency":"ms","Heap_Used":"B",
    "Master_ThrottledPendingTasksCount":"count",
    "CB_EstimatedSize":"B","Indexing_ThrottleTime":"ms",
    "Master_PendingQueueSize":"count",
    "Cache_FieldData_Size":"B","Paging_MajfltRate":"count/s",
    "ThreadPool_TotalThreads":"count","ShardEvents":"count",
    "Net_TCP_NumFlows":"count","Election_Term":"count"}
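
Because the response is a flat JSON object that maps metric names to units, it is straightforward to reorganize. As an illustration, the following sketch groups metric names by unit using only the Python standard library; the function name metrics_by_unit is hypothetical, and the response body is assumed to have already been fetched as text.

```python
import json
from collections import defaultdict


def metrics_by_unit(response_text):
    """Group metric names from a /metrics/units response by their unit."""
    units = json.loads(response_text)  # {"Metric_Name": "unit", ...}
    grouped = defaultdict(list)
    for metric, unit in units.items():
        grouped[unit].append(metric)
    # Sort names so the output is stable regardless of response order
    return {unit: sorted(names) for unit, names in grouped.items()}
```

For example, looking up the "ms" key in the result lists every latency-style and time-based metric in one place, which can help when choosing metrics to chart together.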

Root cause analysis

The root cause analysis (RCA) framework uses the information from Performance Analyzer to inform administrators of the root cause of performance and availability issues experienced by their clusters.

Enable the RCA framework

To enable the RCA framework, run the following command:

    curl -XPOST http://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'

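If the command returns the error curl: (52) Empty reply from server, the cluster is likely accepting only HTTPS connections (for example, because the Security plugin is enabled). In that case, run the following command, which uses HTTPS and admin credentials, to enable RCA: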

    curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:<custom-admin-password>' -k

Example API query and response

To request all available RCAs, run the following command:

    GET localhost:9600/_plugins/_performanceanalyzer/rca

To request a specific RCA, run the following command:

    GET localhost:9600/_plugins/_performanceanalyzer/rca?name=HighHeapUsageClusterRCA

The following is an example response:

    {
      "HighHeapUsageClusterRCA": [{
        "RCA_name": "HighHeapUsageClusterRCA",
        "state": "unhealthy",
        "timestamp": 1587426650942,
        "HotClusterSummary": [{
          "number_of_nodes": 2,
          "number_of_unhealthy_nodes": 1,
          "HotNodeSummary": [{
            "host_address": "192.168.144.2",
            "node_id": "JtlEoRowSI6iNpzpjlbp_Q",
            "HotResourceSummary": [{
              "resource_type": "old gen",
              "threshold": 0.65,
              "value": 0.81827232588145373,
              "avg": NaN,
              "max": NaN,
              "min": NaN,
              "unit_type": "heap usage in percentage",
              "time_period_seconds": 600,
              "TopConsumerSummary": [{
                  "name": "CACHE_FIELDDATA_SIZE",
                  "value": 590702564
                },
                {
                  "name": "CACHE_REQUEST_SIZE",
                  "value": 28375
                },
                {
                  "name": "CACHE_QUERY_SIZE",
                  "value": 12687
                }
              ]
            }]
          }]
        }]
      }]
    }
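
The response nests cluster, node, and resource summaries several levels deep. The following sketch walks a response of this shape and flattens it into one tuple per hot resource. The function name hot_resources is hypothetical, and the field names are taken from the example above rather than from a formal schema. Note that the bare NaN values are not valid strict JSON; Python's json.loads accepts them by default, but other parsers may not.

```python
import json


def hot_resources(response_text, rca_name="HighHeapUsageClusterRCA"):
    """Return (host_address, resource_type, value, threshold) tuples for
    every hot resource reported under `rca_name`."""
    data = json.loads(response_text)  # tolerates bare NaN by default
    found = []
    for rca in data.get(rca_name, []):
        for cluster in rca.get("HotClusterSummary", []):
            for node in cluster.get("HotNodeSummary", []):
                for res in node.get("HotResourceSummary", []):
                    found.append((
                        node["host_address"],
                        res["resource_type"],
                        res["value"],
                        res["threshold"],
                    ))
    return found
```

Each tuple pairs a node address with the resource that exceeded its threshold, which is usually the first thing to check when an RCA reports an unhealthy state.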

Further documentation on the use of Performance Analyzer and RCA can be found at the following links: