4.7. Troubleshooting CouchDB 3 with WeatherReport

4.7.1. Overview

WeatherReport is an OTP application and set of tools that diagnose common problems that can affect a CouchDB version 3 node or cluster (version 4 or later is not supported). You run it through the weatherreport command-line escript.

Here is a basic example of using weatherreport followed immediately by the command’s output:

  $ weatherreport --etc /path/to/etc
  [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.

4.7.2. Usage

In most cases, you can just run the weatherreport command as shown above. Sometimes, however, you will want more detail or want to run only specific checks, and command-line options cover both. Run weatherreport --help to see them:

  $ weatherreport --help
  Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
    -c, --etc       Path to the CouchDB configuration directory
    -d, --level     Minimum message severity level (default: notice)
    -l, --list      Describe available diagnostic tasks
    -e, --expert    Perform more detailed diagnostics
    -h, --help      Display help/usage
    check_name      A specific check to run

To get an idea of what checks will be run, use the --list option:

  $ weatherreport --list
  Available diagnostic checks:
    custodian               Shard safety/liveness checks
    disk                    Data directory permissions and atime
    internal_replication    Check the number of pending internal replication jobs
    ioq                     Check the total number of active IOQ requests
    mem3_sync               Check there is a registered mem3_sync process
    membership              Cluster membership validity
    memory_use              Measure memory usage
    message_queues          Check for processes with large mailboxes
    node_stats              Check useful erlang statistics for diagnostics
    nodes_connected         Cluster node liveness
    process_calls           Check for large numbers of processes with the same current/initial call
    process_memory          Check for processes with high memory usage
    safe_to_rebuild         Check whether the node can safely be taken out of service
    search                  Check the local search node is responsive
    tcp_queues              Measure the length of tcp queues in the kernel
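
If you want to feed the check names into a script, the listing can be reduced to names only. The following is a minimal sketch, not part of WeatherReport: it assumes the layout shown above (one heading line, then one check per line with the name as the first word), and list_check_names is a hypothetical helper name.

```shell
#!/bin/sh
# Print one check name per line from "weatherreport --list" output on stdin.
# Assumption: line 1 is the heading, every later non-empty line is
# "<check_name> <description>" as in the listing above.
list_check_names() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}
```

For example, weatherreport --list 2>&1 | list_check_names would print custodian, disk, and so on, one per line, if the real output matches the layout assumed here.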

If you want all the gory details about what WeatherReport is doing, you can run the checks at a more verbose logging level with the --level option:

  $ weatherreport --etc /path/to/etc --level debug
  [debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined
  [debug] Starting distributed Erlang.
  [debug] Connected to local cluster node 'node1@127.0.0.1'.
  [debug] Local RPC: mem3:nodes([]) [5000]
  [debug] Local RPC: os:getpid([]) [5000]
  [debug] Running shell command: ps -o pmem,rss -p 73905
  [debug] Shell command output:
  %MEM RSS
  0.3 25116
  [debug] Local RPC: erlang:nodes([]) [5000]
  [debug] Local RPC: mem3:nodes([]) [5000]
  [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
  [info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory.

Most of the time the default level (notice) is what you want, but --level accepts any syslog severity name. From most to least verbose, these are: debug, info, notice, warning, error, critical, alert, emergency.
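
The same ordering can be applied after the fact to a captured report. The following is a minimal sketch under stated assumptions, not part of WeatherReport: it assumes every diagnostic line starts with a [level] prefix as in the examples above, and the helper names severity_rank and filter_min_severity are hypothetical.

```shell
#!/bin/sh
# Map a syslog severity name to a rank: 0 = debug (most verbose),
# 7 = emergency (least verbose). Unknown names map to -1.
severity_rank() {
    case "$1" in
        debug)     echo 0 ;;
        info)      echo 1 ;;
        notice)    echo 2 ;;
        warning)   echo 3 ;;
        error)     echo 4 ;;
        critical)  echo 5 ;;
        alert)     echo 6 ;;
        emergency) echo 7 ;;
        *)         echo "-1" ;;
    esac
}

# Read report lines on stdin and print only those whose "[level]" prefix
# is at or above the threshold named in $1. Lines without a recognised
# prefix (for example raw shell command output) are dropped.
filter_min_severity() {
    min=$(severity_rank "$1")
    if [ "$min" -lt 0 ]; then
        echo "unknown severity: $1" >&2
        return 2
    fi
    while IFS= read -r line; do
        case "$line" in
            "["*"]"*)
                level=${line#\[}       # strip the leading "["
                level=${level%%\]*}    # keep the text before the first "]"
                rank=$(severity_rank "$level")
                if [ "$rank" -ge "$min" ]; then
                    printf '%s\n' "$line"
                fi
                ;;
        esac
    done
}
```

For example, weatherreport --etc /path/to/etc --level debug 2>&1 | filter_min_severity warning would keep only lines at warning or above, which can be handy when one debug run is saved and inspected at several thresholds.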

Finally, if you want to run just a single diagnostic or a list of specific ones, you can pass their name(s):

  $ weatherreport --etc /path/to/etc nodes_connected
  [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
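
When the same subset of checks is run regularly, a small wrapper keeps the invocation in one place. The following is a minimal sketch that uses only the --etc option and check names documented above; run_checks and the WEATHERREPORT override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Run only the named checks against one configuration directory.
#   $1     path to the CouchDB configuration directory (passed to --etc)
#   $2...  one or more check names, as printed by "weatherreport --list"
# WEATHERREPORT (hypothetical) overrides the executable to call, which is
# useful when weatherreport is not on PATH.
run_checks() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_checks <etc-dir> <check_name> [check_name ...]" >&2
        return 2
    fi
    etc_dir=$1
    shift
    "${WEATHERREPORT:-weatherreport}" --etc "$etc_dir" "$@"
}
```

For example, run_checks /path/to/etc nodes_connected memory_use runs just those two checks in a single invocation.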