Monitoring ArangoDB using collectd
Problem
The ArangoDB web interface shows a nice summary of the current state. I want to see similar numbers in my monitoring system so I can analyze the system usage post mortem or send alarms on failure.
Solution
Collectd is an excellent tool to gather all kinds of metrics from a system,and deliver it to a central monitoring like Graphiteand / or Nagios.
Ingredients
For this recipe you need to install the following tools:
- collectd >= 5.4.2 The aggregation Daemon
- kcollectd for inspecting the data
Configuring collectd
For aggregating the values we will use the cURL-JSON plug-in.We will store the values using the Round-Robin-Database writer(RRD) which kcollectd
can later on present to you.
We assume your collectd
comes from your distribution and reads its config from /etc/collectd/collectd.conf
. Since this file tends to become pretty unreadable quickly, we use the include
mechanism:
<Include "/etc/collectd/collectd.conf.d">
Filter "*.conf"
</Include>
This way we can make each metric group on compact set config files. It consists of three components:
- loading the plug-in
- adding metrics to the TypesDB
- the configuration for the plug-in itself
rrdtool
We will use the Round-Robin-Database as storage backend for now. It creates its own database files of fixed size for each specific time range. Later you may choose more advanced writer-plug-ins, which may do network distribution of your metrics or integrate the above mentioned Graphite or your already established monitoring, etc.
For the RRD we will go pretty much with defaults:
# Load the plug-in:
LoadPlugin rrdtool
<Plugin rrdtool>
DataDir "/var/lib/collectd/rrd"
# CacheTimeout 120
# CacheFlush 900
# WritesPerSecond 30
# CreateFilesAsync false
# RandomTimeout 0
#
# The following settings are rather advanced
# and should usually not be touched:
# StepSize 10
# HeartBeat 20
# RRARows 1200
# RRATimespan 158112000
# XFF 0.1
</Plugin>
cURL JSON
Collectd
comes with a wide range of metric aggregation plug-ins.Many tools today use JSON as data formatting grammar; so does ArangoDB.
Therefore a plug-in offering to fetch JSON documents via HTTP is the perfect match to query ArangoDBsadministrative Statistics interface:
# Load the plug-in:
LoadPlugin curl_json
# we need to use our own types to generate individual names for our gauges:
# TypesDB "/etc/collectd/arangodb_types.db"
<Plugin curl_json>
# Adjust the URL so collectd can reach your arangod:
<URL "http://localhost:8529/_db/_system/_admin/statistics">
# Set your authentication to Aardvark here:
User "root"
# Password "bar"
<Key "http/requestsTotal">
Type "gauge"
</Key>
<Key "http/requestsPatch">
Type "gauge"
</Key>
<Key "http/requestsPut">
Type "gauge"
</Key>
<Key "http/requestsOther">
Type "gauge"
</Key>
<Key "http/requestsAsync">
Type "gauge"
</Key>
<Key "http/requestsPost">
Type "gauge"
</Key>
<Key "http/requestsOptions">
Type "gauge"
</Key>
<Key "http/requestsHead">
Type "gauge"
</Key>
<Key "http/requestsGet">
Type "gauge"
</Key>
<Key "http/requestsDelete">
Type "gauge"
</Key>
<Key "system/minorPageFaults">
Type "gauge"
</Key>
<Key "system/majorPageFaults">
Type "gauge"
</Key>
<Key "system/userTime">
Type "gauge"
</Key>
<Key "system/systemTime">
Type "gauge"
</Key>
<Key "system/numberOfThreads">
Type "gauge"
</Key>
<Key "system/virtualSize">
Type "gauge"
</Key>
<Key "system/residentSize">
Type "gauge"
</Key>
<Key "system/residentSizePercent">
Type "gauge"
</Key>
<Key "server/threads/running">
Type "gauge"
</Key>
<Key "server/threads/queued">
Type "gauge"
</Key>
<Key "server/threads/working">
Type "gauge"
</Key>
<Key "server/threads/blocked">
Type "gauge"
</Key>
<Key "server/uptime">
Type "gauge"
</Key>
<Key "server/physicalMemory">
Type "gauge"
</Key>
<Key "server/v8Context/available">
Type "gauge"
</Key>
<Key "server/v8Context/max">
Type "gauge"
</Key>
<Key "server/v8Context/busy">
Type "gauge"
</Key>
<Key "server/v8Context/dirty">
Type "gauge"
</Key>
<Key "server/v8Context/free">
Type "gauge"
</Key>
<Key "client/totalTime/count">
Type "client_totalTime_count"
</Key>
<Key "client/totalTime/sum">
Type "client_totalTime_sum"
</Key>
<Key "client/totalTime/counts/0">
Type "client_totalTime_counts0"
</Key>
<Key "client/bytesReceived/count">
Type "client_bytesReceived_count"
</Key>
<Key "client/bytesReceived/sum">
Type "client_bytesReceived_sum"
</Key>
<Key "client/bytesReceived/counts/0">
Type "client_bytesReceived_counts0"
</Key>
<Key "client/requestTime/count">
Type "client_requestTime_count"
</Key>
<Key "client/requestTime/sum">
Type "client_requestTime_sum"
</Key>
<Key "client/requestTime/counts/0">
Type "client_requestTime_counts0"
</Key>
<Key "client/connectionTime/count">
Type "client_connectionTime_count"
</Key>
<Key "client/connectionTime/sum">
Type "client_connectionTime_sum"
</Key>
<Key "client/connectionTime/counts/0">
Type "client_connectionTime_counts0"
</Key>
<Key "client/queueTime/count">
Type "client_queueTime_count"
</Key>
<Key "client/queueTime/sum">
Type "client_queueTime_sum"
</Key>
<Key "client/queueTime/counts/0">
Type "client_queueTime_counts0"
</Key>
<Key "client/bytesSent/count">
Type "client_bytesSent_count"
</Key>
<Key "client/bytesSent/sum">
Type "client_bytesSent_sum"
</Key>
<Key "client/bytesSent/counts/0">
Type "client_bytesSent_counts0"
</Key>
<Key "client/ioTime/count">
Type "client_ioTime_count"
</Key>
<Key "client/ioTime/sum">
Type "client_ioTime_sum"
</Key>
<Key "client/ioTime/counts/0">
Type "client_ioTime_counts0"
</Key>
<Key "client/httpConnections">
Type "gauge"
</Key>
</URL>
</Plugin>
To circumvent the shortcoming of the curl_JSON plug-in to only take the last path element as name for the metric, we need to give them a name using our own types.db
file in /etc/collectd/arangodb_types.db
:
client_totalTime_count value:GAUGE:0:9223372036854775807
client_totalTime_sum value:GAUGE:U:U
client_totalTime_counts0 value:GAUGE:U:U
client_bytesReceived_count value:GAUGE:0:9223372036854775807
client_bytesReceived_sum value:GAUGE:U:U
client_bytesReceived_counts0 value:GAUGE:U:U
client_requestTime_count value:GAUGE:0:9223372036854775807
client_requestTime_sum value:GAUGE:U:U
client_requestTime_counts0 value:GAUGE:U:U
client_connectionTime_count value:GAUGE:0:9223372036854775807
client_connectionTime_sum value:GAUGE:U:U
client_connectionTime_counts0 value:GAUGE:U:U
client_queueTime_count value:GAUGE:0:9223372036854775807
client_queueTime_sum value:GAUGE:U:U
client_queueTime_counts0 value:GAUGE:U:U
client_bytesSent_count value:GAUGE:0:9223372036854775807
client_bytesSent_sum value:GAUGE:U:U
client_bytesSent_counts0 value:GAUGE:U:U
client_ioTime_count value:GAUGE:0:9223372036854775807
client_ioTime_sum value:GAUGE:U:U
client_ioTime_counts0 value:GAUGE:U:U
Please note that you probably need to uncomment this line from the main collectd.conf:
# TypesDB "/usr/share/collectd/types.db" "/etc/collectd/my_types.db"
in order to make it still load its main types definition file.
Rolling your own
You may want to monitor your own metrics from ArangoDB. Here is a simple example how to use the config
:
{
"testArray":[1,2],
"testArrayInbetween":[{"blarg":3},{"blub":4}],
"testDirectHit":5,
"testSubLevelHit":{"oneMoreLevel":6}
}
This config
snippet will parse the JSON above:
<Key "testArray/0">
Type "gauge"
# Expect: 1
</Key>
<Key "testArray/1">
Type "gauge"
# Expect: 2
</Key>
<Key "testArrayInbetween/0/blarg">
Type "gauge"
# Expect: 3
</Key>
<Key "testArrayInbetween/1/blub">
Type "gauge"
# Expect: 4
</Key>
<Key "testDirectHit">
Type "gauge"
# Expect: 5
</Key>
<Key "testSubLevelHit/oneMoreLevel">
Type "gauge"
# Expect: 6
</Key
Get it served
Now we will (re)start collectd
so it picks up our configuration:
/etc/init.d/collectd start
We will inspect the syslog to revalidate nothing went wrong:
Mar 3 13:59:52 localhost collectd[11276]: Starting statistics collection and monitoring daemon: collectd.
Mar 3 13:59:52 localhost systemd[1]: Started LSB: manage the statistics collection daemon.
Mar 3 13:59:52 localhost collectd[11283]: Initialization complete, entering read-loop.
Collectd
adds the hostname to the directory address, so now we should have files like these:
-rw-r--r-- 1 root root 154888 Mar 2 16:53 /var/lib/collectd/rrd/localhost/curl_json-default/gauge-numberOfThreads15M.rrd
Now we start kcollectd
to view the values in the RRD file:
Since we started putting values in just now, we need to choose ‘last hour’ and zoom in a little more to inspect the values.
Finished with this dish, wait for more metrics to come in other recipes.