Monitoring ArangoDB Cluster network usage

Problem

We run a cluster and want to know whether its traffic is unbalanced. We want a cheap estimate of how much traffic each host handles.

Solution

Since we already run collectd as our metrics hub, we want it to provide these figures as well. A very cheap way to generate these values is to use the packet and byte counters of our system's iptables firewall.

Ingredients

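For this recipe you need to install the following tools:
  • collectd: the metrics aggregation daemon
  • ferm: a frontend for writing iptables rules
  • kcollectd: a simple GUI for viewing collectd data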

Getting the state and the ports of your cluster

Now we need to find out the current configuration of our cluster. For now, we assume you simply ran

  ./scripts/startLocalCluster.sh

to get set up. This gives you two DB-Servers, one Coordinator, and one Agent, as ps confirms:

  ps -eaf |grep arango
  arangod 21406 1 1 16:59 pts/14 00:00:00 bin/etcd-arango --data-dir /var/tmp/tmp-21550-1347489353/shell_server/agentarango4001 --name agentarango4001 --bind-addr 127.0.0.1:4001 --addr 127.0.0.1:4001 --peer-bind-addr 127.0.0.1:7001 --peer-addr 127.0.0.1:7001 --initial-cluster-state new --initial-cluster agentarango4001=http://127.0.0.1:7001
  arangod 21408 1 4 16:56 pts/14 00:00:01 bin/arangod --database.directory cluster/data8629 --cluster.agency-endpoint tcp://localhost:4001 --cluster.my-address tcp://localhost:8629 --server.endpoint tcp://localhost:8629 --log.file cluster/8629.log
  arangod 21410 1 5 16:56 pts/14 00:00:02 bin/arangod --database.directory cluster/data8630 --cluster.agency-endpoint tcp://localhost:4001 --cluster.my-address tcp://localhost:8630 --server.endpoint tcp://localhost:8630 --log.file cluster/8630.log
  arangod 21416 1 5 16:56 pts/14 00:00:02 bin/arangod --database.directory cluster/data8530 --cluster.agency-endpoint tcp://localhost:4001 --cluster.my-address tcp://localhost:8530 --server.endpoint tcp://localhost:8530 --log.file cluster/8530.log
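Rather than reading these long command lines by eye, you can let awk pull out each process's --server.endpoint value. This is a minimal sketch: the function name arango_endpoints is hypothetical, and it assumes the ps -eaf layout shown above (PID in the second column) with the option and its value separated by a space. The Agent uses --addr instead, so it is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "PID ENDPOINT" for every line of `ps -eaf` output
# that carries a --server.endpoint option (the Agent uses --addr and is skipped).
arango_endpoints() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "--server.endpoint") {
        print $2, $(i + 1)   # $2 is the PID column of ps -eaf
      }
    }
  }'
}

# Usage (on a host running the local cluster):
#   ps -eaf | grep arangod | arango_endpoints
```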

We can now check which ports they occupied:

  netstat -aplnt |grep arango
  tcp 0 0 127.0.0.1:7001 0.0.0.0:* LISTEN 21406/etcd-arango
  tcp 0 0 127.0.0.1:4001 0.0.0.0:* LISTEN 21406/etcd-arango
  tcp 0 0 127.0.0.1:8530 0.0.0.0:* LISTEN 21416/arangod
  tcp 0 0 127.0.0.1:8629 0.0.0.0:* LISTEN 21408/arangod
  tcp 0 0 127.0.0.1:8630 0.0.0.0:* LISTEN 21410/arangod
  • The Agent listens on 7001 and 4001. Since it runs as a single instance, its cluster (peer) port 7001 should not show any traffic; the client port 4001 is the interesting one.
  • Claus: the Coordinator. Your application talks to it on port 8530.
  • Pavel: the first DB-Server. Claus talks to it on port 8629.
  • Perry: the second DB-Server. Claus talks to it on port 8630.
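The netstat listing can be reduced the same way to port/owner pairs sorted by port, which is the table you need when naming the firewall rules. A sketch assuming the net-tools netstat -aplnt column layout shown above (local address in column 4, state in column 6, PID/program in column 7); listening_ports is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `netstat -aplnt` lines into "PORT PID/PROGRAM" pairs,
# sorted numerically by port.
listening_ports() {
  awk '$6 == "LISTEN" {
    n = split($4, addr, ":")   # the port is the last colon-separated field (works for IPv6 too)
    print addr[n], $7
  }' | sort -n
}

# Usage:
#   netstat -aplnt | grep arango | listening_ports
```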

Configuring IPTables / ferm

Since the usual approach of shell scripts calling iptables brings the DRY principle to a grinding halt, we need something better. Here ferm comes to the rescue: it lets you write very compact and readable firewall configurations.

Based on the ports we found in the previous section, we configure our firewall in /etc/ferm/ferm.conf and put the identities into the rule comments, so we have a persistent naming scheme:

  # blindly forward these to the accounting chain:
  @def $ARANGO_RANGE=4000:9000;
  @def &TCP_ACCOUNTING($PORT, $COMMENT, $SRCCHAIN) = {
    @def $FULLCOMMENT=@cat($COMMENT, "_", $SRCCHAIN);
    dport $PORT mod comment comment $FULLCOMMENT NOP;
  }
  @def &ARANGO_ACCOUNTING($CHAINNAME) = {
    # The Coordinators:
    &TCP_ACCOUNTING(8530, "Claus", $CHAINNAME);
    # The DB-Servers:
    &TCP_ACCOUNTING(8629, "Pavel", $CHAINNAME);
    &TCP_ACCOUNTING(8630, "Perry", $CHAINNAME);
    # The Agency:
    &TCP_ACCOUNTING(4001, "etcd_client", $CHAINNAME);
    # it shouldn't talk to itself if it is only running with a single instance:
    &TCP_ACCOUNTING(7001, "etcd_cluster", $CHAINNAME);
  }
  table filter {
    chain INPUT {
      proto tcp dport $ARANGO_RANGE @subchain "Accounting" {
        &ARANGO_ACCOUNTING("input");
      }
      policy DROP;
      # connection tracking
      mod state state INVALID DROP;
      mod state state (ESTABLISHED RELATED) ACCEPT;
      # allow local packets
      interface lo ACCEPT;
      # respond to ping
      proto icmp ACCEPT;
      # allow IPsec
      proto udp dport 500 ACCEPT;
      proto (esp ah) ACCEPT;
      # allow SSH connections
      proto tcp dport ssh ACCEPT;
    }
    chain OUTPUT {
      policy ACCEPT;
      proto tcp dport $ARANGO_RANGE @subchain "Accounting" {
        &ARANGO_ACCOUNTING("output");
      }
      # connection tracking
      #mod state state INVALID DROP;
      mod state state (ESTABLISHED RELATED) ACCEPT;
    }
    chain FORWARD {
      policy DROP;
      # connection tracking
      mod state state INVALID DROP;
      mod state state (ESTABLISHED RELATED) ACCEPT;
    }
  }

Note: This is a very basic configuration, intended mainly to demonstrate the accounting feature, so don't run it in production.

After activating it interactively with

  ferm -i /etc/ferm/ferm.conf

(ferm asks you to confirm the new rules and rolls them back if you don't, so a mistake can't lock you out), we use the iptables command-line utility directly to review the current state of our rules:

  iptables -L -nvx
  Chain INPUT (policy DROP 85 packets, 6046 bytes)
      pkts      bytes target     prot opt in  out source    destination
      7636    1821798 Accounting tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpts:4000:9000
         0          0 DROP       all  --  *   *   0.0.0.0/0 0.0.0.0/0 state INVALID
     14700   14857709 ACCEPT     all  --  *   *   0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
       130       7800 ACCEPT     all  --  lo  *   0.0.0.0/0 0.0.0.0/0
         0          0 ACCEPT     icmp --  *   *   0.0.0.0/0 0.0.0.0/0
         0          0 ACCEPT     udp  --  *   *   0.0.0.0/0 0.0.0.0/0 udp dpt:500
         0          0 ACCEPT     esp  --  *   *   0.0.0.0/0 0.0.0.0/0
         0          0 ACCEPT     ah   --  *   *   0.0.0.0/0 0.0.0.0/0
         0          0 ACCEPT     tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:22
  Chain FORWARD (policy DROP 0 packets, 0 bytes)
      pkts      bytes target     prot opt in  out source    destination
         0          0 DROP       all  --  *   *   0.0.0.0/0 0.0.0.0/0 state INVALID
         0          0 ACCEPT     all  --  *   *   0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
  Chain OUTPUT (policy ACCEPT 296 packets, 19404 bytes)
      pkts      bytes target     prot opt in  out source    destination
      7720    1882404 Accounting tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpts:4000:9000
     14575   14884356 ACCEPT     all  --  *   *   0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
  Chain Accounting (2 references)
      pkts      bytes target     prot opt in  out source    destination
       204      57750            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:8530 /* Claus_input */
        20      17890            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:8629 /* Pavel_input */
       262      97352            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:8630 /* Perry_input */
      2604     336184            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:4001 /* etcd_client_input */
         0          0            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:7001 /* etcd_cluster_input */
       204      57750            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:8530 /* Claus_output */
        20      17890            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:8629 /* Pavel_output */
       262      97352            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:8630 /* Perry_output */
      2604     336184            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:4001 /* etcd_client_output */
         0          0            tcp  --  *   *   0.0.0.0/0 0.0.0.0/0 tcp dpt:7001 /* etcd_cluster_output */

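You can see the Accounting subchain with our comments, which makes each rule easy to match to a cluster role. The pkts and bytes columns hold the current values of these counters on your system; -x prints them as exact numbers instead of rounding them to K, M or G. Note that both INPUT and OUTPUT jump into the same Accounting chain, and since its NOP rules have no target, every packet passes all of them: a packet to port 8530 increments both Claus_input and Claus_output. The _input and _output counters of a port therefore always show the same values, as in the listing above; telling the directions apart would require a separate subchain per direction.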
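For a quick look without collectd, the same counters can be read straight from the Accounting chain. A sketch assuming the iptables -L Accounting -nvx layout shown above (packets in column 1, bytes in column 2, the comment between /* and */); accounting_counters is a hypothetical name, and listing the chain requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "COMMENT PACKETS BYTES" for each commented rule
# in an `iptables -L <chain> -nvx` listing read from stdin.
accounting_counters() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "/*") {
        print $(i + 1), $1, $2   # comment, pkts, bytes
      }
    }
  }'
}

# Usage (as root):
#   iptables -L Accounting -nvx | accounting_counters
```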

Read up on Linux firewalling and ferm configuration to be sure you do the right thing.
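To see what ferm saves you from writing by hand, the following sketch echoes roughly equivalent raw iptables commands for one direction, driven by a single list of name:port pairs. It only prints commands and executes nothing; print_accounting_rules is a hypothetical name, and the rules ferm actually generates may differ in order and detail.

```shell
#!/usr/bin/env bash
# Hypothetical generator: echo (do not run) iptables commands that mirror the
# ferm accounting setup for one direction. Arguments: the parent chain
# (INPUT or OUTPUT), a suffix for the comments, then NAME:PORT pairs.
print_accounting_rules() {
  local parent=$1 suffix=$2
  shift 2
  echo "iptables -A $parent -p tcp --dport 4000:9000 -j Accounting"
  local pair name port
  for pair in "$@"; do
    name=${pair%%:*}   # text before the first colon
    port=${pair##*:}   # text after the last colon
    # A rule without -j only counts packets, like ferm's NOP.
    echo "iptables -A Accounting -p tcp --dport $port -m comment --comment ${name}_${suffix}"
  done
}

# Usage: create the chain once, then emit the rules for each direction.
#   echo "iptables -N Accounting"
#   print_accounting_rules INPUT input Claus:8530 Pavel:8629 Perry:8630 etcd_client:4001 etcd_cluster:7001
```

Running it once with INPUT input and once with OUTPUT output reproduces the shared Accounting chain of the ferm configuration, so the same caveat about identical _input and _output counters applies.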

Configuring Collectd to pick up these values

Since your system now generates these numbers, we want to configure collectd with its iptables plugin to aggregate them.

We do so in /etc/collectd/collectd.conf.d/iptables.conf:

  LoadPlugin iptables
  <Plugin iptables>
    Chain filter "Accounting" "Claus_input"
    Chain filter "Accounting" "Pavel_input"
    Chain filter "Accounting" "Perry_input"
    Chain filter "Accounting" "etcd_client_input"
    Chain filter "Accounting" "etcd_cluster_input"
    Chain filter "Accounting" "Claus_output"
    Chain filter "Accounting" "Pavel_output"
    Chain filter "Accounting" "Perry_output"
    Chain filter "Accounting" "etcd_client_output"
    Chain filter "Accounting" "etcd_cluster_output"
  </Plugin>
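The ten Chain lines repeat the naming scheme from the ferm configuration, so they can be generated from the same list of names. A sketch that writes the plugin block to stdout; collectd_iptables_conf is a hypothetical name, and the input-then-output order simply mirrors the file above.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the collectd iptables plugin block:
# one "Chain filter Accounting <comment>" line per name and direction.
collectd_iptables_conf() {
  local dir name
  echo 'LoadPlugin iptables'
  echo '<Plugin iptables>'
  for dir in input output; do
    for name in "$@"; do
      printf '  Chain filter "Accounting" "%s_%s"\n' "$name" "$dir"
    done
  done
  echo '</Plugin>'
}

# Usage (as root):
#   collectd_iptables_conf Claus Pavel Perry etcd_client etcd_cluster \
#     > /etc/collectd/collectd.conf.d/iptables.conf
```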

Now restart collectd with /etc/init.d/collectd restart and watch the syslog for errors. If everything is OK, our values should show up in files such as:

  /var/lib/collectd/rrd/localhost/iptables-filter-Accounting/ipt_packets-Claus_output.rrd
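Besides a GUI, the RRD files can be read on the command line with rrdtool fetch. A sketch assuming the directory layout shown above; accounting_rrd and the RRD_BASE override are hypothetical, and the ipt_bytes file name is assumed by analogy with ipt_packets (collectd's iptables plugin records both packets and bytes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the RRD file path collectd uses for one counter.
# Arguments: host, counter type (packets or bytes), rule comment.
# RRD_BASE is a hypothetical override; the default is the path shown above.
accounting_rrd() {
  local base=${RRD_BASE:-/var/lib/collectd/rrd}
  printf '%s/%s/iptables-filter-Accounting/ipt_%s-%s.rrd\n' "$base" "$1" "$2" "$3"
}

# Usage: averaged values for the Coordinator over the last hour.
#   rrdtool fetch "$(accounting_rrd localhost packets Claus_output)" AVERAGE --start -1h
```

Since collectd stores these counters as rates, fetch should report per-second values rather than raw counter totals.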

We can inspect our values with kcollectd:

[Figure: kcollectd screenshot]