ArangoDB Starter Recovery Procedure

This procedure is intended to recover a cluster (that was started with the ArangoDB Starter) when a machine of that cluster is broken without the possibility to recover it (e.g. complete HD failure). In this procedure it does not matter whether the replacement machine uses the old IP address or a new one.

To recover from this scenario, you must:

  • Create a new (replacement) machine with ArangoDB (including Starter) installed.
  • Create a file called RECOVERY in the directory you are going to use as the data directory of the Starter (the one that is passed via the option --starter.data-dir). This file must contain the IP address and port of the Starter that has been broken (and will be replaced with this new machine).

For example:

    echo "192.168.1.25:8528" > $DATADIR/RECOVERY

After creating the RECOVERY file, start the Starter using all the normal command-line arguments.
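The two steps above can be sketched as a small shell helper. This is an illustration only: the function name prepare_recovery, the example data directory, and the address given to --starter.join are assumptions and not part of the procedure itself. The address check is a simple host:port sanity test added here for illustration; it is not something the Starter requires.

```shell
#!/bin/sh
# prepare_recovery DATADIR ADDRESS
# Hypothetical helper: writes ADDRESS (ip:port of the broken Starter)
# into DATADIR/RECOVERY. Returns non-zero if ADDRESS is not <host>:<port>.
prepare_recovery() {
    datadir=$1
    address=$2

    # Minimal sanity check: some host part, a colon, then a numeric port.
    if ! printf '%s\n' "$address" | grep -Eq '^[^:]+:[0-9]+$'; then
        echo "expected <ip>:<port>, got '$address'" >&2
        return 1
    fi

    # Create the data directory if needed, then write the RECOVERY file.
    mkdir -p "$datadir" || return 1
    printf '%s\n' "$address" > "$datadir/RECOVERY"
}

# Example usage (not executed here; all values are placeholders):
#   DATADIR=/var/lib/arangodb-starter
#   prepare_recovery "$DATADIR" "192.168.1.25:8528"
#   arangodb --starter.data-dir="$DATADIR" --starter.join=192.168.1.23
```

The final command stands in for "all the normal command-line arguments"; use whatever options the broken Starter was originally launched with.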

The Starter will now:

  • Talk to the remaining Starters to find the ID of the Starter it replaces and use that ID to join the remaining Starters.
  • Talk to the remaining Agents to find the ID of the Agent it replaces and adjust the command-line arguments of the Agent (it will start) to use that ID. This is skipped if the Starter was not running an Agent.
  • Remove the RECOVERY file from the data directory.

The cluster will now recover automatically. It will, however, have one more Coordinator and one more DBServer than expected. Exactly one Coordinator and one DBServer will be listed “red” in the web UI of the database. They will have to be removed manually using the ArangoDB Web UI.