ArangoDB Starter Recovery Procedure

This procedure is intended to recover a cluster (that was started with the ArangoDB Starter) when a machine of that cluster is broken without the possibility to recover it (e.g. complete HD failure). For this procedure it does not matter whether the replacement machine uses the old or a new IP address.

To recover from this scenario, you must:

Create a new (replacement) machine with ArangoDB (including Starter) installed.
Create a file called RECOVERY in the directory you are going to use as the data directory of the Starter (the one that is passed via the option --starter.data-dir). This file must contain the IP address and port of the Starter that has been broken (and will be replaced with this new machine). For example:

echo "192.168.1.25:8528" > $DATADIR/RECOVERY

After creating the RECOVERY file, start the Starter using all the normal command-line arguments.
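
The file-creation step above could be wrapped in a small helper, sketched here. The function name write_recovery_file and its argument order are hypothetical and are not part of the Starter; the helper only creates the data directory if needed and writes the given address into the RECOVERY file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Starter): write the RECOVERY file.
#   $1 = data directory that will be passed via --starter.data-dir
#   $2 = IP address and port of the broken Starter, e.g. 192.168.1.25:8528
write_recovery_file() {
    datadir="$1"
    failed_endpoint="$2"
    # Refuse to continue when either argument is missing.
    if [ -z "$datadir" ] || [ -z "$failed_endpoint" ]; then
        echo "usage: write_recovery_file DATADIR IP:PORT" >&2
        return 1
    fi
    # Create the data directory if it does not exist yet.
    mkdir -p "$datadir" || return 1
    # The file holds a single line: the address of the Starter being replaced.
    printf '%s\n' "$failed_endpoint" > "$datadir/RECOVERY"
}
```

A call such as write_recovery_file "$DATADIR" "192.168.1.25:8528" would then be followed by starting the Starter with --starter.data-dir pointing at the same directory, plus whatever arguments the cluster is normally started with.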

The Starter will now:

Talk to the remaining Starters to find the ID of the Starter it replaces and use that ID to join the remaining Starters.
Talk to the remaining Agents to find the ID of the Agent it replaces and adjust the command-line arguments of the Agent (it will start) to use that ID. This is skipped if the Starter was not running an Agent.
Remove the RECOVERY file from the data directory.

The cluster will now recover automatically. It will, however, have one more Coordinator and one more DBServer than expected. Exactly one Coordinator and one DBServer will be listed “red” in the web UI of the database. They will have to be removed manually using the ArangoDB Web UI.
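
Because the Starter removes the RECOVERY file as its last recovery step, one way to check from a script that it has got that far is to wait for the file to disappear. The following is only a sketch: the function name wait_for_recovery_done and the 60-second default timeout are assumptions, and how long the Starter actually needs is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: wait until the RECOVERY file is gone.
#   $1 = data directory of the Starter
#   $2 = timeout in seconds (assumed default: 60)
wait_for_recovery_done() {
    datadir="$1"
    timeout="${2:-60}"
    elapsed=0
    # Poll once per second for as long as the file still exists.
    while [ -e "$datadir/RECOVERY" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "RECOVERY file still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A zero exit status only means the file has been removed; the surplus Coordinator and DBServer still have to be removed in the Web UI as described above.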