ArangoDB Starter Recovery Procedure
This procedure recovers a cluster (one that was started with the ArangoDB Starter) when a machine of that cluster is broken without the possibility to recover it (e.g. complete HD failure). In this procedure it does not matter whether the replacement machine uses the old IP address or a new one.
To recover from this scenario, you must:
- Create a new (replacement) machine with ArangoDB (including Starter) installed.
- Create a file called RECOVERY in the directory you are going to use as the data directory of the Starter (the one that is passed via the option --starter.data-dir). This file must contain the IP address and port of the Starter that has been broken (and will be replaced with this new machine).
For example:
echo "192.168.1.25:8528" > $DATADIR/RECOVERY
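The one-liner above can be wrapped in a small script that also checks the value before writing it. This is only a sketch: the variable names DATADIR and FAILED_STARTER are illustrative, the address check assumes an IPv4 address, and the temporary-directory fallback exists only so the script can be tried out safely.

```shell
#!/bin/sh
# Sketch: write the RECOVERY file for a replacement Starter.
# DATADIR and FAILED_STARTER are illustrative; adjust both for your cluster.
DATADIR="${DATADIR:-$(mktemp -d)}"                      # Starter data directory
FAILED_STARTER="${FAILED_STARTER:-192.168.1.25:8528}"   # address:port of the broken Starter

# Refuse obviously malformed input (assumes an IPv4 address followed by a port).
if ! printf '%s\n' "$FAILED_STARTER" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]+$'; then
    echo "expected ADDRESS:PORT, got: $FAILED_STARTER" >&2
    exit 1
fi

# Create the data directory if needed and write the single-line file.
mkdir -p "$DATADIR"
printf '%s\n' "$FAILED_STARTER" > "$DATADIR/RECOVERY"
echo "wrote $DATADIR/RECOVERY: $(cat "$DATADIR/RECOVERY")"
```

Set DATADIR to the same directory that will later be passed via --starter.data-dir.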
After creating the RECOVERY file, start the Starter using all the normal command-line arguments.
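As a rough illustration of this step, the following sketch assembles a Starter command line and prints it instead of running it. The peer addresses, the default data directory, and the RUN switch are assumptions made for the example; the real arguments are whatever the failed Starter was originally launched with.

```shell
#!/bin/sh
# Sketch: assemble the Starter command line for the replacement machine.
# DATADIR and PEERS are hypothetical values; use your cluster's own arguments.
DATADIR="${DATADIR:-/var/lib/arangodb-starter}"
PEERS="${PEERS:-192.168.1.23:8528,192.168.1.24:8528}"   # remaining Starters

# Warn if the RECOVERY file has not been created yet.
if [ ! -f "$DATADIR/RECOVERY" ]; then
    echo "warning: $DATADIR/RECOVERY not found" >&2
fi

# Store the command as positional parameters so quoting stays intact.
set -- arangodb --starter.data-dir="$DATADIR" --starter.join="$PEERS"

if [ "${RUN:-0}" = "1" ]; then
    exec "$@"               # actually start the Starter
else
    echo "would run: $*"    # dry run by default
fi
```

Running the script with RUN=1 starts the Starter; without it, the script only prints the command so it can be reviewed first.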
The Starter will now:
- Talk to the remaining Starters to find the ID of the Starter it replaces and use that ID to join the remaining Starters.
- Talk to the remaining Agents to find the ID of the Agent it replaces and adjust the command-line arguments of the Agent (it will start) to use that ID. This is skipped if the Starter was not running an Agent.
- Remove the RECOVERY file from the data directory.
The cluster will now recover automatically. It will, however, have one more Coordinator and one more DBServer than expected. Exactly one Coordinator and one DBServer will be listed “red” in the web UI of the database. They will have to be removed manually using the ArangoDB Web UI.
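Because the Starter removes the RECOVERY file as the last of the steps listed above, one way to tell that it has got that far is to poll for the file's disappearance. The helper below is a sketch only: wait_for_recovery is a hypothetical function, not part of the Starter, and the 300-second default timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: wait until the Starter has removed the RECOVERY file.
# wait_for_recovery is a hypothetical helper, not part of the Starter.
wait_for_recovery() {
    file="$1/RECOVERY"
    timeout="${2:-300}"    # seconds to wait before giving up (arbitrary default)
    waited=0
    while [ -e "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "RECOVERY file still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "RECOVERY file removed"
}
```

A call such as wait_for_recovery "$DATADIR" 600 returns 0 once the file is gone and 1 if it is still there after the timeout. It says nothing about the leftover “red” Coordinator and DBServer, which still have to be removed in the Web UI.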