ArangoDB Starter Recovery Procedure
This procedure recovers a cluster that was started with the ArangoDB Starter when one of its machines is broken beyond recovery (e.g. a complete hard disk failure). For this procedure, it does not matter whether the replacement machine uses the old IP address or a new one.
To recover from this scenario, you must:
- Create a new (replacement) machine with ArangoDB (including Starter) installed.
- Create a file called RECOVERY in the directory you are going to use as data directory of the Starter (the one that is passed via the option --starter.data-dir). This file must contain the IP address and port of the Starter that has been broken (and will be replaced with this new machine).
E.g.
echo "192.168.1.25:8528" > $DATADIR/RECOVERY
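A typo in the address is easy to make at this step, so a small wrapper can check the value before writing it. The following is only a sketch: the function name write_recovery_file is hypothetical, and the check assumes the broken Starter is identified by an IPv4 address and port, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Starter): write the RECOVERY file
# after a loose sanity check of the address.
# Arguments: <data-dir> <ip:port of the broken Starter>
write_recovery_file() {
    datadir=$1
    addr=$2
    # Assumption: the address is IPv4:port. This only checks the shape,
    # not whether the octets or the port are in a valid range.
    if ! printf '%s\n' "$addr" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$'; then
        echo "expected <ip>:<port>, got '$addr'" >&2
        return 1
    fi
    # Create the data directory if the replacement machine does not have it yet.
    mkdir -p "$datadir" || return 1
    echo "$addr" > "$datadir/RECOVERY"
}
```

It would be called as write_recovery_file "$DATADIR" "192.168.1.25:8528", which produces the same file as the echo command above.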
After creating the RECOVERY file, start the Starter using all the normal command line arguments.
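What "the normal command line arguments" are depends on how the cluster was originally started. As a rough sketch, assuming the other Starters were launched with only --starter.data-dir and --starter.join, the start could be guarded so that it refuses to run without a RECOVERY file. The function name start_replacement_starter and the join addresses in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: start the replacement Starter only if a non-empty
# RECOVERY file is present in the data directory.
# Arguments: <data-dir> <comma-separated addresses of the remaining Starters>
start_replacement_starter() {
    datadir=$1
    join=$2
    if [ ! -s "$datadir/RECOVERY" ]; then
        echo "no RECOVERY file in $datadir, refusing to start" >&2
        return 1
    fi
    # Assumption: these two options are all the cluster normally uses;
    # add whatever other options the remaining Starters were started with.
    arangodb --starter.data-dir="$datadir" --starter.join="$join"
}
```

For example, start_replacement_starter "$DATADIR" "192.168.1.23:8528,192.168.1.24:8528" would join two remaining Starters at those (made-up) addresses.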
The Starter will now:
- Talk to the remaining Starters to find the ID of the Starter it replaces and use that ID to join the remaining Starters.
- Talk to the remaining Agents to find the ID of the Agent it replaces and adjust the command-line arguments of the Agent it is about to start so that it uses that ID. This step is skipped if the Starter was not running an Agent.
- Remove the RECOVERY file from the data directory.
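Because the Starter removes the RECOVERY file itself, the disappearance of that file can serve as a simple signal that the steps above have run. The polling loop below is a sketch under that assumption; the function name wait_for_recovery and the 300-second default timeout are hypothetical, and a missing file says nothing about whether the cluster as a whole is healthy yet.

```shell
#!/bin/sh
# Hypothetical helper: wait until the Starter has removed the RECOVERY file.
# Arguments: <data-dir> [timeout in seconds, default 300]
wait_for_recovery() {
    datadir=$1
    timeout=${2:-300}
    waited=0
    while [ -e "$datadir/RECOVERY" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "RECOVERY file still present after ${timeout}s" >&2
            return 1
        fi
        # Poll once per second.
        sleep 1
        waited=$((waited + 1))
    done
    echo "RECOVERY file removed after ${waited}s"
}
```

Run it in a second terminal, for example as wait_for_recovery "$DATADIR" 120, while the Starter is running in the first.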
The cluster will now recover automatically. It will, however, have one more Coordinator and one more DB-Server than expected. Exactly one Coordinator and one DB-Server will be listed “red” in the web UI of the database. They have to be removed manually using the ArangoDB Web UI.