Introduction

At some point in the growth of a database, there comes a need for replicating it across multiple datacenters.

Reasons for that can be:

  • Fallback in case of a disaster in one datacenter
  • Regional availability
  • Separation of concerns

And many more.

Starting from version 3.3, ArangoDB supports datacenter to datacenter replication via the ArangoSync tool.

ArangoDB’s datacenter to datacenter replication is a solution that enables you to asynchronously replicate the entire structure and content in an ArangoDB Cluster in one place to a Cluster in another place. Typically, it is used from one datacenter to another. It is possible to replicate to multiple other datacenters as well. It is not a solution for replicating single server instances.

Figure: ArangoDB DC2DC

The replication done by ArangoSync is asynchronous. That means that when a client is writing data into the source datacenter, it will consider the request finished before the data has been replicated to the other datacenter. The time needed to completely replicate changes to the other datacenter is typically on the order of seconds, but this can vary significantly depending on load, network and computer capacity.
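To make the asynchronous behavior concrete, the toy model below (plain Python, no ArangoSync involved) acknowledges a write as soon as it is applied to the source and only copies it to the destination in a later step. The class `AsyncReplicator` and its methods are hypothetical illustrations, not part of ArangoSync; the real tool ships changes continuously through a message queue rather than on an explicit call.

```python
from __future__ import annotations

from collections import deque


class AsyncReplicator:
    """Hypothetical toy model of one-way asynchronous replication.

    Writes are applied to the source and acknowledged immediately; they reach
    the destination only when the pending queue is drained later.
    """

    def __init__(self) -> None:
        self.source: dict[str, object] = {}
        self.destination: dict[str, object] = {}
        self._pending: deque[tuple[str, object]] = deque()

    def write(self, key: str, value: object) -> bool:
        # Apply locally and queue the change for later replication.
        self.source[key] = value
        self._pending.append((key, value))
        return True  # the client considers the request finished at this point

    def lag(self) -> int:
        # Number of acknowledged writes not yet visible in the destination.
        return len(self._pending)

    def replicate_pending(self, max_items: int | None = None) -> int:
        # Ship queued changes to the destination in write order.
        shipped = 0
        while self._pending and (max_items is None or shipped < max_items):
            key, value = self._pending.popleft()
            self.destination[key] = value
            shipped += 1
        return shipped
```

Right after `write("a", 1)` returns, `source` contains the key while `destination` does not; only a subsequent `replicate_pending()` closes the gap. Anything still counted by `lag()` when the source fails is exactly the kind of recent write that a slave cluster is missing.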

ArangoSync performs replication in a single direction only. That means that you can replicate data from cluster A to cluster B or from cluster B to cluster A, but never in both directions at the same time (one master, one or more slave clusters). Data modified in the destination cluster will be lost!

Replication is a completely autonomous process. Once it is configured, it is designed to run 24/7 without frequent manual intervention. This does not mean that it requires no maintenance or attention at all. As with any distributed system, some attention is needed to monitor its operation and keep it secure (e.g. certificate and password rotation).

In the event of an outage of the master cluster, user intervention is required to either bring the master back up or to decide on making a slave cluster the new master. There is no automatic failover, because slave clusters lag behind the master due to network latency and similar delays, and resuming operation with the state of a slave cluster can therefore result in the loss of recent writes. How much can be lost largely depends on the data rate of the master cluster and the delay between the master and the slaves. Slaves will typically be behind the master by a couple of seconds or minutes.
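The dependency on data rate and delay can be turned into a back-of-the-envelope estimate: the writes at risk during a failover are roughly the master's write rate multiplied by the replication lag. The sketch below assumes a constant rate and a constant lag, which real workloads rarely have, and the function name `estimate_unreplicated_writes` is hypothetical, not part of ArangoSync.

```python
def estimate_unreplicated_writes(writes_per_second: float, lag_seconds: float) -> float:
    """Rough number of acknowledged writes a slave may still be missing.

    Hypothetical helper: assumes a steady write rate on the master and a
    constant replication lag, so the result is only an order-of-magnitude figure.
    """
    if writes_per_second < 0 or lag_seconds < 0:
        raise ValueError("write rate and lag must be non-negative")
    # Writes accepted during the last `lag_seconds` have not reached the slave yet.
    return writes_per_second * lag_seconds


# Example: 2,000 writes/s with slaves 5 seconds behind versus 2 minutes behind.
print(estimate_unreplicated_writes(2_000, 5))    # 10000
print(estimate_unreplicated_writes(2_000, 120))  # 240000
```

The gap between the two results shows why the lag matters as much as the load: the same write rate puts twenty-four times as many writes at risk when slaves are minutes rather than seconds behind.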

Once configured, ArangoSync will replicate both the structure and the data of an entire cluster. This means that there is no need to make additional configuration changes when adding or removing databases or collections. Metadata such as users, Foxx applications and jobs is also replicated automatically.

A message queue is used for replication. You can use either of the following:

  • DirectMQ (recommended): Message queue developed by ArangoDB in Go. Tailored for DC2DC replication with efficient native networking routines. Available since ArangoSync version 0.5.0 (shipped with ArangoDB Enterprise Edition v3.3.8).
  • Kafka: Complex general-purpose message queue system. Requires Java and potentially fine-tuning. A message size limit that is too small can cause problems with ArangoSync. Supported by all ArangoSync versions (ArangoDB Enterprise Edition v3.3.0 and above).