Cluster Administration

This section includes information related to the administration of an ArangoDB Cluster.

For a general introduction to the ArangoDB Cluster, please refer to the Cluster chapter.

There is also a detailed Cluster Administration Course for download.

Clusters can be deployed by ArangoDB as a managed service with full hosting, management, and monitoring.

Please check the following talks as well:

#	Date	Title	Who	Link
1	10th April 2018	Fundamentals and Best Practices of ArangoDB Cluster Administration	Kaveh Vahedipour, ArangoDB Cluster Team	Online Meetup Page & Video
2	29th May 2018	Fundamentals and Best Practices of ArangoDB Cluster Administration: Part II	Kaveh Vahedipour, ArangoDB Cluster Team	Online Meetup Page & Video

Enabling synchronous replication

For an introduction to Synchronous Replication in a Cluster, please refer to the Cluster Architecture section.

Synchronous replication can be enabled per collection. When creating a collection you may specify the number of replicas using the replicationFactor parameter. The default value is 1, which effectively disables synchronous replication among DB-Servers.

Whenever you specify a replicationFactor greater than 1 when creating a collection, synchronous replication is activated for this collection. The Cluster determines suitable leaders and followers for every requested shard (numberOfShards) within the Cluster.

Example:

127.0.0.1:8530@_system> db._create("test", {"replicationFactor": 3})

In the above case, any write operation will require 3 replicas to report success from now on.

Preparing growth

You may create a collection with a higher replication factor than there are DB-Servers available. When additional DB-Servers become available, the shards are automatically replicated to the newly available DB-Servers.

To create a collection with a higher replication factor than there are DB-Servers available, set the option enforceReplicationFactor to false when creating the collection from ArangoShell (the option is not available from the web interface), e.g.:

db._create("test", { replicationFactor: 4 }, { enforceReplicationFactor: false });

The default value for enforceReplicationFactor is true.
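
The interplay between replicationFactor, the number of available DB-Servers and enforceReplicationFactor can be summarized in a few lines of plain JavaScript. This is only an illustrative sketch of the rule described above, not ArangoDB's implementation; the function name and its parameters are hypothetical.

```javascript
// Hypothetical helper illustrating the rule described above.
// It is NOT part of any ArangoDB API.
function creationAccepted(replicationFactor, availableDbServers, enforceReplicationFactor = true) {
  if (replicationFactor <= availableDbServers) {
    return true; // enough DB-Servers to hold every replica
  }
  // More replicas requested than DB-Servers available:
  // only accepted when enforcement is switched off.
  return !enforceReplicationFactor;
}

console.log(creationAccepted(4, 3));        // false
console.log(creationAccepted(4, 3, false)); // true
console.log(creationAccepted(3, 3));        // true
```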

Note: multiple replicas of the same shard can never coexist on the same DB-Server instance.

Sharding

For an introduction to Sharding in a Cluster, please refer to the Cluster Sharding section.

The number of shards can be configured at collection creation time, e.g. in the web UI or in the ArangoDB Shell:

db._create("sharded_collection", {"numberOfShards": 4});

To configure custom hashing on another attribute (the default is _key):

db._create("sharded_collection", {"numberOfShards": 4, "shardKeys": ["country"]});

The example above, where country is used as the shard key, can be useful to keep all data of any given country within a single shard, which results in better performance for queries that work on a per-country basis.
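
Why a shard key groups related documents can be shown with a toy model: the values of the shard key attributes are hashed, and the hash selects the shard. The sketch below uses a deliberately simple hash that is an assumption made for illustration only; ArangoDB uses its own hash function, and the names toyHash and shardFor are hypothetical.

```javascript
// Toy hash, for illustration only. ArangoDB uses its own hash function.
function toyHash(text) {
  let h = 0;
  for (const ch of text) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value as unsigned 32-bit
  }
  return h;
}

// Map the values of the shard key attributes of a document to a shard index.
function shardFor(doc, shardKeys, numberOfShards) {
  const keyValues = shardKeys.map((attr) => String(doc[attr])).join("\u0000");
  return toyHash(keyValues) % numberOfShards;
}

const docs = [
  { name: "Alice", country: "DE" },
  { name: "Bob", country: "FR" },
  { name: "Carol", country: "DE" },
];
for (const doc of docs) {
  console.log(doc.name, "-> shard", shardFor(doc, ["country"], 4));
}
```

Documents with the same country always map to the same shard index, regardless of their other attributes.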

It is also possible to specify multiple shardKeys.

Note however that if you change the shard keys from their default ["_key"], then finding a document in the collection by its primary key involves a request to every single shard. However, this can be mitigated: all CRUD APIs and AQL support taking the shard keys as a lookup hint. Just make sure that the shard key attributes are present in the documents you send, or in the case of AQL, that you use a document reference or an object for the UPDATE, REPLACE or REMOVE operation which includes the shard key attributes:

FOR doc IN sharded_collection
  FILTER doc._key == "123"
  UPDATE doc WITH { … } IN sharded_collection

UPDATE { _key: "123", country: "…" } WITH { … } IN sharded_collection

Using a string with just the document key as key expression instead will be processed without shard hints and thus perform slower:

UPDATE "123" WITH { … } IN sharded_collection
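
On the client side, the object form of the key expression can be assembled from a document that is already at hand. The following is a minimal sketch in plain JavaScript; keyExpression is a hypothetical helper, not part of any ArangoDB driver.

```javascript
// Hypothetical helper: build a key expression object that carries the
// document key together with all shard key attributes.
function keyExpression(doc, shardKeys) {
  const expr = { _key: doc._key };
  for (const attr of shardKeys) {
    if (!(attr in doc)) {
      // Without the attribute no shard hint can be given
      throw new Error(`document lacks shard key attribute "${attr}"`);
    }
    expr[attr] = doc[attr];
  }
  return expr;
}

const expr = keyExpression({ _key: "123", country: "DE", name: "Alice" }, ["country"]);
console.log(JSON.stringify(expr)); // {"_key":"123","country":"DE"}
```

The resulting object could then be supplied as the key expression of an UPDATE, REPLACE or REMOVE operation, for instance through a bind parameter.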

If custom shard keys are used, you can no longer specify the primary key value for a new document, but must let the server generate one automatically. This restriction comes from the fact that ensuring uniqueness of the primary key would be very inefficient if the user could specify the document key.

Unique indexes (hash, skiplist, persistent) on sharded collections are only allowed if the fields used to determine the shard key are also included in the list of attribute paths for the index:

shardKeys	indexKeys	unique index
a	a	allowed
a	b	not allowed
a	a, b	allowed
a, b	a	not allowed
a, b	b	not allowed
a, b	a, b	allowed
a, b	a, b, c	allowed
a, b, c	a, b	not allowed
a, b, c	a, b, c	allowed
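
The table boils down to a subset check: every shard key must appear among the index fields. A minimal sketch in plain JavaScript follows; uniqueIndexAllowed is a hypothetical name used for illustration and the check only mirrors the rule stated above, not the server's actual validation code.

```javascript
// Hypothetical helper: a unique index is allowed only if every shard key
// is also one of the indexed attribute paths.
function uniqueIndexAllowed(shardKeys, indexFields) {
  return shardKeys.every((key) => indexFields.includes(key));
}

// The combinations from the table above
const cases = [
  [["a"], ["a"]],
  [["a"], ["b"]],
  [["a"], ["a", "b"]],
  [["a", "b"], ["a"]],
  [["a", "b"], ["b"]],
  [["a", "b"], ["a", "b"]],
  [["a", "b"], ["a", "b", "c"]],
  [["a", "b", "c"], ["a", "b"]],
  [["a", "b", "c"], ["a", "b", "c"]],
];
for (const [shardKeys, indexFields] of cases) {
  const verdict = uniqueIndexAllowed(shardKeys, indexFields) ? "allowed" : "not allowed";
  console.log(`${shardKeys.join(", ")}\t${indexFields.join(", ")}\t${verdict}`);
}
```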

On which DB-Server in a Cluster a particular shard is kept is undefined. There is no option to configure an affinity based on certain shard keys.

Sharding strategy

The shardingStrategy attribute specifies the strategy to use for the collection. Since ArangoDB 3.4 there are different sharding strategies to select from when creating a new collection. The selected shardingStrategy value remains fixed for the collection and cannot be changed afterwards. This is important so that the collection keeps its sharding settings and always finds documents already distributed to shards using the same initial sharding algorithm.

The available sharding strategies are:

community-compat: default sharding used by ArangoDB Community Edition before version 3.4
enterprise-compat: default sharding used by ArangoDB Enterprise Edition before version 3.4
enterprise-smart-edge-compat: default sharding used by smart edge collections in ArangoDB Enterprise Edition before version 3.4
hash: default sharding used for new collections starting from version 3.4 (excluding smart edge collections)
enterprise-hash-smart-edge: default sharding used for new smart edge collections starting from version 3.4

If no sharding strategy is specified, the default is hash for all collections except smart edge collections, which default to enterprise-hash-smart-edge (requires the Enterprise Edition of ArangoDB). Manually overriding the sharding strategy does not yet provide a benefit, but it may later in case other sharding strategies are added.
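
The default selection described above can be written down as a small lookup. This is a sketch under stated assumptions: resolveShardingStrategy is a hypothetical helper, and it does not model edition checks or any other validation the server may perform.

```javascript
// The strategy names listed above
const SHARDING_STRATEGIES = [
  "community-compat",
  "enterprise-compat",
  "enterprise-smart-edge-compat",
  "hash",
  "enterprise-hash-smart-edge",
];

// Hypothetical helper: pick the strategy for a new collection (3.4 or later).
function resolveShardingStrategy(requested, isSmartEdgeCollection) {
  if (requested === undefined) {
    // No explicit choice: fall back to the documented defaults
    return isSmartEdgeCollection ? "enterprise-hash-smart-edge" : "hash";
  }
  if (!SHARDING_STRATEGIES.includes(requested)) {
    throw new Error(`unknown sharding strategy "${requested}"`);
  }
  return requested;
}

console.log(resolveShardingStrategy(undefined, false)); // hash
console.log(resolveShardingStrategy(undefined, true));  // enterprise-hash-smart-edge
```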

The OneShard feature does not have its own sharding strategy; it uses hash instead.

Moving/Rebalancing shards

A shard can be moved from one DB-Server to another, and the entire shard distribution can be rebalanced, using the corresponding buttons in the web UI.

Replacing/Removing a Coordinator

Coordinators are effectively stateless and can be replaced, added and removed without more consideration than meeting the necessities of the particular installation.

To take out a Coordinator, stop the Coordinator’s instance by issuing kill -SIGTERM <pid>.

About 15 seconds later, the cluster UI on any other Coordinator will mark the Coordinator in question as failed. Almost simultaneously, a trash bin icon will appear to the right of the name of the Coordinator. Clicking that icon will remove the Coordinator from the Coordinator registry.

Any new Coordinator instance that is informed of where to find any/all Agent(s), via --cluster.agency-endpoint <some agent endpoint>, will be integrated as a new Coordinator into the cluster. You may also just restart the Coordinator as before and it will reintegrate itself into the cluster.

Replacing/Removing a DB-Server

DB-Servers are where the data of an ArangoDB cluster is stored. They do not publish a web UI and are not meant to be accessed by any entity other than Coordinators, to perform client requests, or other DB-Servers, to uphold replication and resilience.

The clean way of removing a DB-Server is to first relieve it of all its responsibilities for shards. This applies to followers as well as leaders of shards. The requirement for this operation is that no collection in any of the databases has a replicationFactor greater than or equal to the current number of DB-Servers minus one. For example, cleaning out DBServer004 works as follows, when the request is issued to any Coordinator of the cluster:

curl <coord-ip:coord-port>/_admin/cluster/cleanOutServer -d '{"server":"DBServer004"}'
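
The same request can be prepared programmatically. The sketch below only assembles the request in plain JavaScript and does not send it; buildCleanOutRequest is a hypothetical helper, the Coordinator address is an example value, and authentication is not covered.

```javascript
// Hypothetical helper: describe the clean out request as a plain object.
// Like curl with -d, the request uses the POST method.
function buildCleanOutRequest(coordinatorUrl, serverName) {
  return {
    url: new URL("/_admin/cluster/cleanOutServer", coordinatorUrl).toString(),
    method: "POST",
    body: JSON.stringify({ server: serverName }),
  };
}

// Example Coordinator address (assumption), DB-Server name as in the text
const request = buildCleanOutRequest("http://127.0.0.1:8530", "DBServer004");
console.log(request.method, request.url);
console.log(request.body); // {"server":"DBServer004"}
```

The returned object could be handed to an HTTP client of your choice.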

After the DB-Server has been cleaned out, you will find a trash bin icon to the right of the name of the DB-Server on any Coordinator’s UI. Clicking on it will remove the DB-Server in question from the cluster.

Starting any DB-Server from a clean data directory and specifying any of the Agency endpoints will integrate the new DB-Server into the cluster.

To distribute shards onto the new DB-Server, click the Distribute Shards button at the bottom of the Shards page in every database.

The clean out process can be monitored using the following script, which periodically prints the number of shards that still need to be moved. It is basically a countdown to when the process finishes.

Save the code below to a file named serverCleanMonitor.js:

var dblist = db._databases();
var internal = require("internal");
var arango = internal.arango;
var server = ARGUMENTS[0];      // name of the DB-Server being cleaned out
var sleep = ARGUMENTS[1] | 0;   // polling interval in seconds (0 if missing or invalid)
if (!server) {
    print("\nNo server name specified. Provide it like:\n\narangosh <options> -- DBServerXXXX");
    process.exit();
}
if (sleep <= 0) sleep = 10;     // fall back to the default interval
console.log("Checking shard distribution every %d seconds...", sleep);
var count;
do {
    count = 0;
    for (var dbase in dblist) {
        // Fetch the shard distribution of each database
        var sd = arango.GET("/_db/" + dblist[dbase] + "/_admin/cluster/shardDistribution");
        var collections = sd.results;
        for (var collection in collections) {
            var current = collections[collection].Current;
            for (var shard in current) {
                // Count the shards whose current leader is the given server
                if (current[shard].leader == server) {
                    ++count;
                }
            }
        }
    }
    console.log("Shards to be moved away from node %s: %d", server, count);
    if (count == 0) break;
    internal.wait(sleep);
} while (count > 0);

This script has to be executed in arangosh by issuing the following command:

arangosh --server.username <username> --server.password <password> --javascript.execute <path/to/serverCleanMonitor.js> -- DBServer<number>

The output should be similar to the one below:

arangosh --server.username root --server.password pass --javascript.execute ~/serverCleanMonitor.js -- DBServer0002
[7836] INFO Checking shard distribution every 10 seconds...
[7836] INFO Shards to be moved away from node DBServer0002: 9
[7836] INFO Shards to be moved away from node DBServer0002: 4
[7836] INFO Shards to be moved away from node DBServer0002: 1
[7836] INFO Shards to be moved away from node DBServer0002: 0

The current status is logged every 10 seconds. You may adjust the interval by passing a number after the DB-Server name, e.g. arangosh <options> -- DBServer0002 60 for every 60 seconds.

Once the count is 0, all shards of the underlying DB-Server have been moved and the cleanOutServer process has finished.
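
The counting step of the monitor script can also be isolated into a pure function that runs outside of arangosh, which makes it easy to try against sample data. The sketch below is a variation, not the script itself: it additionally counts shards for which the server is a follower, assuming each shard entry carries a followers array next to leader. The function name and the sample data are hypothetical.

```javascript
// Hypothetical helper: count the shards that still involve the given server,
// either as leader or (assumed "followers" array) as follower.
function countShardsOnServer(results, server) {
  let count = 0;
  for (const collection of Object.values(results)) {
    for (const shard of Object.values(collection.Current || {})) {
      const followers = shard.followers || [];
      if (shard.leader === server || followers.includes(server)) {
        ++count;
      }
    }
  }
  return count;
}

// Made-up sample in the shape the monitor script reads (results -> Current -> shard)
const sample = {
  users: {
    Current: {
      s100: { leader: "DBServer0001", followers: ["DBServer0002"] },
      s101: { leader: "DBServer0002", followers: ["DBServer0003"] },
    },
  },
  orders: {
    Current: {
      s200: { leader: "DBServer0003", followers: ["DBServer0001"] },
    },
  },
};

console.log(countShardsOnServer(sample, "DBServer0002")); // 2
console.log(countShardsOnServer(sample, "DBServer0004")); // 0
```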