Single Instance vs. Cluster
In general, a single server configuration and a cluster configurationof ArangoDB behave very similarly. However, there are differences due tothe different nature of these setups. This can lead to a discrepancy in behaviorbetween these two configurations. A summary of potential differences follows.
See Migrating from Single Instance to Clusterfor practical information.
Locking and dead-lock prevention
In a single server configuration all data is local and dead-locks caneasily be detected. In a cluster configuration data is distributed tomany servers and some conflicts cannot be detected easily. Thereforewe have to do some things (like locking shards) sequentially and in astrictly predefined order, to avoid dead-locks in this way by design.
Document Keys
In a cluster the autoincrement key generator is not supported. Youhave to use the traditional or user defined keys.
Indexes
Unique constraints
There are restrictions on the allowed unique constraints in a cluster.Any unique constraint which cannot be checked locally on a per shardbasis is not allowed in a cluster setup. More concretely, uniqueconstraints in a cluster are only allowed in the following situations:
- there is always a unique constraint on the primary key
_key
, ifthe collection is not sharded by_key
, then_key
must beautomatically generated by the database and cannot be prescribed bythe client - the collection has only one shard, in which case the same uniqueconstraints are allowed as in the single instance case
- if the collection is sharded by exactly one other attribute than
_key
, then there can be a unique constraint on that attributeThese restrictions are imposed, because otherwise checking for a uniqueconstraint violation would involve checking with all shards, which would havea considerable performance impact.
Renaming
It is not possible to rename collections or views in a cluster.
AQL
The AQL syntax for single server and cluster is identical. However,there is one additional requirement (regarding with) and possibleperformance differences.
WITH
The WITH
keyword in AQL must be used to declare which collectionsare used in the AQL. For most AQL requires the required collectionscan be deduced from the query itself. However, with traversals this isnot possible, if edge collections are used directly. SeeAQL WITH operationfor details. The WITH
statement is not necessary when using named graphsfor the traversals.
As deadlocks cannot be detected in a cluster environment easily, theWITH
keyword is mandatory for this particular situation in a cluster,but not in a single server.
Performance
Performance of AQL queries can vary between single server and cluster.If a query can be distributed to many DBserver and executed inparallel then cluster performance can be better. For example, if youdo a distributed COLLECT
aggregation or a distributed FILTER
operation.
On the other hand, if you do a join or a traversal and the data is notlocal to one server then the performance can be worse compared to asingle server. This is especially true for traversal if the data isnot sharded with care. Our smart graph feature helps with this fortraversals.
Single document operations can have a higher throughput in cluster butwill also have a higher latency, due to an additional network hop fromcoordinator to dbserver.
Any operation that needs to find documents by anything else but theshard key will have to fan out to all shards, so it will be a lotslower than when referring to the documents using the shardkey. Optimized lookups by shard key can only be used for equalitylookups, e.g. not for range lookups.
Memory usage
Some query results must be built up in memory on a coordinator, forexample if a dataset needs to be sorted on the fly. This can relativelyeasily overwhelm a coordinator if the dataset is sharded across multipledbservers. Use indexes and streaming cursors (>= 3.4) to circumvent thisproblem.
Transactions
Using a single instance of ArangoDB, multi-document / multi-collectionqueries are guaranteed to be fully ACID. This is more than many otherNoSQL database systems support. In cluster mode, single-documentoperations are also fully ACID. Multi-document / multi-collectionqueries in a cluster are not ACID, which is equally the case forcompeting database systems. See Transactionsfor details.
Batch operations for multiple documents in the same collection are onlyfully transactional in a single instance.
Smart graphs
In smart graphs there are restrictions on the values of the _key
attributes. Essentially, the _key
attribute values for vertices mustbe prefixed with the string value of the smart graph attribute and acolon. A similar restriction applies for the edges.
Foxx
Foxx apps run on the coordinators of a cluster. Since coordinators arestateless, one must not use regular file accesses in Foxx apps in acluster.
Agency
A cluster deployment needs a central, RAFT-based key/value store called“the agency” to keep the current cluster configuration and managefailover. Being RAFT-based, this is a real-time system. If your serversrunning the agency instances (typically three or five) receive too muchload, the RAFT protocol stops working and the whole stability of thecluster is endangered. If you foresee this problem, run the agencyinstances on separate nodes. All this is not necessary in a singleserver deployment.
Dump/Restore
At the time of this writing, the arangodump
utility in a clustercannot guarantee a consistent snapshot across multiple shards or evenmultiple collections. This is in line with most other current NoSQLdatabase systems. We are working on a consistent snapshot andincremental backup capability for 3.5. In a single server, arangodump
produces a consistent snapshot.