Features and Improvements in ArangoDB 3.6
The following list shows in detail which features have been added or improved in ArangoDB 3.6. ArangoDB 3.6 also contains several bug fixes that are not listed here.
AQL
Early pruning of non-matching documents
Previously, AQL queries with filter conditions that could not be satisfied by any index required all documents to be copied from the storage engine into the AQL scope in order to be fed into the filter.
An example query execution plan for such a query from ArangoDB 3.5 looks like this:
Query String (75 chars, cacheable: true):
FOR doc IN test FILTER doc.value1 > 9 && doc.value2 == 'test854' RETURN doc
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 100000 - FOR doc IN test /* full collection scan */
3 CalculationNode 100000 - LET #1 = ((doc.`value1` > 9) && (doc.`value2` == "test854"))
4 FilterNode 100000 - FILTER #1
5 ReturnNode 100000 - RETURN doc
ArangoDB 3.6 adds an optimizer rule move-filters-into-enumerate, which allows applying the filter condition directly while scanning the documents, so copying of any documents that don’t match the filter condition can be avoided.
The query execution plan for the above query from 3.6 with that optimizer rule applied looks as follows:
Query String (75 chars, cacheable: true):
FOR doc IN test FILTER doc.value1 > 9 && doc.value2 == 'test854' RETURN doc
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 100000 - FOR doc IN test /* full collection scan */ FILTER ((doc.`value1` > 9) && (doc.`value2` == "test854")) /* early pruning */
5 ReturnNode 100000 - RETURN doc
Note that in this execution plan the scanning and filtering are combined in one node, so the copying of all non-matching documents from the storage engine into the AQL scope is completely avoided.
This optimization will be beneficial if the filter condition is very selective and will filter out many documents, and if documents are large. In this case a lot of copying will be avoided.
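The difference between the two plans can be illustrated with a small Python sketch. It is not ArangoDB code: the `storage` list, the copy counters and both scan functions are hypothetical stand-ins that only model how many documents get copied out of the storage engine in each approach.

```python
import copy

def scan_then_filter(storage, predicate):
    """3.5-style model: copy every document, then filter the copies."""
    copies = 0
    result = []
    for doc in storage:
        row = copy.deepcopy(doc)   # document is copied into the query scope
        copies += 1
        if predicate(row):
            result.append(row)
    return result, copies

def scan_with_early_pruning(storage, predicate):
    """3.6-style model: test the condition before copying anything."""
    copies = 0
    result = []
    for doc in storage:
        if not predicate(doc):     # non-matching documents are never copied
            continue
        result.append(copy.deepcopy(doc))
        copies += 1
    return result, copies

# Hypothetical test data: 1000 small documents, one of which matches.
storage = [{"value1": i, "value2": "test%d" % i} for i in range(1000)]

def matches(doc):
    return doc["value1"] > 9 and doc["value2"] == "test854"

old_result, old_copies = scan_then_filter(storage, matches)
new_result, new_copies = scan_with_early_pruning(storage, matches)
```

Both functions return the same result, but in this model the first one copies all 1000 documents while the second copies only the single match, which is why the gain grows with the selectivity of the filter and the size of the documents.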
The optimizer rule also works if an index is used, but there are also filter conditions that cannot be satisfied by the index alone. Here is a 3.5 query execution plan for a query using a filter on an indexed value plus a filter on a non-indexed value:
Query String (101 chars, cacheable: true):
FOR doc IN test FILTER doc.value1 > 10000 && doc.value1 < 30000 && doc.value2 == 'test854' RETURN
doc
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 26666 - FOR doc IN test /* hash index scan */
7 CalculationNode 26666 - LET #1 = (doc.`value2` == "test854")
4 FilterNode 26666 - FILTER #1
5 ReturnNode 26666 - RETURN doc
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
6 idx_1649353982658740224 hash test false false 100.00 % [ `value1` ] ((doc.`value1` > 10000) && (doc.`value1` < 30000))
In 3.6, the same query will be executed using a combined index scan & filtering approach, again avoiding any copies of non-matching documents:
Query String (101 chars, cacheable: true):
FOR doc IN test FILTER doc.value1 > 10000 && doc.value1 < 30000 && doc.value2 == 'test854' RETURN
doc
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 26666 - FOR doc IN test /* hash index scan */ FILTER (doc.`value2` == "test854") /* early pruning */
5 ReturnNode 26666 - RETURN doc
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
6 idx_1649353982658740224 hash test false false 100.00 % [ `value1` ] ((doc.`value1` > 10000) && (doc.`value1` < 30000))
Subquery Splicing Optimization
In earlier versions of ArangoDB, on every execution of a subquery the following happened for each input row:
- The subquery tree issues one initializeCursor cascade through all nodes
- The subquery node pulls rows until the subquery node is empty for this input
On subqueries with many results per input row (10000 or more) the above steps did not contribute significantly to query execution time. On subqueries with few results per input, there was a serious performance impact.
Subquery splicing inlines the execution of subqueries using an optimizer rule called splice-subqueries. Only suitable queries can be spliced. A subquery becomes unsuitable if it contains a LIMIT node or a COLLECT WITH COUNT INTO … construct (but not due to a COLLECT var = <expr> WITH COUNT INTO …). A subquery also becomes unsuitable if it is contained in a (sub)query containing unsuitable parts after the subquery.
Consider the following query to illustrate the difference.
FOR x IN c1
  LET firstJoin = (
    FOR y IN c2
      FILTER y._id == x.c2_id
      LIMIT 1
      RETURN y
  )
  LET secondJoin = (
    FOR z IN c3
      FILTER z.value == x.value
      RETURN z
  )
  RETURN { x, firstJoin, secondJoin }
The execution plan without subquery splicing:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 0 - FOR x IN c1 /* full collection scan */
9 SubqueryNode 0 - LET firstJoin = ... /* subquery */
3 SingletonNode 1 * ROOT
18 IndexNode 0 - FOR y IN c2 /* primary index scan */
7 LimitNode 0 - LIMIT 0, 1
8 ReturnNode 0 - RETURN y
15 SubqueryNode 0 - LET secondJoin = ... /* subquery */
10 SingletonNode 1 * ROOT
11 EnumerateCollectionNode 0 - FOR z IN c3 /* full collection scan */
12 CalculationNode 0 - LET #11 = (z.`value` == x.`value`) /* simple expression */ /* collections used: z : c3, x : c1 */
13 FilterNode 0 - FILTER #11
14 ReturnNode 0 - RETURN z
16 CalculationNode 0 - LET #13 = { "x" : x, "firstJoin" : firstJoin, "secondJoin" : secondJoin } /* simple expression */ /* collections used: x : c1 */
17 ReturnNode 0 - RETURN #13
Optimization rules applied:
Id RuleName
1 use-indexes
2 remove-filter-covered-by-index
3 remove-unnecessary-calculations-2
Note in particular the SubqueryNodes, followed by a SingletonNode in both cases.
When using the optimizer rule splice-subqueries the plan is as follows:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 0 - FOR x IN c1 /* full collection scan */
9 SubqueryNode 0 - LET firstJoin = ... /* subquery */
3 SingletonNode 1 * ROOT
18 IndexNode 0 - FOR y IN c2 /* primary index scan */
7 LimitNode 0 - LIMIT 0, 1
8 ReturnNode 0 - RETURN y
19 SubqueryStartNode 0 - LET secondJoin = ( /* subquery begin */
11 EnumerateCollectionNode 0 - FOR z IN c3 /* full collection scan */
12 CalculationNode 0 - LET #11 = (z.`value` == x.`value`) /* simple expression */ /* collections used: z : c3, x : c1 */
13 FilterNode 0 - FILTER #11
20 SubqueryEndNode 0 - ) /* subquery end */
16 CalculationNode 0 - LET #13 = { "x" : x, "firstJoin" : firstJoin, "secondJoin" : secondJoin } /* simple expression */ /* collections used: x : c1 */
17 ReturnNode 0 - RETURN #13
Optimization rules applied:
Id RuleName
1 use-indexes
2 remove-filter-covered-by-index
3 remove-unnecessary-calculations-2
4 splice-subqueries
The first subquery is unsuitable for the optimization because it contains a LIMIT statement and is therefore not spliced. The second subquery is suitable and hence is spliced, which one can tell from the different node type SubqueryStartNode (beginning of spliced subquery). Note how it is not followed by a SingletonNode. The end of the spliced subquery is marked by a SubqueryEndNode.
Late document materialization (RocksDB)
With the late document materialization optimization ArangoDB tries to read only documents that are absolutely necessary to compute the query result, reducing load to the storage engine. This is only supported for the RocksDB storage engine.
In 3.6 the optimization can only be applied to queries containing a SORT + LIMIT combination, e.g.:
FOR d IN documentSource // documentSource can be either a collection or an ArangoSearch View
SORT d.foo
LIMIT 100
RETURN d
For the collection case the optimization is possible if and only if:
- there is an index of type primary, hash, skiplist, persistent or edge picked by the optimizer
- all attribute accesses can be covered by indexed attributes
// Given we have a hash index on attributes [ "foo", "bar", "baz" ]
FOR d IN myCollection
FILTER d.foo == "someValue" // hash index will be picked to optimize filtering
SORT d.baz DESC // field "baz" will be read from index
LIMIT 100 // only 100 documents will be materialized
RETURN d
For the ArangoSearch View case the optimization is possible if and only if:
- all attribute accesses can be covered by stored attributes (e.g. using primarySort)
FOR d IN myView
SEARCH d.foo == "baz"
SORT BM25(d) DESC // BM25(d) will be evaluated by the View node above
LIMIT 100 // only 100 documents will be materialized
RETURN d
// Given we have primary sort on fields ["foo", "bar"]
FOR d IN myView
SEARCH d.foo == "baz"
SORT d.bar DESC // field "bar" will be read from the View
LIMIT 100 // only 100 documents will be materialized
RETURN d
The respective optimizer rules are called late-document-materialization (collection source) and late-document-materialization-arangosearch (ArangoSearch View source). If applied, you will find MaterializeNodes in execution plans.
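The idea behind late materialization can be sketched in Python as follows. This is a simplified model, not ArangoDB code: `index_entries`, `load_document` and the `loads` list are hypothetical names that stand in for index data, a storage engine lookup and a counter for such lookups.

```python
def late_materialization(index_entries, load_document, limit):
    """Sort on index data only, then load just the documents that survive LIMIT."""
    # index_entries: (sort_value, document_key) pairs that were read from the index
    top = sorted(index_entries, key=lambda entry: entry[0])[:limit]
    return [load_document(key) for _, key in top]

# Hypothetical collection of 1000 documents with distinct "foo" values.
documents = {
    "k%d" % i: {"_key": "k%d" % i, "foo": (i * 37) % 1000}
    for i in range(1000)
}
index_entries = [(doc["foo"], key) for key, doc in documents.items()]

loads = []  # records every full document lookup

def load_document(key):
    loads.append(key)
    return documents[key]

result = late_materialization(index_entries, load_document, limit=100)
```

In this model all 1000 index entries take part in the sort, but only the 100 documents that remain after the limit are looked up in full.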
Parallelization of cluster AQL queries
ArangoDB 3.6 can parallelize work in many cluster AQL queries when there are multiple DB-Servers involved. The parallelization is done in the GatherNode, which then can send parallel cluster-internal requests to the DB-Servers attached. The DB-Servers can then work fully parallel for the different shards involved.
When parallelization is used, one or multiple GatherNodes in a query’s execution plan will be tagged with parallel as follows:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
2 EnumerateCollectionNode DBS 1000000 - FOR doc IN test /* full collection scan, 5 shard(s) */
6 RemoteNode COOR 1000000 - REMOTE
7 GatherNode COOR 1000000 - GATHER /* parallel */
3 ReturnNode COOR 1000000 - RETURN doc
Parallelization is currently restricted to certain types and parts of queries. GatherNodes will go into parallel mode only if the DB-Server query part above it (in terms of query execution plan layout) is a terminal part of the query. To trigger the optimization, there must not be other nodes of type ScatterNode, GatherNode or DistributeNode present in the query.
Please note that the parallelization of AQL execution may lead to a different resource usage pattern for eligible AQL queries in the cluster. In isolation, queries are expected to complete faster with parallelization than when executing their work serially on all involved DB-Servers. However, working on multiple DB-Servers in parallel may also mean that more work than before is happening at the very same time. If this is not desired because of resource scarcity, there are options to control the parallelization:
The startup option --query.parallelize-gather-writes can be used to control whether eligible write operation parts will be parallelized. This option defaults to true, meaning that eligible write operations are also parallelized by default. This can be turned off so that potential I/O overuse can be avoided for write operations when used together with a high replication factor.
Additionally, the startup option --query.optimizer-rules can be used to globally toggle the usage of certain optimizer rules for all queries. By default, all optimizations are turned on. However, specific optimizations can be turned off using the option.
For example, to turn off the parallelization entirely (including parallel gather writes), one can use the following configuration:
--query.optimizer-rules "-parallelize-gather"
This toggle works for any other non-mandatory optimizer rules as well. To specify multiple optimizer rules, the option can be used multiple times, e.g.
--query.optimizer-rules "-parallelize-gather" --query.optimizer-rules "-splice-subqueries"
You can still override which optimizer rules are used on a per-query basis. --query.optimizer-rules merely defines a default. However, --query.parallelize-gather-writes false turns off parallel gather writes completely and it cannot be re-enabled for individual queries.
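When server startup is scripted, the repeated option can be generated from a list of rule names. The following Python sketch shows one way to do that; `optimizer_rule_args` is a hypothetical helper, and it only covers disabling rules with the `-` prefix shown in the examples above.

```python
def optimizer_rule_args(disabled_rules):
    """Build one --query.optimizer-rules argument pair per rule to disable."""
    args = []
    for rule in disabled_rules:
        # The option is repeated once per rule; "-" turns the rule off.
        args += ["--query.optimizer-rules", "-" + rule]
    return args

args = optimizer_rule_args(["parallelize-gather", "splice-subqueries"])
```

The resulting list can be appended to the argument vector of whatever mechanism starts the server in your deployment.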
Optimizations for simple UPDATE and REPLACE queries
Cluster query execution plans for simple UPDATE and REPLACE queries that modify multiple documents and do not use LIMIT are now more efficient as several steps were removed. The existing optimizer rule undistribute-remove-after-enum-coll has been extended to cover these cases too, provided the collection is sharded by _key and the UPDATE/REPLACE operation is using the full document or the _key attribute to find it.
For example, a query such as:
FOR doc IN test UPDATE doc WITH { updated: true } IN test
… is executed as follows in 3.5:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
3 CalculationNode DBS 1 - LET #3 = { "updated" : true }
2 EnumerateCollectionNode DBS 1000000 - FOR doc IN test /* full collection scan, 5 shard(s) */
11 RemoteNode COOR 1000000 - REMOTE
12 GatherNode COOR 1000000 - GATHER
5 DistributeNode COOR 1000000 - DISTRIBUTE /* create keys: false, variable: doc */
6 RemoteNode DBS 1000000 - REMOTE
4 UpdateNode DBS 0 - UPDATE doc WITH #3 IN test
7 RemoteNode COOR 0 - REMOTE
8 GatherNode COOR 0 - GATHER
In 3.6 the execution plan is streamlined to just:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
3 CalculationNode DBS 1 - LET #3 = { "updated" : true }
13 IndexNode DBS 1000000 - FOR doc IN test /* primary index scan, index only, projections: `_key`, 5 shard(s) */
4 UpdateNode DBS 0 - UPDATE doc WITH #3 IN test
7 RemoteNode COOR 0 - REMOTE
8 GatherNode COOR 0 - GATHER /* parallel */
As can be seen above, the benefit of applying the optimization is that the extra communication between the Coordinator and DB-Server is removed. This will mean less cluster-internal traffic and thus can result in faster execution. As an extra benefit, the optimization will also make the affected queries eligible for parallel execution. It is only applied in cluster deployments.
The optimization will also work when a filter is involved:
Query String (79 chars, cacheable: false):
FOR doc IN test FILTER doc.value == 4 UPDATE doc WITH { updated: true } IN test
Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode DBS 1 * ROOT
5 CalculationNode DBS 1 - LET #5 = { "updated" : true }
2 EnumerateCollectionNode DBS 1000000 - FOR doc IN test /* full collection scan, projections: `_key`, `value`, 5 shard(s) */
3 CalculationNode DBS 1000000 - LET #3 = (doc.`value` == 4)
4 FilterNode DBS 1000000 - FILTER #3
6 UpdateNode DBS 0 - UPDATE doc WITH #5 IN test
9 RemoteNode COOR 0 - REMOTE
10 GatherNode COOR 0 - GATHER
AQL Date functionality
AQL now enforces a valid date range for working with date/time in AQL. The valid date ranges for any AQL date/time function are:
- for string date/time values: "0000-01-01T00:00:00.000Z" (inclusive) up to "9999-12-31T23:59:59.999Z" (inclusive)
- for numeric date/time values: -62167219200000 (inclusive) up to 253402300799999 (inclusive). These values are the numeric equivalents of "0000-01-01T00:00:00.000Z" and "9999-12-31T23:59:59.999Z".
Any date/time values outside the given range that are passed into an AQL date function will make the function return null and trigger a warning in the query, which can optionally be escalated to an error and stop the query.
Any date/time operations that produce date/time outside the valid ranges stated above will make the function return null and trigger a warning too. An example for this is:
DATE_SUBTRACT("2018-08-22T10:49:00+02:00", 100000, "years")
The performance of AQL date operations that work on date strings has been improved compared to previous versions.
Finally, ArangoDB 3.6 provides a new AQL function DATE_ROUND() to bin a date/time into a set of equal-distance buckets.
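The numeric bounds of the valid date range stated above can be cross-checked with a few lines of Python. This is only a verification sketch and not how ArangoDB computes the values; the names `MIN_MS`, `MAX_MS` and `in_valid_date_range` are chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 86_400_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Python's datetime starts at year 1, so the lower bound is derived by hand:
# the days from 0001-01-01 to 1970-01-01, plus the 366 days of year 0000,
# which is a leap year in the proleptic Gregorian calendar.
days_year1_to_epoch = EPOCH.toordinal() - datetime(1, 1, 1).toordinal()
MIN_MS = -(days_year1_to_epoch + 366) * MS_PER_DAY

# The upper bound is 9999-12-31T23:59:59.999Z in milliseconds since the epoch.
last_instant = datetime(9999, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc)
MAX_MS = (last_instant - EPOCH) // timedelta(milliseconds=1)

def in_valid_date_range(ms):
    """True if a numeric date/time value lies inside the documented range."""
    return MIN_MS <= ms <= MAX_MS
```

A numeric value one millisecond beyond either bound falls outside the range, which corresponds to the case in which the AQL date functions return null and raise a warning.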
Miscellaneous AQL changes
In addition, ArangoDB 3.6 provides the following new AQL functionality:
- a function GEO_AREA() for area calculations (also added to v3.5.1)
- a query option maxRuntime to restrict the execution to a given time in seconds. Also see HTTP API.
- a startup option --query.optimizer-rules to turn certain AQL query optimizer rules off (or on) by default. This can be used to turn off certain optimizations that would otherwise lead to undesired changes in server resource usage patterns.
ArangoSearch
Analyzers
- Added UTF-8 support and ability to mark beginning/end of the sequence to the ngram Analyzer type.
The following optional properties can be provided for an ngram Analyzer definition:
  - startMarker: <string>, default: "". This value will be prepended to n-grams at the beginning of the input sequence.
  - endMarker: <string>, default: "". This value will be appended to n-grams at the end of the input sequence.
  - streamType: "binary"|"utf8", default: "binary". Type of the input stream (support for UTF-8 is new).
- Added edge n-gram support to the text Analyzer type. The input gets tokenized as usual, but then n-grams are generated from each token. UTF-8 encoding is assumed (whereas the ngram Analyzer has a configurable stream type and defaults to binary).
The following optional properties can be provided for a text Analyzer definition:
  - edgeNgram (object, optional):
    - min (number, optional): minimal n-gram length
    - max (number, optional): maximal n-gram length
    - preserveOriginal (boolean, optional): include the original token if its length is less than min or greater than max
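To illustrate what the startMarker and endMarker properties of the ngram Analyzer do, here is a simplified Python model. It is a sketch under assumptions, not the Analyzer implementation: it uses a single fixed n-gram length, the function name `ngrams_with_markers` is made up, and other Analyzer properties are not modeled.

```python
def ngrams_with_markers(text, n, start_marker="", end_marker=""):
    """Generate all n-grams of length n and mark the first and the last one."""
    grams = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if i == 0:
            gram = start_marker + gram      # n-gram at the beginning of the input
        if i + n == len(text):
            gram = gram + end_marker        # n-gram at the end of the input
        grams.append(gram)
    return grams

grams = ngrams_with_markers("foobar", 3, start_marker="^", end_marker="$")
# grams == ["^foo", "oob", "oba", "bar$"]
```

With markers in place, a search for `^foo` can only match at the start of the indexed value, while a search for `foo` without a marker does not match that first n-gram.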
Dynamic search expressions with arrays
ArangoSearch now accepts SEARCH expressions with array comparison operators in the form of:
<array> [ ALL|ANY|NONE ] [ <=|<|==|!=|>|>=|IN ] doc.<attribute>
i.e. the left-hand side operand is always an array, which can be dynamic.
LET tokens = TOKENS("some input", "text_en") // ["some", "input"]
FOR doc IN myView SEARCH tokens ALL IN doc.title RETURN doc // dynamic conjunction
FOR doc IN myView SEARCH tokens ANY IN doc.title RETURN doc // dynamic disjunction
FOR doc IN myView SEARCH tokens NONE IN doc.title RETURN doc // dynamic negation
FOR doc IN myView SEARCH tokens ALL > doc.title RETURN doc // dynamic conjunction with comparison
FOR doc IN myView SEARCH tokens ANY <= doc.title RETURN doc // dynamic disjunction with comparison
In addition, both the TOKENS() and the PHRASE() functions were extended with array support for convenience.
TOKENS() accepts recursive arrays of strings as the first argument:
TOKENS("quick brown fox", "text_en") // [ "quick", "brown", "fox" ]
TOKENS(["quick brown", "fox"], "text_en") // [ ["quick", "brown"], ["fox"] ]
TOKENS(["quick brown", ["fox"]], "text_en") // [ ["quick", "brown"], [["fox"]] ]
In most cases you will want to flatten the resulting array for further usage, because nested arrays are not accepted in SEARCH statements such as <array> ALL IN doc.<attribute>:
LET tokens = TOKENS(["quick brown", ["fox"]], "text_en") // [ ["quick", "brown"], [["fox"]] ]
LET tokens_flat = FLATTEN(tokens, 2) // [ "quick", "brown", "fox" ]
FOR doc IN myView SEARCH ANALYZER(tokens_flat ALL IN doc.title, "text_en") RETURN doc
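The effect of the depth argument in the FLATTEN() call above can be reproduced with a short Python model. The `flatten` function below is an illustrative re-implementation written for this purpose, not ArangoDB code.

```python
def flatten(values, depth=1):
    """Flatten nested lists up to the given number of levels."""
    result = []
    for item in values:
        if isinstance(item, list) and depth > 0:
            result.extend(flatten(item, depth - 1))
        else:
            result.append(item)
    return result

tokens = [["quick", "brown"], [["fox"]]]
one_level = flatten(tokens, 1)    # ["quick", "brown", ["fox"]]
two_levels = flatten(tokens, 2)   # ["quick", "brown", "fox"]
```

A depth of 1 still leaves a nested array behind for this input, which is why the example uses a depth of 2.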
PHRASE() accepts an array as the second argument:
FOR doc IN myView SEARCH PHRASE(doc.title, ["quick brown fox"], "text_en") RETURN doc
FOR doc IN myView SEARCH PHRASE(doc.title, ["quick", "brown", "fox"], "text_en") RETURN doc
LET tokens = TOKENS("quick brown fox", "text_en") // ["quick", "brown", "fox"]
FOR doc IN myView SEARCH PHRASE(doc.title, tokens, "text_en") RETURN doc
It is equivalent to the more cumbersome and static form:
FOR doc IN myView SEARCH PHRASE(doc.title, "quick", 0, "brown", 0, "fox", "text_en") RETURN doc
You can optionally specify the number of skipTokens in the array form before every string element:
FOR doc IN myView SEARCH PHRASE(doc.title, ["quick", 1, "fox", "jumps"], "text_en") RETURN doc
It is the same as the following:
FOR doc IN myView SEARCH PHRASE(doc.title, "quick", 1, "fox", 0, "jumps", "text_en") RETURN doc
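The correspondence between the array form and the expanded argument list can be written down as a small Python function. This is only a model of the equivalence described above: `expand_phrase_parts` is a hypothetical name, and tokenization of multi-word strings by the Analyzer is not modeled.

```python
def expand_phrase_parts(parts):
    """Turn the array form into alternating phrase parts and skipTokens."""
    expanded = []
    skip = 0                      # skipTokens defaults to 0 when omitted
    for part in parts:
        if isinstance(part, int):
            skip = part           # applies to the string element that follows
            continue
        if expanded:
            expanded.append(skip)
        expanded.append(part)
        skip = 0
    return expanded

with_skip = expand_phrase_parts(["quick", 1, "fox", "jumps"])
# with_skip == ["quick", 1, "fox", 0, "jumps"]
without_skip = expand_phrase_parts(["quick", "brown", "fox"])
# without_skip == ["quick", 0, "brown", 0, "fox"]
```

The two results match the argument lists of the static PHRASE() calls shown above, apart from the attribute and Analyzer arguments.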
SmartJoins and Views
ArangoSearch Views are now eligible for SmartJoins in AQL, provided that their underlying collections are eligible too.
OneShard
This option is only available in the Enterprise Edition, also available as a managed service.
Not all use cases require horizontal scalability. In such cases, a OneShard deployment offers a practicable solution that enables significant performance improvements by massively reducing cluster-internal communication.
A database created with OneShard enabled is limited to a single DB-Server node but still replicated synchronously to ensure resilience. This configuration allows running transactions with ACID guarantees on shard leaders.
This setup is highly recommended for most graph use cases and join-heavy queries.
Unlike a (flexibly) sharded cluster, where the Coordinator distributes access to shards across different DB-Server nodes, collects and processes partial results, the Coordinator in a OneShard setup moves the query execution directly to the respective DB-Server for local query execution. The Coordinator receives only the final result. This can drastically reduce resource consumption and communication effort for the Coordinator.
An entire cluster, selected databases or selected collections can be made eligible for the OneShard optimization. See OneShard cluster architecture for details and usage examples.
HTTP API
The following APIs have been expanded / changed:
- Database creation API, HTTP route POST /_api/database
The database creation API now handles the replicationFactor, writeConcern and sharding attributes. All these attributes are optional, and only meaningful in a cluster.
The values provided for the attributes replicationFactor and writeConcern will be used as default values when creating collections in that database, so these attributes can be omitted when creating collections. However, the values set here are just defaults for new collections in the database. The values can still be adjusted per collection when creating new collections in that database via the web UI, arangosh or drivers.
In an Enterprise Edition cluster, the sharding attribute can be given a value of "single", which will make all new collections in that database use the same shard distribution and use one shard by default (OneShard configuration). This can still be overridden by setting the values of numberOfShards and distributeShardsLike when creating new collections in that database via the web UI, arangosh or drivers (unless the startup option --cluster.force-one-shard is enabled).
- Database properties API, HTTP route GET /_api/database/current
The database properties endpoint returns the new additional attributes replicationFactor, writeConcern and sharding in a cluster. A description of these attributes can be found above.
- Collection / Graph APIs, HTTP routes POST /_api/collection, GET /_api/collection/{collection-name}/properties and various /_api/gharial/* endpoints
minReplicationFactor has been renamed to writeConcern for consistency. The old attribute name is still accepted and returned for compatibility.
- Hot Backup API, HTTP route POST /_admin/backup/create
New attribute force, see Hot Backup below.
- New Metrics API, HTTP route GET /_admin/metrics
Returns the instance’s current metrics in Prometheus format. The returned document collects all instance metrics, which are measured at any given time and exposes them for collection by Prometheus.
The new endpoint can be used instead of the additional tool arangodb-exporter.
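As an illustration of the database creation API described above, the following Python sketch assembles a request body for POST /_api/database. It is a sketch under assumptions: the helper `create_database_body` is hypothetical, and placing the new attributes inside an `options` object is not stated in this document and should be checked against the HTTP API reference. Sending the request is left out.

```python
import json

def create_database_body(name, replication_factor=None, write_concern=None,
                         sharding=None):
    """Assemble a body for POST /_api/database; all new attributes are optional."""
    options = {}
    if replication_factor is not None:
        options["replicationFactor"] = replication_factor
    if write_concern is not None:
        options["writeConcern"] = write_concern
    if sharding is not None:
        options["sharding"] = sharding   # "single" selects the OneShard setup
    body = {"name": name}
    if options:
        # Assumption: the attributes are nested under "options".
        body["options"] = options
    return body

# "mydb" is a placeholder database name.
body = create_database_body("mydb", replication_factor=3, write_concern=2,
                            sharding="single")
payload = json.dumps(body)
```

Collections created later in such a database would pick up the replication factor and write concern as defaults unless they specify their own values.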
Web interface
The web interface now shows the shards of all collections (including system collections) in the shard distribution view. Displaying system collections here is necessary to access the prototype collections of a collection sharded via distributeShardsLike in case the prototype is a system collection, and the prototype collection shall be moved to another server using the web interface.
The web interface now also allows setting a default replication factor when creating a new database. This default replication factor will be used for all collections created in the new database, unless explicitly overridden.
Startup options
Metrics API option
The new option --server.enable-metrics-api allows you to disable the metrics API by setting it to false, which is otherwise turned on by default.
OneShard cluster option
The option --cluster.force-one-shard enables the new OneShard feature for the entire cluster deployment. It forces the cluster into creating all future collections with only a single shard and using the same DB-Server as these collections’ shard leader. All collections created this way will be eligible for specific AQL query optimizations that can improve query performance and provide advanced transactional guarantees.
Cluster upgrade option
The new option --cluster.upgrade toggles the cluster upgrade mode for Coordinators. It supports the following values:
- auto: perform a cluster upgrade and shut down afterwards if the startup option --database.auto-upgrade is set to true. Otherwise, don’t perform an upgrade.
- disable: never perform a cluster upgrade, regardless of the value of --database.auto-upgrade.
- force: always perform a cluster upgrade and shut down, regardless of the value of --database.auto-upgrade.
- online: always perform a cluster upgrade but don’t shut down afterwards
The default value is auto. The option only affects Coordinators. It does not have any effect on single servers, Agents or DB-Servers.
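The four upgrade modes can be summarized as a decision function. The following Python sketch restates the descriptions above; the function name and the returned tuple are chosen for illustration and do not correspond to anything in the server.

```python
def cluster_upgrade_behavior(mode, auto_upgrade):
    """Return (perform_upgrade, shut_down_afterwards) for a Coordinator.

    mode:         value of --cluster.upgrade
    auto_upgrade: value of --database.auto-upgrade
    """
    if mode == "auto":
        return (auto_upgrade, auto_upgrade)
    if mode == "disable":
        return (False, False)
    if mode == "force":
        return (True, True)
    if mode == "online":
        return (True, False)
    raise ValueError("unknown --cluster.upgrade value: %r" % mode)
```

Only the auto mode depends on --database.auto-upgrade; the other three modes ignore it.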
Other cluster options
The following options have been added:
- --cluster.max-replication-factor: maximum replication factor for new collections. A value of 0 means that there is no restriction. The default value is 10.
- --cluster.min-replication-factor: minimum replication factor for new collections. The default value is 1. This option can be used to prevent the creation of collections that do not have any or enough replicas.
- --cluster.write-concern: default write concern value used for new collections. This option controls the number of replicas that must successfully acknowledge writes to a collection. If any write operation gets fewer acknowledgements than configured here, the collection will go into read-only mode until the configured number of replicas are available again. The default value is 1, meaning that writes to just the leader are sufficient. To ensure that there is at least one extra copy (i.e. one follower), set this option to 2.
- --cluster.max-number-of-shards: maximum number of shards allowed for new collections. A value of 0 means that there is no restriction. The default value is 1000.
Note that the above options only have an effect when set for Coordinators, and only for collections that are created after the options have been set. They do not affect already existing collections.
Furthermore, the following network related options have been added:
- --network.idle-connection-ttl: default time-to-live for idle cluster-internal connections (in milliseconds). The default value is 60000.
- --network.io-threads: number of I/O threads for cluster-internal network requests. The default value is 2.
- --network.max-open-connections: maximum number of open network connections for cluster-internal requests. The default value is 1024.
- --network.verify-hosts: if set to true, this will verify peer certificates for cluster-internal requests when TLS is used. The default value is false.
RocksDB exclusive writes option
The new option --rocksdb.exclusive-writes makes all writes to the RocksDB storage exclusive and therefore avoids write-write conflicts. This option was introduced to open a way to upgrade from MMFiles to RocksDB storage engine without modifying client application code. Otherwise it should best be avoided as the use of exclusive locks on collections will introduce a noticeable throughput penalty.
Note that the MMFiles engine is deprecated from v3.6.0 on and will be removed in a future release. So will this option, which is only a stopgap measure.
AQL options
The new startup option --query.optimizer-rules can be used to selectively enable or disable AQL query optimizer rules by default. The option can be specified multiple times, and takes the same input as the query option of the same name.
For example, to turn off the rule use-indexes-for-sort, use
--query.optimizer-rules "-use-indexes-for-sort"
The purpose of this startup option is to be able to enable potential future experimental optimizer rules, which may be shipped in a disabled-by-default state.
Hot Backup
- Force Backup
When creating backups there is an additional option --force for arangobackup and in the HTTP API. This option aborts ongoing write transactions to obtain the global lock for creating the backup. Most likely this is not what you want to do because it will abort valid ongoing write operations, but it makes sure that backups can be acquired more quickly. The force flag currently only aborts Stream Transactions but no JavaScript Transactions.
- View Data
HotBackup now includes View data. Previously the Views had to be rebuilt after a restore. Now the Views are available immediately.
TLS v1.3
Added support for TLS 1.3 for the arangod server and the client tools (also added to v3.5.1).
The arangod server can be started with option --ssl.protocol 6 to make it require TLS 1.3 for incoming client connections. The server can be started with option --ssl.protocol 5 to make it require TLS 1.2, as in previous versions of arangod.
The default TLS protocol for the arangod server is now generic TLS (--ssl.protocol 9), which will allow the negotiation of the TLS version between the client and the server.
All client tools also support TLS 1.3, by using the --ssl.protocol 6 option when invoking them. The client tools will use TLS 1.2 by default, in order to be compatible with older versions of ArangoDB that may be contacted by these tools.
To configure the TLS version for arangod instances started by the ArangoDB starter, one can use the --all.ssl.protocol=VALUE startup option for the ArangoDB starter, where VALUE is one of the following:
- 4 = TLSv1
- 5 = TLSv1.2
- 6 = TLSv1.3
- 9 = generic TLS
Note: TLS v1.3 support has been added in ArangoDB v3.5.1 already, but the default TLS version in ArangoDB 3.5 was still TLS v1.2. ArangoDB v3.6 uses “generic TLS” as its default TLS version, which allows clients to negotiate the TLS version with the server, dynamically choosing the highest mutually supported version of TLS.
Miscellaneous
Remove operations for documents in the cluster will now use an optimization, if all sharding keys are specified. Should the sharding keys not match the values in the actual document, a not found error will be returned.
Collection names in ArangoDB can now be up to 256 characters long, instead of 64 characters in previous versions.
Disallow using _id or _rev as shard keys in clustered collections.
Using these attributes for sharding was not supported before, but didn’t trigger any errors. Instead, collections were created and silently used _key as the shard key, without making the caller aware that an unsupported shard key was used.
- Make the scheduler enforce the configured queue lengths. The values of the options --server.scheduler-queue-size, --server.prio1-size and --server.maximal-queue-size will now be honored and not exceeded.
The default queue sizes in the scheduler for request buffering have also been changed as follows:
request type       before      now
high priority         128     4096
medium priority   1048576     4096
low priority         4096     4096
The queue sizes can still be adjusted at server start using the above-mentioned startup options.
Internal
Release packages for Linux are now built using inter-procedural optimizations (IPO).
We have moved from C++14 to C++17, which allows us to use some of the simplifications, features and guarantees that this standard has in stock. To compile ArangoDB 3.6 from source, a compiler that supports C++17 is now required.
The bundled JEMalloc memory allocator used in ArangoDB release packages has been upgraded from version 5.2.0 to version 5.2.1.
The bundled version of the Boost library has been upgraded from 1.69.0 to 1.71.0.
The bundled version of xxhash has been upgraded from 0.5.1 to 0.7.2.