Features and Improvements in ArangoDB 3.0

The following list shows in detail which features have been added or improved in ArangoDB 3.0. ArangoDB 3.0 also contains several bugfixes that are not listed here.

Internal data format changes

ArangoDB now uses VelocyPack for storing documents, query results and temporarily computed values. Using a single data format removed the need for some data conversions in the core that previously slowed operations down.

The VelocyPack format is also quite compact, and reduces storage space requirements for “small” values such as booleans, integers and short strings. This can speed up several operations inside AQL queries.

VelocyPack document entries stored on disk are also self-contained, in the sense that each stored document will contain all of its data type and attribute name descriptions. While this may require a bit more space for storing the documents, it removes the overhead of fetching attribute names and document layout from shared structures as in previous versions of ArangoDB. It also simplifies the code paths for storing and reading documents.

AQL improvements

Syntax improvements

LIKE string-comparison operator

AQL now provides a LIKE operator, which can be used to compare strings, for example inside filter conditions:

  value LIKE search

This change makes LIKE an AQL keyword. Using LIKE as an attribute or collection name in AQL thus requires quoting the name from now on.

The LIKE operator is currently implemented by calling the already existing AQL function LIKE, which also remains operational in 3.0. Use the LIKE function if you want to search case-insensitively (via its optional parameter), as the LIKE operator always compares case-sensitively.
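A minimal JavaScript sketch can make the pattern semantics concrete. It translates a LIKE pattern into a JavaScript RegExp and assumes the usual AQL LIKE wildcards: % stands for any sequence of characters, _ for exactly one character, and a backslash escapes either of them. The helper names likeToRegExp, escapeRegExp and like are hypothetical, and this is not how ArangoDB implements the operator.

```javascript
// Hypothetical sketch of LIKE pattern matching; not ArangoDB's implementation.
// Assumed wildcards: % = any sequence of characters, _ = exactly one character,
// and a backslash makes the following character literal.
function likeToRegExp(pattern, caseInsensitive = false) {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === "\\" && i + 1 < pattern.length) {
      // Escaped character: match it literally.
      source += escapeRegExp(pattern[++i]);
    } else if (c === "%") {
      source += "[\\s\\S]*";
    } else if (c === "_") {
      source += "[\\s\\S]";
    } else {
      source += escapeRegExp(c);
    }
  }
  // Anchor both ends: LIKE compares against the whole string.
  return new RegExp("^" + source + "$", caseInsensitive ? "i" : "");
}

// Escape characters that are special in JavaScript regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The operator form is always case-sensitive; the function form can opt out.
function like(value, pattern, caseInsensitive = false) {
  return likeToRegExp(pattern, caseInsensitive).test(value);
}
```

In AQL terms, doc.name LIKE "Arango%" corresponds to like(value, "Arango%"), and LIKE(doc.name, "arango%", true) corresponds to the case-insensitive call.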

AQL array comparison operators

All AQL comparison operators now also exist in an array variant. In the array variant, the operator is preceded by one of the keywords ALL, ANY or NONE. Using one of these keywords changes the operator behavior to execute the comparison operation for all, any, or none of its left-hand argument values. The left-hand argument of an array operator is therefore expected to be an array.

Examples:

  [ 1, 2, 3 ] ALL IN [ 2, 3, 4 ]  // false
  [ 1, 2, 3 ] ALL IN [ 1, 2, 3 ]  // true
  [ 1, 2, 3 ] NONE IN [ 3 ]  // false
  [ 1, 2, 3 ] NONE IN [ 23, 42 ]  // true
  [ 1, 2, 3 ] ANY IN [ 4, 5, 6 ]  // false
  [ 1, 2, 3 ] ANY IN [ 1, 42 ]  // true
  [ 1, 2, 3 ] ANY == 2  // true
  [ 1, 2, 3 ] ANY == 4  // false
  [ 1, 2, 3 ] ANY > 0  // true
  [ 1, 2, 3 ] ANY <= 1  // true
  [ 1, 2, 3 ] NONE < 99  // false
  [ 1, 2, 3 ] NONE > 10  // true
  [ 1, 2, 3 ] ALL > 2  // false
  [ 1, 2, 3 ] ALL > 0  // true
  [ 1, 2, 3 ] ALL >= 3  // false
  ["foo", "bar"] ALL != "moo"  // true
  ["foo", "bar"] NONE == "bar"  // false
  ["foo", "bar"] ANY == "foo"  // true
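JavaScript's every and some can model the ALL, ANY and NONE semantics. The following is a minimal sketch that assumes scalar operands of matching types. AQL's own cross-type ordering and its structural comparison of arrays and objects are not reproduced, and arrayCompare is a hypothetical name.

```javascript
// Hypothetical model of AQL's array comparison operators (ALL/ANY/NONE).
// Assumes scalar operands of the same type; not ArangoDB's implementation.
const comparators = {
  "==": (a, b) => a === b,
  "!=": (a, b) => a !== b,
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
  // IN expects an array on the right-hand side.
  "IN": (a, b) => b.includes(a),
};

// Apply the comparison to every element of the left-hand array.
function arrayCompare(left, quantifier, operator, right) {
  const test = (value) => comparators[operator](value, right);
  switch (quantifier) {
    case "ALL": return left.every(test);   // true if every element passes
    case "ANY": return left.some(test);    // true if at least one element passes
    case "NONE": return !left.some(test);  // true if no element passes
    default: throw new Error(`unknown quantifier: ${quantifier}`);
  }
}
```

For example, arrayCompare([1, 2, 3], "ALL", "IN", [2, 3, 4]) mirrors the first AQL example in the list and evaluates to false.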

Regular expression string-comparison operators

AQL now supports the operators =~ and !~ for testing strings against regular expressions. =~ tests if a string value matches a regular expression, and !~ tests if a string value does not match a regular expression.

The two operators expect their left-hand operands to be strings, and their right-hand operands to be strings containing valid regular expressions as specified below.

The regular expressions may consist of literal characters and the following characters and sequences:

  • . – the dot matches any single character except line terminators. To include line terminators, use [\s\S] instead to simulate . with the DOTALL flag.
  • \d – matches a single digit, equivalent to [0-9]
  • \s – matches a single whitespace character
  • \S – matches a single non-whitespace character
  • \t – matches a tab character
  • \r – matches a carriage return
  • \n – matches a line-feed character
  • [xyz] – set of characters. Matches any of the enclosed characters (i.e. x, y or z in this case)
  • [^xyz] – negated set of characters. Matches any other character than the enclosed ones (i.e. anything but x, y or z in this case)
  • [x-z] – range of characters. Matches any of the characters in the specified range, e.g. [0-9A-F] to match any character in 0123456789ABCDEF
  • [^x-z] – negated range of characters. Matches any other character than the ones specified in the range
  • (xyz) – defines and matches a pattern group
  • (x|y) – matches either x or y
  • ^ – matches the beginning of the string (e.g. ^xyz)
  • $ – matches the end of the string (e.g. xyz$)

Note that the characters ., *, +, ?, [, ], (, ), {, }, |, ^, and $ have a special meaning in regular expressions and may need to be escaped using a backslash (\). A literal backslash should also be escaped using another backslash, i.e. \\.

Characters and sequences may optionally be repeated using the following quantifiers:

  • x* – matches zero or more occurrences of x
  • x+ – matches one or more occurrences of x
  • x? – matches one or zero occurrences of x
  • x{y} – matches exactly y occurrences of x
  • x{y,z} – matches between y and z occurrences of x
  • x{y,} – matches at least y occurrences of x
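All constructs listed above are also valid in JavaScript regular expressions, so a JavaScript RegExp can stand in for experimenting with patterns. A minimal sketch follows. It assumes that JavaScript RegExp behaves like AQL for this subset, and matches and notMatches are hypothetical helper names.

```javascript
// Hypothetical stand-ins for AQL's =~ and !~ using JavaScript RegExp.
// Assumption: the pattern subset listed above behaves the same in both.
function matches(value, pattern) {    // models: value =~ pattern
  return new RegExp(pattern).test(value);
}

function notMatches(value, pattern) { // models: value !~ pattern
  return !matches(value, pattern);
}

// Doubled backslashes: "\\d" in a JavaScript string literal yields the
// two characters \d that the regular expression engine sees.
const isoDate = "^\\d{4}-\\d{2}-\\d{2}$";
```

Matching is unanchored unless the pattern uses ^ or $. This is why isoDate anchors both ends, and why "foo[\\s\\S]bar" matches across a line break where "foo.bar" does not.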

Enclosing identifiers in forward ticks

AQL identifiers can now optionally be enclosed in forward ticks in addition to using backward ticks. This allows convenient writing of AQL queries in JavaScript template strings (which are delimited with backticks themselves), e.g.

  var q = `FOR doc IN ´collection´ RETURN doc.´name´`;
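The difference shows up when a query is assembled in a template literal. With backward ticks, every tick must be escaped. With forward ticks (´), no escaping is needed. The collection name users below is a hypothetical example.

```javascript
// Illustration with a hypothetical collection name.
const collection = "users";

// Backticks inside a template literal must be escaped with a backslash...
const withBackticks = `FOR doc IN \`${collection}\` RETURN doc.\`name\``;

// ...whereas forward ticks need no escaping at all.
const withForwardTicks = `FOR doc IN ´${collection}´ RETURN doc.´name´`;

console.log(withBackticks);    // FOR doc IN `users` RETURN doc.`name`
console.log(withForwardTicks); // FOR doc IN ´users´ RETURN doc.´name´
```

For names that come from user input, AQL bind parameters are a safer choice than string interpolation.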

Functions added

The following AQL functions have been added in 3.0:

  • REGEX_TEST(value, regex): tests whether the string value matches the regular expression specified in regex. Returns true if it matches, and false otherwise.

The syntax for regular expressions is the same as for the regular expression operators =~ and !~.

  • HASH(value): Calculates a hash value for value. value is not required to be a string, but can have any data type. The calculated hash value will take the data type of value into account, so for example the number 1 and the string “1” will have different hash values. For arrays, the same hash value will be created if the arrays contain exactly the same values (including value types) in the same order. For objects, the same hash value will be created if the objects have exactly the same attribute names and values (including value types). The order in which attributes appear inside objects is not important for hashing. The hash value returned by this function is a number. The hash algorithm is not guaranteed to remain the same in future versions of ArangoDB. The hash values should therefore be used only for temporary calculations, e.g. to compare if two documents are the same, or for grouping values in queries.

  • TYPENAME(value): Returns the data type name of value. The data type name can be either null, bool, number, string, array or object.

  • LOG(value): Returns the natural logarithm of value. The base is Euler’s constant (2.71828…).

  • LOG2(value): Returns the base 2 logarithm of value.

  • LOG10(value): Returns the base 10 logarithm of value.

  • EXP(value): Returns Euler’s constant (2.71828…) raised to the power of value.

  • EXP2(value): Returns 2 raised to the power of value.

  • SIN(value): Returns the sine of value.

  • COS(value): Returns the cosine of value.

  • TAN(value): Returns the tangent of value.

  • ASIN(value): Returns the arcsine of value.

  • ACOS(value): Returns the arccosine of value.

  • ATAN(value): Returns the arctangent of value.

  • ATAN2(y, x): Returns the arctangent of the quotient of y and x.

  • RADIANS(value): Returns the angle converted from degrees to radians.

  • DEGREES(value): Returns the angle converted from radians to degrees.
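Hashing a canonical serialization can model the HASH properties described above: type-aware, order-sensitive for arrays, and insensitive to attribute order in objects. Below is a minimal JavaScript sketch under that assumption. typeName follows the type names listed for TYPENAME. typeName, canonical and hashValue are hypothetical helpers, and 32-bit FNV-1a is a stand-in chosen for brevity. It is not the algorithm ArangoDB uses.

```javascript
// Hypothetical sketch for JSON-like values; NOT ArangoDB's hash algorithm.
// Returns the type names listed for TYPENAME.
function typeName(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  switch (typeof value) {
    case "boolean": return "bool";
    case "number": return "number";
    case "string": return "string";
    default: return "object";
  }
}

// Serialize with explicit types; object keys are sorted so order does not matter.
function canonical(value) {
  const type = typeName(value);
  if (type === "array") return "a[" + value.map(canonical).join(",") + "]";
  if (type === "object") {
    const keys = Object.keys(value).sort();
    return "o{" + keys.map((k) => JSON.stringify(k) + ":" + canonical(value[k])).join(",") + "}";
  }
  return type + ":" + JSON.stringify(value);
}

// 32-bit FNV-1a over the UTF-16 code units of the canonical string.
function hashValue(value) {
  const s = canonical(value);
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h; // an unsigned 32-bit number
}
```

Because the number 1 serializes as number:1 and the string “1” as string:"1", their hashes are computed from different inputs. Two objects with the same attributes in a different order serialize identically.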

Optimizer improvements

“inline-subqueries” rule

The AQL optimizer rule “inline-subqueries” has been added. This rule can pull out certain subqueries that are used as an operand to a FOR loop one level higher, eliminating the subquery completely. This reduces complexity of the query’s execution plan and will likely enable further optimizations. For example, the query

  FOR i IN (
    FOR j IN [1,2,3]
      RETURN j
  )
  RETURN i

will be transformed by the rule to:

  FOR i IN [1,2,3]
    RETURN i

The query

  FOR name IN (
    FOR doc IN _users
      FILTER doc.status == 1
      RETURN doc.name
  )
  LIMIT 2
  RETURN name

will be transformed into

  FOR tmp IN _users
    FILTER tmp.status == 1
    LIMIT 2
    RETURN tmp.name

The rule will only fire when the subquery is used as an operand to a FOR loop, and if the subquery does not contain a COLLECT with an INTO variable.
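Plain JavaScript over an in-memory array can loosely model the effect of the second transformation. This is an analogy, not a description of ArangoDB's executor, and the sample documents are hypothetical. In the model, the nested form builds the complete subquery result before LIMIT applies. The inlined form filters, projects and stops after two results in a single loop.

```javascript
// Hypothetical sample documents standing in for the _users collection.
const users = [
  { name: "alice", status: 1 },
  { name: "bob", status: 0 },
  { name: "carol", status: 1 },
  { name: "dave", status: 1 },
];

// Before inlining (in this model): the subquery result is built in full,
// and only then does LIMIT 2 discard the rest.
function withSubquery(docs) {
  const names = docs.filter((doc) => doc.status === 1).map((doc) => doc.name);
  return names.slice(0, 2);
}

// After inlining: one loop that filters, projects, and stops after 2 results.
function inlined(docs) {
  const result = [];
  for (const tmp of docs) {
    if (tmp.status !== 1) continue;
    result.push(tmp.name);
    if (result.length === 2) break;
  }
  return result;
}
```

Both functions return the same names. The inlined form never allocates the intermediate array and stops reading after the third document.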

“remove-unnecessary-calculations” rule

The AQL optimizer rule “remove-unnecessary-calculations” now fires in more cases than in previous versions. This rule removes calculations from execution plans, and by having fewer calculations done, a query may execute faster or require less memory.

The rule will now remove calculations that are used exactly once in other expressions (e.g. LET a = doc RETURN a.value), as well as calculations that are just references to other variables (e.g. LET a = b).

“optimize-traversals” rule

The AQL optimizer rule “merge-traversal-filter” was renamed to “optimize-traversals”. The rule will remove unused edge and path result variables from the traversal in case they are specified in the FOR section of the traversal but not referenced later in the query. This saves constructing edge and path results that are not used later.

AQL now uses VelocyPack internally for storing intermediate values. For many value types it can now get away without extra memory allocations and with fewer internal conversions. Values can be passed into internal AQL functions without copying them. This can lead to reduced query execution times for queries that use C++-based AQL functions.

“replace-or-with-in” and “use-index-for-sort” rules

These rules now fire in some additional cases, which allows simplifying index lookup conditions and removing SortNodes from execution plans.

Cluster state management

The cluster’s internal state information is now also managed by ArangoDB instances. Earlier versions relied on third-party software being installed for storing the cluster state.

The state is managed by dedicated ArangoDB instances, which can be started in a special agency mode. These instances can operate in a distributed fashion. They will automatically elect one of them to become their leader, which is responsible for storing the state changes sent from servers in the cluster. The other instances will automatically follow the leader and will transparently stand in should it become unavailable.

The Agency instances are also self-organizing: they will continuously probe each other and re-elect leaders. The communication between the Agency instances uses the consensus-based RAFT protocol.

The operations for storing and retrieving cluster state information are now much less expensive from an ArangoDB cluster node perspective, which in turn allows for faster cluster operations that need to fetch or update the overall cluster state.

_from and _to attributes of edges are updatable and usable in indexes

In ArangoDB prior to 3.0, the attributes _from and _to of edges were treated specially when loading or storing edges. That special handling made these attributes less flexible than regular document attributes. For example, the _from and _to attribute values of an existing edge could not be updated once the edge was created. Now this is possible via the single-document APIs and via AQL.

Additionally, the _from and _to attributes could not be indexed in user-defined indexes, e.g. to make each combination of _from and _to unique. Finally, as _from and _to referenced the linked collections by collection id and not by collection name, their meaning became unclear once a referenced collection was dropped. The collection id stored in edges then became unusable, and when accessing such an edge, the collection name part of it was always translated to _undefined.

In ArangoDB 3.0, the _from and _to values of edges are saved as regular strings. This allows using _from and _to in user-defined indexes. Additionally, this allows updating the _from and _to values of existing edges. Furthermore, collections referenced by _from and _to values may be dropped and re-created later. Any _from and _to values of edges pointing to such a dropped collection are now unaffected by the drop operation.
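Because _from and _to are now plain strings, they can take part in a unique index. In arangosh, such an index could be created with db.myedges.ensureIndex({ type: "hash", fields: [ "_from", "_to" ], unique: true }). The JavaScript sketch below models only the effect of such an index on an in-memory list of edges. The class name EdgeStore is hypothetical.

```javascript
// Hypothetical in-memory model of a unique index over [_from, _to].
class EdgeStore {
  constructor() {
    this.edges = [];
    this.uniqueIndex = new Set();
  }

  insert(edge) {
    // JSON-encoding the pair avoids the ambiguity that plain string
    // concatenation of the two values could introduce.
    const key = JSON.stringify([edge._from, edge._to]);
    if (this.uniqueIndex.has(key)) {
      throw new Error(`unique constraint violated for ${key}`);
    }
    this.uniqueIndex.add(key);
    this.edges.push(edge);
  }
}
```

Because _from and _to hold collection names rather than ids in this model, the stored strings stay meaningful even if a referenced collection is dropped and re-created.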

Unified APIs for CRUD operations

The CRUD APIs for documents and edges have been unified. Edges can now be inserted and modified via the same APIs as documents. _from and _to attribute values can be passed as regular document attributes now:

  db.myedges.insert({ _from: "myvertices/some", _to: "myvertices/other" /* , ... */ });

Passing _from and _to separately, as was required in earlier versions, is not necessary anymore but will still work:

  db.myedges.insert("myvertices/some", "myvertices/other", { /* ... */ });

The CRUD operations now also support batch variants that work on arrays of documents/edges, e.g.

  db.myedges.insert([
    { _from: "myvertices/some", _to: "myvertices/other" /* , ... */ },
    { _from: "myvertices/who", _to: "myvertices/friend" /* , ... */ },
    { _from: "myvertices/one", _to: "myvertices/two" /* , ... */ }
  ]);

The batch variants are also available in ArangoDB’s HTTP API. They can be used to carry out operations on multiple documents more efficiently than their single-document equivalents, which require one HTTP request per operation. With the batch operations, the HTTP request/response overhead can be amortized across multiple operations.
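The following JavaScript illustrates the amortization argument. It builds, but does not send, a single HTTP request that carries several edges. The endpoint path POST /_api/document/<collection>, the assumption that it accepts a JSON array as the body, and the base URL are all assumptions to check against the HTTP API reference for the version in use. buildBatchInsertRequest is a hypothetical helper, and authentication is left out.

```javascript
// Hypothetical helper: builds one request object for a batch insert.
// Assumption: the document endpoint accepts a JSON array as request body.
function buildBatchInsertRequest(baseUrl, collection, docs) {
  if (!Array.isArray(docs)) throw new TypeError("docs must be an array");
  return {
    method: "POST",
    url: `${baseUrl}/_api/document/${encodeURIComponent(collection)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(docs),
  };
}

const edges = [
  { _from: "myvertices/some", _to: "myvertices/other" },
  { _from: "myvertices/who", _to: "myvertices/friend" },
  { _from: "myvertices/one", _to: "myvertices/two" },
];

// One request for three edges instead of three single-document requests.
const request = buildBatchInsertRequest("http://localhost:8529", "myedges", edges);
```

Where fetch is available, the object could be sent with fetch(request.url, request). fetch ignores the extra url property in its options argument.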

Persistent indexes

ArangoDB 3.0 provides an experimental persistent index feature. Persistent indexes store the index values on disk instead of in-memory only. This means the indexes do not need to be rebuilt in-memory when a collection is loaded or reloaded, which should improve collection loading times.

The persistent indexes in ArangoDB are based on the RocksDB engine. To create a persistent index for a collection, create an index of type “rocksdb” as follows:

  db.mycollection.ensureIndex({ type: "rocksdb", fields: [ "fieldname" ] });

The persistent indexes are sorted, so they allow equality lookups and range queries. Note that the feature is still highly experimental and has some known deficiencies. It will be finalized by the release of the 3.0 stable version.
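A small in-memory model shows why sorted storage supports both kinds of lookup. When entries are kept in key order, an equality lookup and a range query both reduce to binary searches for the boundary positions. This JavaScript sketch is illustrative only. It does not reflect how RocksDB organizes data on disk, and SortedIndex is a hypothetical name.

```javascript
// Hypothetical in-memory model of a sorted index: [key, documentKey] pairs
// kept in ascending key order. Keys are assumed to be all numbers or all strings.
class SortedIndex {
  constructor() {
    this.entries = [];
  }

  // First position whose key is >= key (or > key when strict is true).
  lowerBound(key, strict = false) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      const k = this.entries[mid][0];
      if (k < key || (strict && k === key)) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(key, docKey) {
    // Insert after existing equal keys so equal entries keep insertion order.
    this.entries.splice(this.lowerBound(key, true), 0, [key, docKey]);
  }

  // Document keys with low <= key <= high.
  range(low, high) {
    const start = this.lowerBound(low);
    const end = this.lowerBound(high, true);
    return this.entries.slice(start, end).map(([, docKey]) => docKey);
  }

  // An equality lookup is the degenerate range [key, key].
  lookup(key) {
    return this.range(key, key);
  }
}
```

Both operations cost two binary searches plus the size of the result. An unsorted hash structure could answer the equality lookup but not the range query.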

Upgraded V8 version

The V8 engine that is used inside ArangoDB to execute JavaScript code has been upgraded from version 4.3.61 to 5.0.71.39. The new version makes several more ES6 features available by default, including

  • arrow functions
  • computed property names
  • rest parameters
  • array destructuring
  • numeric and object literals
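A short snippet can exercise each of the listed features. The following plain JavaScript is a generic ES6 illustration rather than anything ArangoDB-specific. It interprets "numeric and object literals" as the ES6 binary/octal number literals and the shorthand object literal syntax.

```javascript
// Arrow function
const square = (x) => x * x;

// Computed property names
const field = "status";
const doc = { [field]: "active", [`${field}Code`]: 1 };

// Rest parameters
function sum(...values) {
  return values.reduce((total, v) => total + v, 0);
}

// Array destructuring (combined with a rest element)
const [first, second, ...others] = [1, 2, 3, 4];

// ES6 binary and octal numeric literals
const flags = 0b1010; // 10
const mode = 0o644;   // 420

// Shorthand object literal: same as { flags: flags, mode: mode }
const settings = { flags, mode };
```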

Web Admin Interface

The ArangoDB 3.0 web interface is significantly improved. It now comes with a more responsive design, making it easier to use on different devices. Navigation and menus have been simplified, and related items have been regrouped to stay closer together and allow tighter workflows.

The AQL query editor is now much easier to use. Multiple queries can be started and tracked in parallel, while results of earlier queries are still preserved. Queries still running can be canceled directly from the editor. The AQL query editor now allows the usage of bind parameters too, and provides a helper for finding collection names, AQL function names and keywords quickly.

The web interface now keeps track of whether the server is offline and of which server-side operations have been started and are still running. It now remains usable while such longer-running operations are ongoing. It also keeps more state about the user’s choices (e.g. window sizes, whether the tree or the code view was last used in the document editor).

Cluster statistics are now integrated into the web interface as well. Additionally, a menu item “Help us” has been added to make it easy to give the ArangoDB team feedback about the product.

The frontend may now be mounted behind a reverse proxy on a different path. For this to work, the proxy should send an X-Script-Name header containing the path.

A backend configuration for haproxy might look like this:

  reqadd X-Script-Name:\ /arangodb

The frontend will recognize the subpath and produce appropriate links. ArangoDB will only accept paths from trusted frontend proxies. Trusted proxies may be added on startup:

  --frontend.proxy-request-check true --frontend.trusted-proxy 192.168.1.117

--frontend.trusted-proxy may be any address or netmask.

To disable the check and blindly accept any X-Script-Name header, set --frontend.proxy-request-check to false.
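IPv4 CIDR matching illustrates what accepting "any address or netmask" means. The following is a minimal JavaScript sketch, not ArangoDB's implementation. It handles only dotted-quad IPv4 addresses and the address/bits notation, which is an assumption about the netmask syntax --frontend.trusted-proxy accepts. The function names are hypothetical.

```javascript
// Parse a dotted-quad IPv4 address into an unsigned 32-bit integer.
function ipv4ToInt(address) {
  const parts = address.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${address}`);
  }
  return parts.reduce((acc, p) => acc * 256 + p, 0);
}

// True if address equals a single trusted address or lies in a block like "192.168.1.0/24".
function matchesTrustedProxy(address, trusted) {
  const [base, bitsText] = trusted.split("/");
  const bits = bitsText === undefined ? 32 : Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid prefix length in ${trusted}`);
  }
  // Avoid shifting by 32, which JavaScript treats as a shift by 0.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(address) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}
```

A bare address behaves like a /32 block, so only that exact address matches.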

Foxx improvements

The Foxx framework has been completely rewritten for 3.0 with a new, simpler and more familiar API. The most notable changes are:

  • Legacy mode for 2.8 services

Stuck with old code? You can continue using your 2.8-compatible Foxx services with 3.0 by adding "engines": {"arangodb": "^2.8.0"} (or similar version ranges that exclude 3.0 and up) to the service manifest.

  • No more global variables and magical comments

The applicationContext is now module.context. Instead of magical comments just use the summary and description methods to document your routes.

  • Repository and Model have been removed

Instead of repositories just use ArangoDB collections directly. For validation simply use the joi schemas (but wrapped in joi.object()) that previously lived inside the model. Collections and queries return plain JavaScript objects.

  • Controllers have been replaced with nestable routers

Create routers with require('@arangodb/foxx/router')(), attach them to your service with module.context.use(router). Because routers are no longer mounted automagically, you can export and import them like any other object. Use router.use('/path', subRouter) to nest routers as deeply as you want.

  • Routes can be named and reversed

No more memorizing URLs: add a name to your route like router.get('/hello/:name', function () {…}, 'hello') and redirect to the full URL with res.redirect(req.resolve('hello', {name: 'world'})).

  • Simpler express-like middleware

If you already know express, this should be familiar. Here’s a request logger in three lines of code:

  router.use(function (req, res, next) {
    var start = Date.now();
    try { next(); }
    finally { console.log(`${req.method} ${req.url} ${res.statusCode} ${Date.now() - start}ms`); }
  });

  • Sessions and auth without dependencies

To make it easier to get started, the functionality previously provided by the simple-auth, oauth2, sessions-local and sessions-jwt services has been moved into Foxx as the @arangodb/foxx/auth, @arangodb/foxx/oauth2 and @arangodb/foxx/sessions modules.
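A toy dispatcher can trace the control flow of the request logger shown under "Simpler express-like middleware" outside ArangoDB. The sketch below is hypothetical: runChain, the fake req/res objects and the log array are not Foxx APIs. It only shows why the try/finally logger sees the final status code even though it runs before the route handler.

```javascript
// Toy dispatcher: each middleware receives next() to run the rest of the chain.
function runChain(handlers, req, res) {
  const dispatch = (i) => {
    if (i < handlers.length) handlers[i](req, res, () => dispatch(i + 1));
  };
  dispatch(0);
}

const log = [];

// Same shape as the request logger, writing to an array instead of console.log.
function logger(req, res, next) {
  const start = Date.now();
  try {
    next(); // the route handler runs inside this call
  } finally {
    // Runs after the handler returns (or throws), so statusCode is already set.
    log.push(`${req.method} ${req.url} ${res.statusCode} ${Date.now() - start}ms`);
  }
}

// Stand-in route handler that sets the status code.
function handler(req, res) {
  res.statusCode = 200;
}

runChain([logger, handler], { method: "GET", url: "/hello/world" }, { statusCode: null });
```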

Logging

ArangoDB’s logging is now grouped into topics. The log verbosity and output files can be adjusted per log topic. For example

  --log.level startup=trace --log.level queries=trace --log.level info

will log messages concerning startup at trace level, AQL queries at trace level and everything else at info level. --log.level can be specified multiple times at startup, for as many topics as needed.

Some relevant log topics available in 3.0 are:

  • collector: information about the WAL collector’s state
  • compactor: information about the collection datafile compactor
  • datafiles: datafile-related operations
  • mmap: information about memory-mapping operations (including msync)
  • queries: executed AQL queries, slow queries
  • replication: replication-related info
  • requests: HTTP requests
  • startup: information about server startup and shutdown
  • threads: information about threads

This also allows directing log output to different files based on topics. For example, to log all AQL queries to a file “queries.log” one can use the options:

  --log.level queries=trace --log.output queries=file:///path/to/queries.log

To additionally log HTTP requests to a file named “requests.log” add the options:

  --log.level requests=info --log.output requests=file:///path/to/requests.log
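When a script starts the server, the same per-topic options can be generated from a map. The following is a minimal JavaScript sketch. buildLogOptions is a hypothetical helper that only reproduces the --log.level and --log.output syntax shown above.

```javascript
// Build log options from per-topic settings (hypothetical helper).
function buildLogOptions({ defaultLevel, levels = {}, outputs = {} }) {
  const args = [];
  for (const [topic, level] of Object.entries(levels)) {
    args.push("--log.level", `${topic}=${level}`);
  }
  if (defaultLevel) {
    // A level without a topic applies to everything else.
    args.push("--log.level", defaultLevel);
  }
  for (const [topic, target] of Object.entries(outputs)) {
    args.push("--log.output", `${topic}=${target}`);
  }
  return args;
}

const args = buildLogOptions({
  defaultLevel: "info",
  levels: { queries: "trace", requests: "info" },
  outputs: {
    queries: "file:///path/to/queries.log",
    requests: "file:///path/to/requests.log",
  },
});
```

The resulting array could be passed to a process launcher such as Node's child_process.spawn together with the arangod binary. Any other server options would need to be added separately.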

Build system

ArangoDB now uses the cross-platform build system CMake for all its builds. Previous versions used two different build systems, making development and contributions harder than necessary. Now the build system is unified, and all targets (Linux, Windows, macOS) are built from the same set of build instructions.

Documentation

The documentation has been enhanced and re-organized to be more intuitive.

A new introduction for beginners should bring you up to speed with ArangoDB in less than an hour. Additional topics have been introduced and will be extended with upcoming releases.

The topics AQL and HTTP API are now separated from the manual for better searchability and less confusion. A version switcher makes it easier to jump to the version of the docs you are interested in.