Reducing the Memory Footprint of ArangoDB servers
ArangoDB’s memory usage can be restricted and its CPU utilization reduced via different configuration options:
- storage engine (this tutorial focuses on the RocksDB engine)
- edge cache
- server statistics
- background threads
- V8 (JavaScript features)
- operating system / memory allocator (Linux)
There are settings to make it run on systems with very limited resources, but they may also be interesting for your development machine if you want to make it less taxing for the hardware and do not work with much data. For production environments, we recommend using less restrictive settings, benchmarking your setup, and fine-tuning the settings for maximal performance.
Let us assume our test system is a big server with many cores and a lot of memory. However, we intend to run other services on this machine as well. Therefore, we want to restrict the memory usage of ArangoDB. By default, ArangoDB in version 3.4 tries to use as much memory as possible, because using memory accesses instead of disk accesses is faster, and in the database business, performance rules. ArangoDB comes with a default configuration with that in mind. But sometimes, being a little less grabby on system resources may still be fast enough, for example if your working data set is not huge. The goal is to reduce the overall memory footprint.
The following big areas might eat up memory:
- RocksDB
  - WAL (Write-Ahead Log)
  - Write Buffers
WAL & Write Buffers
RocksDB writes into memory buffers mapped to on-disk blocks first. At some point, the memory buffers will be full and have to be written to disk. In order to support high write loads, RocksDB might open a lot of these memory buffers.
Under normal write load, the write buffers will use less than 1 GByte of memory. If you are tight on memory, or your usage pattern does not require this, you can reduce these RocksDB settings:
--rocksdb.max-total-wal-size 1024000
--rocksdb.write-buffer-size 2048000
--rocksdb.max-write-buffer-number 2
--rocksdb.total-write-buffer-size 81920000
--rocksdb.dynamic-level-bytes false
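If you prefer a configuration file over command line flags, the same settings can go into arangod.conf: each option of the form --section.name maps to a name = value entry in a [section] block. A sketch of the equivalent file fragment:

```
[rocksdb]
max-total-wal-size = 1024000
write-buffer-size = 2048000
max-write-buffer-number = 2
total-write-buffer-size = 81920000
dynamic-level-bytes = false
```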
The above settings will:
- restrict the number of outstanding memory buffers
- limit the memory usage to around 100 MByte
During imports or updates, the memory consumption may still grow bigger. On the other hand, these restrictions will have an impact on the maximal write performance. You should not go below the values shown above.
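As a rough sanity check of the "around 100 MByte" figure, you can do the arithmetic on the values above. In RocksDB, the write buffer size is the size of a single in-memory buffer (memtable), the maximum write buffer number caps how many of them a column family may hold, and the total write buffer size caps the sum across all column families:

```shell
# Back-of-the-envelope math for the write-buffer settings above (all in bytes).
write_buffer_size=2048000        # size of one memory buffer (memtable)
max_write_buffer_number=2        # buffers allowed per column family
total_write_buffer_size=81920000 # global cap across all column families
max_total_wal_size=1024000       # cap for the write-ahead log

# Upper bound for a single column family:
per_cf=$((write_buffer_size * max_write_buffer_number))
echo "per column family: $per_cf bytes"   # 4096000

# Worst case overall: global memtable cap plus the WAL,
# which lands in the ballpark of the ~100 MByte mentioned above.
total=$((total_write_buffer_size + max_total_wal_size))
echo "worst case: $total bytes"           # 82944000
```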
Read-Buffers
--rocksdb.block-cache-size 2560000
--rocksdb.enforce-block-cache-size-limit true
These settings are the counterpart of the settings from the previous section. As soon as the memory buffers have been persisted to disk, answering read queries requires reading them back into memory. The above options limit the number of cached buffers to a few megabytes. If possible, this setting should be configured as large as the hot-set size of your dataset.
These restrictions may have an impact on query performance.
Edge-Cache
--cache.size 10485760
This option limits the ArangoDB edge cache to 10 MB. If you do not have a graph use case and do not use edge collections, it is possible to use the minimum without a performance impact. In general, this should correspond to the size of the hot set.
In addition to all buffers, a query will use additional memory during its execution, to process your data and build up your result set. This memory is used during the query execution only and will be released afterwards, in contrast to the memory held for buffers.
Query Memory Usage
By default, queries will build up their full results in memory. While you can fetch the results batch by batch by using a cursor, every query needs to compute the entire result first before you can retrieve the first batch. The server also needs to hold the results in memory until the corresponding cursor is fully consumed or times out. Building up the full results reduces the time the server has to work with collections, at the cost of main memory.
In ArangoDB version 3.4 we introduced streaming cursors with somewhat inverted properties: less peak memory usage, longer access to the collections. Streaming is possible on document level, which means that it cannot be applied to all query parts. For example, a MERGE() of all results of a subquery cannot be streamed (the result of the operation has to be built up fully). Nonetheless, the surrounding query may be eligible for streaming.
Aside from streaming cursors, ArangoDB offers the possibility to specify a memory limit which a query should not exceed. If it does, the query will be aborted. Memory statistics are checked between execution blocks, which correspond to lines in the explain output. That means queries which use functions may require more memory for intermediate processing, but this will not kill the query, because the memory limit is only checked between execution blocks.
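As a sketch of how such a limit can be set: the HTTP cursor API accepts a memoryLimit attribute (in bytes) when creating a query cursor, and the --query.memory-limit startup option sets a server-wide default. The query and collection name below are placeholders:

```
POST /_api/cursor
{
  "query": "FOR doc IN myCollection FILTER doc.value > 10 RETURN doc",
  "memoryLimit": 33554432
}
```

With this request, the query is aborted if it tries to use more than 32 MByte of memory.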
You can use LIMIT operations in AQL queries to reduce the number of documents that need to be inspected and processed. This is not always what happens under the hood, however. Other operations may lead to an intermediate result being computed before any limit is applied. Recently, we added a new ability to the optimizer: the Sort-Limit Optimization in AQL. In short, a SORT combined with a LIMIT operation only keeps as many documents in memory during sorting as the subsequent LIMIT requires. This optimization is applied automatically beginning with ArangoDB v3.5.0.
Statistics
The server collects statistics regularly, which it shows you in the web interface. The statistics cause a light query load even if your application is idle. You can disable them if desired:
--server.statistics false
JavaScript & Foxx
JavaScript is executed in the ArangoDB process using the embedded V8 engine:
- Backend parts of the web interface
- Foxx Apps
- Foxx Queues
- GraphQL
- JavaScript-based transactions
- User-defined AQL functions
There are several V8 contexts for parallel execution. You can think of them as a thread pool. They are also called isolates. Each isolate has a heap of a few gigabytes by default. You can restrict V8 if you use no or very little JavaScript:
--javascript.v8-contexts 2
--javascript.v8-max-heap 512
This will limit the number of V8 isolates to two. All JavaScript-related requests will be queued up until one of the isolates becomes available for the new task. It also restricts the heap size to 512 MByte, so that both V8 contexts combined cannot use more than 1 GByte of memory in the worst case.
V8 for the Desperate
You should not use the following settings unless there are very good reasons,like a local development system on which performance is not critical or anembedded system with very limited hardware resources!
--javascript.v8-contexts 1
--javascript.v8-max-heap 256
You can reduce the memory usage of V8 to 256 MB and just one thread. There is a chance that some operations will be aborted because they run out of memory, in the web interface for instance. Also, JavaScript requests will be executed one by one.
If you are very tight on memory, and you are sure that you do not need V8, you can disable it completely:
--javascript.enabled false
--foxx.queues false
In consequence, the following features will not be available:
- Backend parts of the web interface
- Foxx Apps
- Foxx Queues
- GraphQL
- JavaScript-based transactions
- User-defined AQL functions
Note that JavaScript / V8 can be disabled for DB-Server and Agency nodes in a cluster without these limitations. They apply to single server instances. They also apply to Coordinator nodes, but you should not disable V8 on Coordinators, because certain cluster operations depend on it.
CPU usage
We cannot really reduce the CPU usage itself, but we can reduce the number of threads running in parallel. Again, you should not do this unless there are very good reasons, like an embedded system. Note that this will limit the performance for concurrent requests, which may be okay for a local development system with you as the only user.
The number of background threads can be limited in the following way:
--arangosearch.threads-limit 1
--rocksdb.max-background-jobs 4
--server.maintenance-threads 2
--server.maximal-threads 4
--server.minimal-threads 1
In general, the number of threads is selected to fit the machine. However, each thread requires at least 8 MB of stack memory. By sacrificing some performance for parallel execution, it is possible to reduce this.
This option will make logging synchronous:
--log.force-direct true
This is not recommended unless you only log errors and warnings.
Examples
In general, you should adjust the read buffers and the edge cache on a standard server. If you have a graph use case, you should go for a larger edge cache. For example, split the memory 50:50 between read buffers and edge cache. If you have no edges, then go for a minimal edge cache and use most of the memory for the read buffers.
For example, if you have a machine with 40 GByte of memory and you want to restrict ArangoDB to 20 GB of that, use 10 GB for the edge cache and 10 GB for the read buffers if you use graph features.
Please keep in mind that during query execution, additional memory will be used for query results temporarily. If you are tight on memory, you may want to go for 7 GB each instead.
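For the 10 GB / 10 GB split from the example above, the corresponding startup options could look like this (values in bytes, 10 GByte = 10737418240):

```
--cache.size 10737418240
--rocksdb.block-cache-size 10737418240
--rocksdb.enforce-block-cache-size-limit true
```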
If you have an embedded system or a development laptop, you can use all of the above settings to lower the memory footprint further. For normal operation, especially in production, these settings are not recommended.
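Putting the settings from this tutorial together, a minimal-footprint invocation for such a development or embedded machine might look like the following sketch; all options appear above, but treat the combination as a starting point to benchmark, not a definitive recipe:

```
arangod \
  --rocksdb.max-total-wal-size 1024000 \
  --rocksdb.write-buffer-size 2048000 \
  --rocksdb.max-write-buffer-number 2 \
  --rocksdb.total-write-buffer-size 81920000 \
  --rocksdb.dynamic-level-bytes false \
  --rocksdb.block-cache-size 2560000 \
  --rocksdb.enforce-block-cache-size-limit true \
  --cache.size 10485760 \
  --server.statistics false \
  --javascript.v8-contexts 1 \
  --javascript.v8-max-heap 256 \
  --arangosearch.threads-limit 1 \
  --rocksdb.max-background-jobs 4 \
  --server.maximal-threads 4 \
  --server.minimal-threads 1
```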
Linux System Configuration
The main deployment target for ArangoDB is Linux. As you have learned above, ArangoDB and its innards work a lot with memory. Thus, it is vital to know how ArangoDB and the Linux kernel interact on that matter. The Linux kernel offers several modes of memory management, which you can influence via the proc filesystem, the file /etc/sysctl.conf, or a file in /etc/sysctl.d/, which your system applies to the kernel settings at boot time. The settings named below are intended for the sysctl infrastructure, meaning that they map to the proc filesystem, e.g. as /proc/sys/vm/overcommit_memory.
A vm.overcommit_memory setting of 2 can cause issues in some environments in combination with the bundled memory allocator ArangoDB ships with (jemalloc).
The allocator demands consecutive blocks of memory from the kernel, which are also mapped to on-disk blocks. This is done on behalf of the server process (arangod). The process may use some chunks of a block for a long time span, but others only for a short while, and therefore releases the memory. It is then up to the allocator to return the freed parts back to the kernel. Because it can only give back consecutive blocks of memory, it has to split the large block into multiple small blocks and can then return the unused ones.
With a vm.overcommit_memory kernel setting value of 2, the allocator may have trouble with splitting existing memory mappings, which makes the _number_ of memory mappings of an arangod server process grow over time. This can lead to the kernel refusing to hand out more memory to the arangod process, even if more physical memory is available. The kernel will only grant up to vm.max_map_count memory mappings to each process, which defaults to 65530 on many Linux environments.
Another issue when running jemalloc with vm.overcommit_memory set to 2 is that for some workloads, the amount of memory that the Linux kernel tracks as _committed memory_ also grows over time and never decreases. Eventually, arangod may not get any more memory, simply because it reaches the configured overcommit limit (physical RAM * overcommit_ratio + swap space).
The solution is to modify the value of vm.overcommit_memory from 2 to either 0 or 1. This will fix both of these problems. We still observe ever-increasing virtual memory consumption when using jemalloc regardless of the overcommit setting, but in practice this should not cause any issues. 0 is the Linux kernel default and also the setting we recommend.
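To make the recommended setting persistent across reboots, a sysctl drop-in file can be used (the file name here is just an example); running sysctl -w vm.overcommit_memory=0 as root applies it immediately:

```
# /etc/sysctl.d/99-arangodb.conf
vm.overcommit_memory = 0
```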
For the sake of completeness, let us also mention another way to address the problem: using a different memory allocator. This requires compiling ArangoDB from the source code without jemalloc (-DUSE_JEMALLOC=Off in the call to cmake). With the system’s libc allocator, you should see quite stable memory usage. We also tried another allocator, precisely the one from libmusl, and it also shows quite stable memory usage over time. What holds us back from changing the bundled allocator is that it is a non-trivial change, and that jemalloc has very nice performance characteristics for massively multi-threaded processes otherwise.
Testing the Effects of Reduced I/O Buffers
- 15:50 – Start bigger import
- 16:00 – Start writing documents of ~60 KB size one at a time
- 16:45 – Add similar second writer
- 16:55 – Restart ArangoDB with the RocksDB write buffer configuration suggested above
- 17:20 – Buffers are full, write performance drops
- 17:38 – WAL rotation
What you see in the above performance graph are the consequences of restricting the write buffers. Until we reach a 90% fill rate of the write buffers, the server can almost follow the load pattern for a while, at the cost of constantly increasing buffers. Once RocksDB reaches a 90% buffer fill rate, it will significantly throttle the load to ~50%. This is expected according to the upstream documentation:
[…] a flush will be triggered […] if total mutable memtable size exceeds 90%of the limit. If the actual memory is over the limit, more aggressive flushmay also be triggered even if total mutable memtable size is below 90%.
Since we only measured the disk I/O bytes, we do not see that the document save operations also doubled in request time.