RocksDB engine options
RocksDB is a highly configurable key-value store used to power our RocksDB storage engine. Most of the options on this page are pass-through options to the underlying RocksDB instance, and we change very few of their default settings.
The availability and scope of these options depend on the storage engine you have chosen. With the mmfiles engine, some of the following options apply to persistent indexes. With the rocksdb engine, they apply to all stored data as well as to indexes.
Pass-through options
--rocksdb.wal-directory
Absolute path for the RocksDB WAL files. If left empty, this will use a subdirectory journals inside the data directory.
Write buffers
--rocksdb.write-buffer-size
The amount of data to build up in each in-memory buffer (backed by a log file) before closing the buffer and queuing it to be flushed into standard storage. Default: 64MiB. Larger values may improve performance, especially for bulk loads.
--rocksdb.max-write-buffer-number
The maximum number of write buffers that can build up in memory. If this number is reached before the buffers can be flushed, writes will be slowed or stalled. Default: 2.
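As a rough illustration of how the two options above interact, the following sketch multiplies them to estimate how much data one set of write buffers can hold before writes are slowed or stalled. The function name is hypothetical, and treating the bound as a simple product for a single set of buffers is an assumption of this sketch, not a statement about how the server accounts for memory.

```python
MIB = 1024 * 1024

def write_buffer_budget(write_buffer_size=64 * MIB, max_write_buffer_number=2):
    """Estimate the data held by one set of write buffers when all are full.

    Assumption: the bound is write-buffer-size * max-write-buffer-number.
    The defaults mirror the documented defaults (64MiB and 2).
    """
    return write_buffer_size * max_write_buffer_number

# With the documented defaults this prints 128 (MiB).
print(write_buffer_budget() // MIB)
```

Raising either option raises this estimate proportionally, which is worth keeping in mind when tuning for bulk loads.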
--rocksdb.total-write-buffer-size
Introduced in: v3.3.20
The total amount of data to build up in all in-memory buffers (backed by log files), in bytes. This option, together with the block cache size configuration option, can be used to limit memory usage.
If set to 0, the memory usage is not limited. This is the default setting in 3.3. Please note that the default setting may be adjusted in future versions of ArangoDB.
If set to a value greater than 0, this will cap the memory usage for write buffers, but may have an effect on write performance.
--rocksdb.min-write-buffer-number-to-merge
Minimum number of write buffers that will be merged together when flushing to normal storage. Default: 1.
--rocksdb.max-total-wal-size
Maximum total size of WAL files that, when reached, will force a flush of all column families whose data is backed by the oldest WAL files. Setting this to a low value will trigger regular flushing of column family data from memtables, so that WAL files can be moved to the archive. Setting this to a high value will avoid regular flushing but may prevent WAL files from being moved to the archive and being removed.
--rocksdb.delayed-write-rate
Limited write rate to DB (in bytes per second) if we are writing to the last in-memory buffer allowed and we allow more than 3 buffers. Default: 16MiB/s.
LSM tree structure
--rocksdb.num-levels
The number of levels for the database in the LSM tree. Default: 7.
--rocksdb.num-uncompressed-levels
The number of levels that do not use compression. The default value is 2. Levels above this number will use Snappy compression to reduce the disk space requirements for storing data in these levels.
--rocksdb.dynamic-level-bytes
If true, the amount of data in each level of the LSM tree is determined dynamically so as to minimize the space amplification; otherwise, the level sizes are fixed. The dynamic sizing allows RocksDB to maintain a well-structured LSM tree regardless of total data size. Default: true.
--rocksdb.max-bytes-for-level-base
The maximum total data size in bytes in level-1 of the LSM tree. Only effective if --rocksdb.dynamic-level-bytes is false. Default: 256MiB.
--rocksdb.max-bytes-for-level-multiplier
The maximum total data size in bytes for level L of the LSM tree can be calculated as max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1)). Only effective if --rocksdb.dynamic-level-bytes is false. Default: 10.
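To make the level size formula concrete, here is a minimal sketch that evaluates it with the documented defaults (256MiB base, multiplier 10). The function name is hypothetical; the calculation only describes the fixed sizing used when --rocksdb.dynamic-level-bytes is false.

```python
MIB = 1024 * 1024

def max_bytes_for_level(level, base=256 * MIB, multiplier=10):
    """Evaluate base * multiplier ^ (level - 1) for level >= 1."""
    if level < 1:
        raise ValueError("the formula applies to level 1 and above")
    return base * multiplier ** (level - 1)

# Print the size limit of the first four levels in MiB.
for level in range(1, 5):
    print(level, max_bytes_for_level(level) // MIB, "MiB")
```

Each level is ten times larger than the previous one with the default multiplier, so level 3 is already limited to 25600MiB.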
--rocksdb.level0-compaction-trigger
Compaction of level-0 to level-1 is triggered when this many files exist in level-0. Setting this to a higher number may help bulk writes at the expense of slowing down reads. Default: 2.
--rocksdb.level0-slowdown-trigger
When this many files accumulate in level-0, writes will be slowed down to --rocksdb.delayed-write-rate to allow compaction to catch up. Default: 20.
--rocksdb.level0-stop-trigger
When this many files accumulate in level-0, writes will be stopped to allow compaction to catch up. Default: 36.
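The three level-0 triggers form a ladder of thresholds. The sketch below is a simplified model of that ladder using the documented defaults; the function name and the state labels are hypothetical, and the model assumes that a threshold takes effect as soon as the file count reaches it.

```python
def level0_write_state(num_files, compaction_trigger=2,
                       slowdown_trigger=20, stop_trigger=36):
    """Classify the effect of the level-0 file count on writes."""
    # Check the most severe threshold first.
    if num_files >= stop_trigger:
        return "stopped"
    if num_files >= slowdown_trigger:
        return "slowed"
    if num_files >= compaction_trigger:
        return "compacting"
    return "normal"

for count in (1, 2, 20, 36):
    print(count, level0_write_state(count))
```

Keeping the three values well apart gives compaction room to catch up before writes are slowed or stopped.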
File I/O
--rocksdb.compaction-read-ahead-size
If non-zero, we perform bigger reads when doing compaction. If you’re running RocksDB on spinning disks, you should set this to at least 2MiB. That way, RocksDB’s compaction performs sequential instead of random reads. Default: 0.
--rocksdb.use-direct-reads
Only meaningful on Linux. If set, use O_DIRECT for reading files. Default: false.
--rocksdb.use-direct-io-for-flush-and-compaction
Only meaningful on Linux. If set, use O_DIRECT for writing files. Default: false.
--rocksdb.use-fsync
If set, issue an fsync call when writing to disk (set to false to issue fdatasync only). Default: false.
Background tasks
--rocksdb.max-background-jobs
Maximum number of concurrent background compaction jobs, submitted to the low priority thread pool. Default: number of processors.
--rocksdb.num-threads-priority-high
Number of threads for high priority operations (e.g. flush). We recommend setting this equal to max-background-flushes. Default: number of processors / 2.
--rocksdb.num-threads-priority-low
Number of threads for low priority operations (e.g. compaction). Default: number of processors / 2.
Caching
--rocksdb.block-cache-size
This is the maximum size of the block cache in bytes. Increasing this may improve performance. If there is less than 4GiB of RAM on the system, the default value is 256MiB. If there is more, the default is (system RAM size - 2GiB) * 0.3.
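The default block cache size can be worked out from the amount of system RAM. The following is a minimal sketch of the rule stated above; the function name is hypothetical, and treating exactly 4GiB as the "more" case is an assumption, since the text does not say which branch applies at the boundary.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def default_block_cache_size(system_ram_bytes):
    """Return the documented default block cache size in bytes."""
    if system_ram_bytes < 4 * GIB:
        return 256 * MIB
    # (system RAM size - 2GiB) * 0.3, in integer arithmetic (rounded down).
    return (system_ram_bytes - 2 * GIB) * 3 // 10

# A machine with 16GiB of RAM: (16GiB - 2GiB) * 0.3 = 4.2GiB.
print(default_block_cache_size(16 * GIB) / GIB)
```

On small machines the default is therefore a fixed 256MiB, while on larger machines it scales with the RAM that remains after setting 2GiB aside.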
--rocksdb.enforce-block-cache-size-limit
This option is ignored because it could cause errors in RocksDB 5.6. Consider upgrading to ArangoDB v3.4 or later with a more recent RocksDB bundled.
Whether or not the maximum size of the RocksDB block cache is strictly enforced. This option can be set to limit the memory usage of the block cache to at most the specified size. If inserting a data block into the cache would then exceed the cache’s capacity, the data block will not be inserted. If the flag is not set, a data block may still get inserted into the cache. It is evicted later, but the cache may temporarily grow beyond its capacity limit.
--rocksdb.block-cache-shard-bits
The number of bits used to shard the block cache to allow concurrent operations. To keep individual shards at a reasonable size (i.e. at least 512KB), keep this value to at most block-cache-shard-bits / 512KB. Default: block-cache-size / 2^19.
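The relationship between shard bits and shard size can be sketched as follows. This is an illustration only: it assumes the block cache is split into 2^bits shards of equal size, and both function names are hypothetical. It finds the largest bit count that still keeps each shard at the 512KB minimum mentioned above.

```python
KIB = 1024
MIB = 1024 * KIB

def shard_size(block_cache_size, shard_bits):
    """Size of one shard, assuming 2^shard_bits equally sized shards."""
    return block_cache_size // (2 ** shard_bits)

def max_shard_bits(block_cache_size, min_shard_size=512 * KIB):
    """Largest bit count that keeps every shard at least min_shard_size."""
    bits = 0
    while shard_size(block_cache_size, bits + 1) >= min_shard_size:
        bits += 1
    return bits

# A 256MiB block cache can be split into at most 2^9 shards of 512KiB each.
print(max_shard_bits(256 * MIB))
```

Under this assumption, every additional bit halves the shard size, so doubling the block cache size allows one more shard bit.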
--rocksdb.table-block-size
Approximate size of user data (in bytes) packed per block for uncompressed data.
--rocksdb.recycle-log-file-num
Number of log files to keep around for recycling. Default: 0.
Miscellaneous
--rocksdb.optimize-filters-for-hits
This flag specifies that the implementation should optimize the filters mainly for cases where keys are found rather than also optimize for the case where keys are not. This would be used in cases where the application knows that there are very few misses or the performance in the case of misses is not as important. Default: false.
--rocksdb.wal-recovery-skip-corrupted
If true, skip corrupted records in WAL recovery. Default: false.
Non-pass-through options
--rocksdb.wal-file-timeout
Timeout after which unused WAL files are deleted (in seconds). Default: 10.0s.
Data of ongoing transactions is stored in RAM. Transactions that get too big (in terms of number of operations involved or the total size of data created or modified by the transaction) will be committed automatically. Effectively this means that big user transactions are split into multiple smaller RocksDB transactions that are committed individually. The entire user transaction will not necessarily have ACID properties in this case.
The following options can be used to control the RAM usage and automatic intermediate commits for the RocksDB engine:
--rocksdb.max-transaction-size
Transaction size limit (in bytes). Transactions store all keys and values in RAM, so large transactions run the risk of causing out-of-memory situations. This setting allows you to ensure that this does not happen by limiting the size of any individual transaction. Transactions whose operations would consume more RAM than this threshold value will abort automatically with error 32 (“resource limit exceeded”).
--rocksdb.intermediate-commit-size
If the size of all operations in a transaction reaches this threshold, the transaction is committed automatically and a new transaction is started. The value is specified in bytes.
--rocksdb.intermediate-commit-count
If the number of operations in a transaction reaches this value, the transaction is committed automatically and a new transaction is started.
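The effect of the two intermediate commit thresholds can be illustrated with a toy model. The sketch below is not how the server is implemented: the function name is hypothetical, and the threshold values in the example are arbitrary illustration values rather than defaults. It only shows how one large user transaction ends up as several smaller commits once either threshold is reached.

```python
def split_into_commits(operation_sizes, intermediate_commit_size,
                       intermediate_commit_count):
    """Group operation sizes (in bytes) into simulated intermediate commits."""
    commits = []
    current = []
    current_bytes = 0
    for size in operation_sizes:
        current.append(size)
        current_bytes += size
        # Commit as soon as either threshold is reached, then start afresh.
        if (current_bytes >= intermediate_commit_size
                or len(current) >= intermediate_commit_count):
            commits.append(current)
            current, current_bytes = [], 0
    if current:
        # The remaining operations are committed when the transaction ends.
        commits.append(current)
    return commits

# Seven operations of 100 bytes each, committed after every 3 operations.
print(split_into_commits([100] * 7, intermediate_commit_size=1000,
                         intermediate_commit_count=3))
```

In this model each inner list is committed individually, which is why the user transaction as a whole does not necessarily keep its ACID properties.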
--rocksdb.throttle
If enabled, throttles the ingest rate of writes if necessary to reduce chances of compactions getting too far behind and blocking incoming writes. This option is true by default.
--rocksdb.sync-interval
The interval (in milliseconds) that ArangoDB will use to automatically synchronize data in RocksDB’s write-ahead logs to disk. Automatic syncs will only be performed for not-yet synchronized data, and only for operations that have been executed without the waitForSync attribute.
The default sync interval in 3.3 is 0, meaning that automatic background syncing is turned off. Automatic syncing was added in the middle of the ArangoDB 3.3 release cycle, so it is opt-in. However, the default sync interval will change to 100 milliseconds in ArangoDB 3.4.
Note: this option is not supported on Windows platforms. Setting the option to a value greater than 0 will produce a startup warning.
--rocksdb.use-file-logging
When set to true, enables writing of RocksDB’s own informational LOG files into RocksDB’s database directory.
This option is turned off by default, but can be enabled for debugging RocksDB internals and performance.
--rocksdb.debug-logging
When set to true, enables verbose logging of RocksDB’s actions into the logfile written by ArangoDB (if option --rocksdb.use-file-logging is off) or RocksDB’s own log (if option --rocksdb.use-file-logging is on).
This option is turned off by default, but can be enabled for debugging RocksDB internals and performance.