Write Batch With Index

The WBWI (Write Batch With Index) encapsulates a WriteBatch and an Index into that WriteBatch. The index in use is a Skip List. The purpose of the WBWI is to sit above the DB, and offer the same basic operations as the DB, i.e. Writes - Put, Delete, and Merge, and Reads - Get, and newIterator.

Write operations on the WBWI are serialized into the WriteBatch (of the WBWI) rather than acting directly on the DB. The WriteBatch can later be written atomically to the DB by calling db.write(wbwi).

Read operations can either be solely against the WriteBatch (e.g. GetFromBatch), or they can be read-through operations. A read-through operation, (e.g. GetFromBatchAndDB), first tries to read from the WriteBatch, if there is no updated entry in the WriteBatch then it subsequently reads from the DB.

The WriteBatch itself is a thin wrapper around a std::string field called rep. The WriteBatch can be thought of as append-only. Each new write operation simply appends to the WriteBatch rep, see: https://github.com/facebook/rocksdb/blob/main/db/write_batch.cc#L10

The purpose of the Index within the WBWI is to track which keys that have been written to the WriteBatch. The index is a mapping from the key to the offset of the serialized operation within the WriteBatch (i.e. the position within the rep string).

This index is used for read-operations against the WriteBatchWithIndex to determine whether a write operation on the WriteBatch has previously occurred for the key that is being written. If there is an entry in the Index the offset into the WriteBatch is used to read the value, if not, then the database can be consulted. The Index allows us to avoid having to scan the WriteBatch when performing Get or Seek (for iterator) operations.

Transaction Building Block

The WBWI can be used as a component if one wishes to build Transaction Semantics atop RocksDB. The WBWI by itself isolates the Write Path to a local in-memory store and allows you to RYOW (Read-Your-Own-Writes) before data is atomically written to the database.

It is a key component in RocksDB’s Pessimistic and Optimistic Transaction utility classes.

Modes of Operation

The WBWI has two modes of operation, 1) where all write operations to the same key can be retrieved by iteration, and 2) where only the latest write operation for a key can be retrieved by iteration. Regardless of the mode, all write operations themselves are preserved in the append-only WriteBatch, the mode only controls what is retrievable from the WriteBatch when RYOW.

The mode of operation is controlled by use of the WriteBatchWithIndex constructor argument named overwrite_key. The default mode of operation is to allow all write operations to be retrieved (i.e. overwrite_key=false).

Not all WBWI functions are supported in both modes, see the table below:

WBWI Functionoverwrite_key=falseoverwrite_key=true
PutYesYes
DeleteYesYes
DeleteRangeNoNo
MergeYesYes
GetFromBatchYesYes
GetFromBatchAndDBYesYes
NewIteratorOne-or-more entries per Key in the batchOne entry per Key in the batch
NewIteratorWithBaseYesYes

For GetFromBatch and GetFromBatchAndDB, if transaction contains Merge, then they return Status::MergeInProgress if no base value is found, and return merged result if base value is found.

Index Entries when overwrite_key=false

TODO

Index Entries when overwrite_key=true

TODO