Chapter 11. Berkeley DB Data Store and Concurrent Data Store Applications
Concurrent Data Store introduction
It is often desirable to have concurrent read-write access to a database when there is no need for full recoverability or transaction semantics. For this class of applications, Berkeley DB provides interfaces supporting deadlock-free, multiple-reader/single writer access to the database. This means that at any instant in time, there may be either multiple readers accessing data or a single writer modifying data. The application is entirely unaware of which is happening, and Berkeley DB implements the necessary locking and blocking to ensure this behavior.
To create Berkeley DB Concurrent Data Store applications, you must first initialize an environment by calling DB_ENV->open(). You must specify the DB_INIT_CDB and DB_INIT_MPOOL flags to that method. It is an error to specify any of the other DB_ENV->open() subsystem or recovery configuration flags, for example, DB_INIT_LOCK, DB_INIT_TXN or DB_RECOVER All databases must, of course, be created in this environment by using the db_create() function or Db constructor, and specifying the environment as an argument.
Berkeley DB performs appropriate locking so that safe enforcement of the deadlock-free, multiple-reader/single-writer semantic is transparent to the application. However, a basic understanding of Berkeley DB Concurrent Data Store locking behavior is helpful when writing Berkeley DB Concurrent Data Store applications.
Berkeley DB Concurrent Data Store avoids deadlocks without the need for a deadlock detector by performing all locking on an entire database at once (or on an entire environment in the case of the DB_CDB_ALLDB flag), and by ensuring that at any given time only one thread of control is allowed to simultaneously hold a read (shared) lock and attempt to acquire a write (exclusive) lock.
All open Berkeley DB cursors hold a read lock, which serves as a guarantee that the database will not change beneath them; likewise, all non-cursor DB->get() operations temporarily acquire and release a read lock that is held during the actual traversal of the database. Because read locks will not conflict with each other, any number of cursors in any number of threads of control may be open simultaneously, and any number of DB->get() operations may be concurrently in progress.
To enforce the rule that only one thread of control at a time can attempt to upgrade a read lock to a write lock, however, Berkeley DB must forbid multiple cursors from attempting to write concurrently. This is done using the DB_WRITECURSOR flag to the DB->cursor() method. This is the only difference between access method calls in Berkeley DB Concurrent Data Store and in the other Berkeley DB products. The DB_WRITECURSOR flag causes the newly created cursor to be a “write” cursor; that is, a cursor capable of performing writes as well as reads. Only cursors thus created are permitted to perform write operations (either deletes or puts), and only one such cursor can exist at any given time.
Any attempt to create a second write cursor or to perform a non-cursor write operation while a write cursor is open will block until that write cursor is closed. Read cursors may open and perform reads without blocking while a write cursor is extant. However, any attempts to actually perform a write, either using the write cursor or directly using the DB->put() or DB->del() methods, will block until all read cursors are closed. This is how the multiple-reader/single-writer semantic is enforced, and prevents reads from seeing an inconsistent database state that may be an intermediate stage of a write operation.
By default, Berkeley DB Concurrent Data Store does locking on a per-database basis. For this reason, using cursors to access multiple databases in different orders in different threads or processes, or leaving cursors open on one database while accessing another database, can cause an application to hang. If this behavior is a requirement for the application, Berkeley DB should be configured to do locking on an environment-wide basis. See the DB_CDB_ALLDB flag of the DB_ENV->set_flags() method for more information.
With these behaviors, Berkeley DB can guarantee deadlock-free concurrent database access, so that multiple threads of control are free to perform reads and writes without needing to handle synchronization themselves or having to run a deadlock detector. Berkeley DB has no direct knowledge of which cursors belong to which threads, so some care must be taken to ensure that applications do not inadvertently block themselves, causing the application to hang and be unable to proceed.
As a consequence of the Berkeley DB Concurrent Data Store locking model, the following sequences of operations will cause a thread to block itself indefinitely:
- Keeping a cursor open while issuing a DB->put() or DB->del() access method call.
- Attempting to open a write cursor while another cursor is already being held open by the same thread of control. Note that it is correct operation for one thread of control to attempt to open a write cursor or to perform a non-cursor write (DB->put() or DB->del()) while a write cursor is already active in another thread. It is only a problem if these things are done within a single thread of control — in which case that thread will block and never be able to release the lock that is blocking it.
- Not testing Berkeley DB error return codes (if any cursor operation returns an unexpected error, that cursor must still be closed).
If the application needs to open multiple cursors in a single thread to perform an operation, it can indicate to Berkeley DB that the cursor locks should not block each other by creating a Berkeley DB Concurrent Data Store group, using DB_ENV->cdsgroup_begin(). This creates a locker ID that is shared by all cursors opened in the group.
Berkeley DB Concurrent Data Store groups use a TXN handle to indicate the shared locker ID to Berkeley DB calls, and call DB_TXN->commit() to end the group. This is a convenient way to pass the locked ID to the calls where it is needed, but should not be confused with the real transactional semantics provided by Berkeley DB Transactional Data Store. In particular, Berkeley DB Concurrent Data Store groups do not provide any abort or recovery facilities, and have no impact on durability of operations.