Access method FAQ
Is a Berkeley DB database the same as a “table”?
Yes; “tables” are databases, “rows” are key/data pairs, and “columns” are application-encapsulated fields within a data item (to which Berkeley DB does not directly provide access).
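As an illustration of what "application-encapsulated fields" means in practice, here is a minimal sketch of an application packing its own "columns" into one flat data item and unpacking them again. The struct employee layout and the employee_pack() and employee_unpack() helpers are hypothetical names invented for this example; they are not part of Berkeley DB. In a real application the packed buffer would be handed to the library as the data half of a key/data pair, and the library would treat it as opaque bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application record: each member plays the role of a "column". */
struct employee {
    uint32_t id;
    uint32_t salary;
    char name[16];
};

/* Size of the flat data item produced by employee_pack(). */
#define EMPLOYEE_PACKED_SIZE (sizeof(uint32_t) + sizeof(uint32_t) + 16)

/* Copy the fields into a flat buffer; returns the number of bytes written. */
size_t employee_pack(const struct employee *e, unsigned char *buf)
{
    size_t off = 0;

    assert(e != NULL && buf != NULL);
    memcpy(buf + off, &e->id, sizeof(e->id));
    off += sizeof(e->id);
    memcpy(buf + off, &e->salary, sizeof(e->salary));
    off += sizeof(e->salary);
    memcpy(buf + off, e->name, sizeof(e->name));
    off += sizeof(e->name);
    return off;
}

/* Recover the fields from a buffer previously filled by employee_pack(). */
void employee_unpack(const unsigned char *buf, struct employee *e)
{
    size_t off = 0;

    assert(e != NULL && buf != NULL);
    memcpy(&e->id, buf + off, sizeof(e->id));
    off += sizeof(e->id);
    memcpy(&e->salary, buf + off, sizeof(e->salary));
    off += sizeof(e->salary);
    memcpy(e->name, buf + off, sizeof(e->name));
}
```

This sketch copies the integers in host byte order, so the packed item is only meaningful to applications running on machines with the same byte order; an application that moves databases between architectures would need to pick a fixed byte order for its own fields.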
I’m getting an error return in my application, but I can’t figure out what the library is complaining about.
See DB_ENV->set_errcall(), DB_ENV->set_errfile() and DB->set_errfile() for ways to get additional information about error returns from Berkeley DB.
Are Berkeley DB databases portable between architectures with different integer sizes and different byte orders?
Yes. Specifically, databases can be moved between 32- and 64-bit machines, as well as between little- and big-endian machines. See Selecting a byte order for more information.
I’m seeing database corruption when creating multiple databases in a single physical file.
This problem is usually the result of DB handles not sharing an underlying database environment. See Opening multiple databases in a single file for more information.
I’m using integers as keys for a Btree database, and even though the key/data pairs are entered in sorted order, the page-fill factor is low.
This is usually the result of using integer keys on little-endian architectures such as the x86. Berkeley DB sorts keys as byte strings, and little-endian integers don’t sort well when viewed as byte strings. For example, take the numbers 254 through 257. Their byte patterns on a little-endian system are:
254 fe 0 0 0
255 ff 0 0 0
256 0 1 0 0
257 1 1 0 0
If you treat them as strings, then they sort badly:
256
257
254
255
On a big-endian system, their byte patterns are:
254 0 0 0 fe
255 0 0 0 ff
256 0 0 1 0
257 0 0 1 1
and so, if you treat them as strings, they sort nicely. This means that if you use steadily increasing integers as keys on a big-endian system, Berkeley DB behaves well and you get compact trees, but on a little-endian system Berkeley DB produces much less compact trees. To avoid this problem, you may want to convert the keys to flat text or big-endian representations, or provide your own Btree comparison function (see DB->set_bt_compare()).
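The big-endian conversion can be sketched in a few lines of C. This is a minimal illustration, not library code: encode_key_be(), decode_key_be(), encode_key_le(), compare_key_bytes() and sorts_as_bytes() are hypothetical helper names, and the sketch assumes 32-bit unsigned keys compared byte by byte with memcmp(), as in the byte-string ordering described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit key most significant byte first,
 * independent of the host's byte order. */
void encode_key_be(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Hypothetical helper: recover the integer from its big-endian encoding. */
uint32_t decode_key_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* For contrast only: least significant byte first, which is how a
 * native 32-bit integer is laid out on x86. */
void encode_key_le(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)value;
    out[1] = (unsigned char)(value >> 8);
    out[2] = (unsigned char)(value >> 16);
    out[3] = (unsigned char)(value >> 24);
}

/* Compare two 4-byte keys as byte strings. */
int compare_key_bytes(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4);
}

/* Return 1 if encoding values[0..n-1] with `encode` yields strictly
 * increasing byte strings, and 0 otherwise. */
int sorts_as_bytes(const uint32_t *values, size_t n,
                   void (*encode)(uint32_t, unsigned char[4]))
{
    unsigned char prev[4], cur[4];
    size_t i;

    assert(values != NULL && n > 0);
    encode(values[0], prev);
    for (i = 1; i < n; i++) {
        encode(values[i], cur);
        if (compare_key_bytes(prev, cur) >= 0)
            return 0;
        memcpy(prev, cur, 4);
    }
    return 1;
}
```

For the keys 254 through 257, sorts_as_bytes() returns 1 with encode_key_be and 0 with encode_key_le, matching the two tables above. An application taking this approach would store the 4-byte encoded buffer as the key and call decode_key_be() when it needs the integer back.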
Is there any way to avoid double buffering in the Berkeley DB system?
Some operating systems provide the support necessary to avoid double buffering. On those systems, you can attempt to avoid double buffering by specifying the DB_DIRECT_DB and DB_LOG_DIRECT flags. Where that support is not available, or where experimentation with it shows that it does not improve performance, there are a few other things you can do to address this issue:
First, the Berkeley DB cache size can be explicitly set. Rather than allocate additional space in the Berkeley DB cache to cover unexpectedly heavy load or large table sizes, double buffering may suggest you size the cache to function well under normal conditions, and then depend on the file buffer cache to cover abnormal conditions. Obviously, this is a trade-off, as Berkeley DB may not then perform as well as usual under abnormal conditions.
Second, depending on the underlying operating system you’re using, you may be able to alter the amount of physical memory devoted to the system’s file buffer cache. Altering this type of resource configuration may require appropriate privileges, or even operating system reboots and/or rebuilds, on some systems.
Microsoft Windows provides a SetSystemFileCacheSize function which can be used to limit its cache size; without that limit the Windows file cache can grow to nearly fill physical memory, forcing the working sets of processes out to disk, reducing system performance.
Third, changing the size of the Berkeley DB environment regions can change the amount of space the operating system makes available for the file buffer cache, and it’s often worth considering exactly how the operating system is dividing up its available memory. Further, moving the Berkeley DB database environment regions from filesystem-backed memory into system memory (or heap memory) can often make additional system memory available for the file buffer cache, especially on systems without a unified buffer cache and VM system.
I’m seeing database corruption when I run out of disk space.
Berkeley DB can continue to run when out-of-disk-space errors occur, but it requires the application to be transaction protected. Applications which do not enclose update operations in transactions cannot recover from out-of-disk-space errors, and the result of running out of disk space may be database corruption.
How can I associate application information with a DB or DB_ENV handle?
In the C API, the DB and DB_ENV structures each contain an “app_private” field intended to be used to reference application-specific information. See the db_create() and db_env_create() documentation for more information.
In the C++ or Java APIs, the easiest way to associate application-specific data with a handle is to subclass the Db or DbEnv, for example subclassing Db to get MyDb. Objects of type MyDb will still have the Berkeley DB API methods available on them, and you can put any extra data or methods you want into the MyDb class. If you are using “callback” APIs that take Db or DbEnv arguments (for example, DB->set_bt_compare()) these will always be called with the Db or DbEnv objects you create. So if you always use MyDb objects, you will be able to take the first argument to the callback function and cast it to a MyDb (in C++, cast it to (MyDb*)). That will allow you to access your data members or methods.