1.1. Technical Overview
1.1.1. Document Storage
A CouchDB server hosts named databases, which store documents. Each document is uniquely named in the database, and CouchDB provides a RESTful HTTP API for reading and updating (add, edit, delete) database documents.
Documents are the primary unit of data in CouchDB and consist of any number of fields and attachments. Documents also include metadata that’s maintained by the database system. Document fields are uniquely named and contain values of varying types (text, number, boolean, lists, etc.), and there is no set limit to text size or element count.
The CouchDB document update model is lockless and optimistic. Document edits are made by client applications loading documents, applying changes, and saving them back to the database. If another client editing the same document saves their changes first, the client gets an edit conflict error on save. To resolve the update conflict, the latest document version can be opened, the edits reapplied and the update tried again.
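The load, edit, save and retry cycle described above can be sketched with a small in-memory store. This is only an illustration: `TinyStore` and `updateWithRetry` are hypothetical names, revisions are plain integers rather than CouchDB's real revision strings, and the only detail borrowed from the HTTP API is that an edit conflict is reported with status 409.

```javascript
// In-memory stand-in for a database. NOT CouchDB's implementation:
// revisions here are simple counters, chosen only to keep the sketch short.
class TinyStore {
  constructor() {
    this.docs = new Map(); // docId -> { rev, body }
  }

  // Return a copy of the stored document, or null if it does not exist.
  get(id) {
    const entry = this.docs.get(id);
    if (!entry) return null;
    return { _id: id, _rev: entry.rev, ...entry.body };
  }

  // The save succeeds only if the caller's _rev matches the stored revision.
  put(doc) {
    const { _id, _rev, ...body } = doc;
    const current = this.docs.get(_id);
    const currentRev = current ? current.rev : undefined;
    if (_rev !== currentRev) {
      const err = new Error("conflict");
      err.status = 409; // edit conflicts surface as HTTP 409 in CouchDB
      throw err;
    }
    const nextRev = (current ? current.rev : 0) + 1;
    this.docs.set(_id, { rev: nextRev, body });
    return nextRev;
  }
}

// Load the latest version, reapply the edit, and try the save again
// whenever another client got there first.
function updateWithRetry(store, id, applyEdit, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const latest = store.get(id);
    const edited = applyEdit(latest);
    try {
      return store.put(edited);
    } catch (err) {
      if (err.status !== 409) throw err; // only conflicts are retried
    }
  }
  throw new Error("gave up after repeated conflicts");
}
```

The edit is passed as a function (`applyEdit`) so that it can be reapplied to whatever version is current at retry time, instead of overwriting the other client's changes with stale data.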
Document updates (add, edit, delete) are all or nothing, either succeeding entirely or failing completely. The database never contains partially saved or edited documents.
1.1.2. ACID Properties
The CouchDB file layout and commitment system features all Atomic, Consistent, Isolated, Durable (ACID) properties. On disk, CouchDB never overwrites committed data or associated structures, ensuring the database file is always in a consistent state. This is a “crash-only” design where the CouchDB server does not go through a shutdown process; it’s simply terminated.
Document updates (add, edit, delete) are serialized, except for binary blobs which are written concurrently. Database readers are never locked out and never have to wait on writers or other readers. Any number of clients can be reading documents without being locked out or interrupted by concurrent updates, even on the same document. CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.
Documents are indexed in B-trees by their name (DocID) and a Sequence ID. Each update to a database instance generates a new sequential number. Sequence IDs are used later for incrementally finding changes in a database. These B-tree indexes are updated simultaneously when documents are saved or deleted. The index updates always occur at the end of the file (append-only updates).
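A rough model of the two indexes, using JavaScript `Map` objects in place of on-disk B-trees, shows how a sequence number makes "what changed since N?" cheap to answer. The class and method names (`SeqIndexedDb`, `changesSince`) are made up for this sketch, and it assumes each document appears only once in the by-sequence index, under its most recent update.

```javascript
// Two in-memory indexes standing in for the by-ID and by-sequence B-trees.
class SeqIndexedDb {
  constructor() {
    this.updateSeq = 0;     // last sequence number handed out
    this.byId = new Map();  // docId -> { seq, doc }
    this.bySeq = new Map(); // seq -> docId, kept in ascending seq order
  }

  // Every save gets the next sequence number and updates both indexes together.
  save(doc) {
    const previous = this.byId.get(doc._id);
    // Assumption of this sketch: drop the document's older by-sequence entry.
    if (previous) this.bySeq.delete(previous.seq);
    this.updateSeq += 1;
    this.byId.set(doc._id, { seq: this.updateSeq, doc });
    this.bySeq.set(this.updateSeq, doc._id);
    return this.updateSeq;
  }

  // Incrementally find the documents changed after a known sequence number.
  changesSince(seq) {
    const changes = [];
    for (const [s, id] of this.bySeq) {
      if (s > seq) changes.push({ seq: s, id });
    }
    return changes;
  }
}
```

A consumer that remembers the last sequence number it processed can call `changesSince` with that number and receive only the documents touched in the meantime.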
Documents have the advantage of data being already conveniently packaged for storage rather than split out across numerous tables and rows, as in most database systems. When documents are committed to disk, the document fields and metadata are packed into buffers, sequentially one document after another (helpful later for efficient building of views).
When CouchDB documents are updated, all data and associated indexes are flushed to disk and the transactional commit always leaves the database in a completely consistent state. Commits occur in two steps:
1. All document data and associated index updates are synchronously flushed to disk.
2. The updated database header is written in two consecutive, identical chunks to make up the first 4k of the file, and then synchronously flushed to disk.
In the event of an OS crash or power failure during step 1, the partially flushed updates are simply forgotten on restart. If such a crash happens during step 2 (committing the header), a surviving copy of the previous identical headers will remain, ensuring coherency of all previously committed data. Excepting the header area, consistency checks or fix-ups after a crash or a power failure are never necessary.
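The two commit steps and the recovery behaviour described above can be modelled in a few lines. This is a toy simulation, not CouchDB's on-disk format: the "file" is an array of records, the header is a record count with a made-up checksum, and the `crashAt` argument is a hypothetical hook for simulating a failure at a given step.

```javascript
// Toy checksum so a half-written header copy can be told apart from a good one.
function checksum(length) {
  return (length * 31 + 7) % 65521;
}

function makeHeader(length) {
  return { length, check: checksum(length) };
}

function isValid(header) {
  return header.check === checksum(header.length);
}

class AppendOnlyFile {
  constructor() {
    this.headers = [makeHeader(0), makeHeader(0)]; // two identical copies
    this.records = [];                             // append-only data area
  }

  // crashAt: "data" (crash during step 1), "header" (crash during step 2),
  // or undefined for a normal commit. Returns true if the commit completed.
  commit(newRecords, crashAt) {
    // Step 1: append data; committed records are never overwritten.
    this.records.push(...newRecords);
    if (crashAt === "data") return false;

    // Step 2: write the new header twice. A crash here is modelled as a
    // torn first copy while the second copy still holds the old header.
    const header = makeHeader(this.records.length);
    this.headers[0] = { length: header.length, check: -1 };
    if (crashAt === "header") return false;
    this.headers[0] = header;
    this.headers[1] = header;
    return true;
  }

  // On restart: trust the newest valid header copy and forget anything
  // appended after the point it describes. In this sketch at least one
  // copy is always valid, so the reduce never sees an empty array.
  recover() {
    const valid = this.headers.filter(isValid);
    const best = valid.reduce((a, b) => (b.length > a.length ? b : a));
    this.records.length = best.length;
    this.headers = [best, best];
    return this.records.slice();
  }
}
```

Because data is only ever appended and the header is the single pointer to what counts as committed, recovery needs no scan of the data area: it only has to pick a header copy that survived intact.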
1.1.3. Compaction
Wasted space is recovered by occasional compaction. On schedule, or when the database file exceeds a certain amount of wasted space, the compaction process clones all the active data to a new file and then discards the old file. The database remains completely online the entire time and all updates and reads are allowed to complete successfully. The old database file is deleted only when all the data has been copied and all users transitioned to the new file.
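In an append-only file, every update leaves the superseded version behind, which is where the wasted space comes from. A minimal sketch of the copy step, with the file modelled as an array of `{ id, rev, ... }` records (a shape assumed only for this example), might look like this:

```javascript
// Copy only the newest record of each document into a fresh "file".
// The old file is left untouched so readers can keep using it until
// the switch to the new file happens.
function compact(oldFile) {
  const latest = new Map(); // docId -> newest record seen so far
  for (const record of oldFile) {
    // Re-inserting moves the document to the end, preserving update order.
    latest.delete(record.id);
    latest.set(record.id, record);
  }
  return Array.from(latest.values());
}
```

This sketch keeps whatever the newest record of a document is, including a deletion marker if that was the last update; it makes no attempt to model how updates arriving during compaction are carried over.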
1.1.4. Views
ACID properties only deal with storage and updates, but we also need the ability to show our data in interesting and useful ways. Unlike SQL databases where data must be carefully decomposed into tables, data in CouchDB is stored in semi-structured documents. CouchDB documents are flexible and each has its own implicit structure, which alleviates the most difficult problems and pitfalls of bi-directionally replicating table schemas and their contained data.
But beyond acting as a fancy file server, a simple document model for data storage and sharing is too simple to build real applications on: it simply doesn’t do enough of the things we want and expect. We want to slice and dice and see our data in many different ways. What is needed is a way to filter, organize and report on data that hasn’t been decomposed into tables.
1.1.4.1. View Model
To address this problem of adding structure back to unstructured and semi-structured data, CouchDB integrates a view model. Views are the method of aggregating and reporting on the documents in a database, and are built on-demand to aggregate, join and report on database documents. Because views are built dynamically and don’t affect the underlying document, you can have as many different view representations of the same data as you like.
View definitions are strictly virtual and only display the documents from the current database instance, making them separate from the data they display and compatible with replication. CouchDB views are defined inside special design documents and can replicate across database instances like regular documents, so that not only data replicates in CouchDB, but entire application designs replicate too.
1.1.4.2. JavaScript View Functions
Views are defined using JavaScript functions acting as the map part in a map-reduce system. A view function takes a CouchDB document as an argument and then does whatever computation it needs to do to determine the data that is to be made available through the view, if any. It can add multiple rows to the view based on a single document, or it can add no rows at all.
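As an illustration, here is a hypothetical map function that indexes blog posts by tag, together with a tiny harness that runs it over a few documents. In CouchDB the map function receives only the document and calls an `emit(key, value)` function supplied by the server; in this sketch `emit` is passed in explicitly so the example is self-contained. The document fields (`type`, `tags`, `title`) and the harness `runMap` are assumptions made for the example.

```javascript
// Map function: one row per tag for posts, no rows for anything else.
const postsByTag = (doc, emit) => {
  if (doc.type !== "post") return; // contributes no rows to the view
  for (const tag of doc.tags || []) {
    emit(tag, doc.title);          // may emit several rows per document
  }
};

// Minimal harness: apply the map function to each document and collect
// the emitted rows, sorted by key (then by document ID for stable output).
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
}
```

A post tagged with two tags produces two rows, while a comment document produces none, which matches the "multiple rows or no rows at all" behaviour described above. The plain string comparison used for sorting is a simplification of how a real view orders its keys.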
1.1.4.3. View Indexes
Views are a dynamic representation of the actual document contents of a database, and CouchDB makes it easy to create useful views of data. But generating a view of a database with hundreds of thousands or millions of documents is time- and resource-consuming; it’s not something the system should do from scratch each time.
To keep view querying fast, the view engine maintains indexes of its views, and incrementally updates them to reflect changes in the database. CouchDB’s core design is largely optimized around the need for efficient, incremental creation of views and their indexes.
Views and their functions are defined inside special “design” documents, and a design document may contain any number of uniquely named view functions. When a user opens a view and its index is automatically updated, all the views in the same design document are indexed as a single group.
The view builder uses the database sequence ID to determine if the view group is fully up-to-date with the database. If not, the view engine examines all database documents (in packed sequential order) changed since the last refresh. Documents are read in the order they occur in the disk file, reducing the frequency and cost of disk head seeks.
The views can be read and queried simultaneously while also being refreshed. If a client is slowly streaming out the contents of a large view, the same view can be concurrently opened and refreshed for another client without blocking the first client. This is true for any number of simultaneous client readers, who can read and query the view while the index is concurrently being refreshed for other clients without causing problems for the readers.
As documents are processed by the view engine through your ‘map’ and ‘reduce’ functions, their previous row values are removed from the view indexes, if they exist. If the document is selected by a view function, the function results are inserted into the view as a new row.
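The refresh cycle described in the last few paragraphs (compare sequence numbers, process only what changed, drop a document's old rows, insert its new ones) can be sketched as follows. The `ViewIndex` class and the shape of the change records are assumptions for this example; the map function takes `emit` as a second argument only to keep the sketch self-contained, and reduce functions are left out.

```javascript
class ViewIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.lastSeq = 0;           // sequence number the index is current to
    this.rowsByDoc = new Map(); // docId -> rows that document contributed
  }

  // changes: array of { seq, id, doc, deleted }, in ascending seq order.
  // Returns how many changes actually had to be processed.
  refresh(changes) {
    let processed = 0;
    for (const change of changes) {
      if (change.seq <= this.lastSeq) continue; // already indexed
      this.rowsByDoc.delete(change.id);         // remove previous rows, if any
      if (!change.deleted) {
        const rows = [];
        const emit = (key, value) => rows.push({ id: change.id, key, value });
        this.mapFn(change.doc, emit);
        if (rows.length > 0) this.rowsByDoc.set(change.id, rows);
      }
      this.lastSeq = change.seq;
      processed += 1;
    }
    return processed;
  }

  // All rows of the view, ordered by key.
  rows() {
    const all = [];
    for (const rows of this.rowsByDoc.values()) all.push(...rows);
    return all.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}
```

Calling `refresh` twice with the same change list does no work the second time, because every change is at or below `lastSeq`; only documents updated after the last refresh are passed through the map function again.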
When view index changes are written to disk, the updates are always appended at the end of the file, serving both to reduce disk head seek times during disk commits and to ensure crashes and power failures cannot cause corruption of indexes. If a crash occurs while updating a view index, the incomplete index updates are simply lost and rebuilt incrementally from its previously committed state.
1.1.5. Security and Validation
To control who can read and update documents, CouchDB has a simple reader access and update validation model that can be extended to implement custom security models.
1.1.5.1. Administrator Access
CouchDB database instances have administrator accounts. Administrator accounts can create other administrator accounts and update design documents. Design documents are special documents containing view definitions and other special formulas, as well as regular fields and blobs.
1.1.5.2. Update Validation
As documents are written to disk, they can be validated dynamically by JavaScript functions for both security and data validation. When the document passes all the formula validation criteria, the update is allowed to continue. If the validation fails, the update is aborted and the user client gets an error response.
Both the user’s credentials and the updated document are given as inputs to the validation formula, and can be used to implement custom security models by validating a user’s permissions to update a document.
A basic “author only” update document model is trivial to implement, where document updates are validated to check if the user is listed in an “author” field in the existing document. More dynamic models are also possible, like checking a separate user account profile for permission settings.
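A minimal sketch of such an “author only” rule is shown below. It follows the shape of a CouchDB validation function, which receives the new document, the existing document (or `null` for a newly created one) and the user context, and rejects an update by throwing an object with a `forbidden` property. The function name and the choice to accept either a single author name or a list of names in the `author` field are assumptions of this example.

```javascript
// Reject the update unless the requesting user is listed as an author
// of the document as it currently exists in the database.
function validateAuthorOnly(newDoc, oldDoc, userCtx) {
  // A brand-new document has no existing author to check against.
  if (oldDoc === null) return;

  // Accept both author: "ann" and author: ["ann", "bob"].
  const authors = [].concat(oldDoc.author || []);
  if (!authors.includes(userCtx.name)) {
    throw { forbidden: "Only an author of this document may update it." };
  }
}
```

Checking the existing document rather than the incoming one matters: otherwise a user could simply write their own name into the `author` field of the update they are submitting.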
The update validations are enforced for both live usage and replicated updates, ensuring security and data validation in a shared, distributed system.
See also
Validate Document Update Functions
1.1.6. Distributed Updates and Replication
CouchDB is a peer-based distributed database system. It allows users and servers to access and update the same shared data while disconnected. Those changes can then be replicated bi-directionally later.
The CouchDB document storage, view and security models are designed to work together to make true bi-directional replication efficient and reliable. Both documents and designs can replicate, allowing full database applications (including application design, logic and data) to be replicated to laptops for offline use, or replicated to servers in remote offices where slow or unreliable connections make sharing data difficult.
The replication process is incremental. At the database level, replication only examines documents updated since the last replication. If replication fails at any step, due to network problems or a crash for example, the next replication restarts at the last checkpoint.
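The checkpoint idea can be illustrated with a small simulation. Everything here is hypothetical scaffolding: the source is an array of `{ seq, id, doc }` changes, the target is a `Map`, the checkpoint is a plain object, and the `failAfter` parameter exists only to simulate a network failure partway through. The sketch records a checkpoint after every document, which is a simplification.

```javascript
// Copy every change newer than the checkpoint from source to target.
// Returns the number of documents copied in this run.
function replicateOnce(sourceChanges, target, checkpoint, failAfter = Infinity) {
  let copied = 0;
  for (const change of sourceChanges) {
    if (change.seq <= checkpoint.lastSeq) continue; // replicated earlier
    if (copied >= failAfter) {
      throw new Error("simulated network failure");
    }
    target.set(change.id, change.doc);
    checkpoint.lastSeq = change.seq; // remember how far we got
    copied += 1;
  }
  return copied;
}
```

If a run is interrupted, the checkpoint still reflects the last document that arrived, so the next run skips everything at or below that sequence number and resumes with the remainder instead of starting over.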
Partial replicas can be created and maintained. Replication can be filtered by a JavaScript function, so that only particular documents or those meeting specific criteria are replicated. This can allow users to take subsets of a large shared database application offline for their own use, while maintaining normal interaction with the application and that subset of data.
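A filter of this kind is a function of the document and the request that returns true for documents that should be replicated. The sketch below assumes documents carry an `owner` field and that the wanted owner is passed as a query parameter (`req.query.owner`); both are choices made for this example, as is the small `filteredChanges` helper that applies the filter to a list of changes.

```javascript
// Filter function: replicate only documents belonging to the requested owner.
function byOwner(doc, req) {
  return doc.owner === req.query.owner;
}

// Helper for this sketch: keep only the changes whose document passes the filter.
function filteredChanges(changes, filterFn, req) {
  return changes.filter((change) => filterFn(change.doc, req));
}
```

With a filter like this, a user's laptop could hold just that user's documents while the shared server keeps the full data set.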
1.1.6.1. Conflicts
Conflict detection and management are key issues for any distributed edit system. The CouchDB storage system treats edit conflicts as a common state, not an exceptional one. The conflict handling model is simple and “non-destructive” while preserving single document semantics and allowing for decentralized conflict resolution.
CouchDB allows for any number of conflicting documents to exist simultaneously in the database, with each database instance deterministically deciding which document is the “winner” and which are conflicts. Only the winning document can appear in views, while “losing” conflicts are still accessible and remain in the database until deleted or purged during database compaction. Because conflict documents are still regular documents, they replicate just like regular documents and are subject to the same security and validation rules.
When distributed edit conflicts occur, every database replica sees the same winning revision and each has the opportunity to resolve the conflict. Resolving conflicts can be done manually or, depending on the nature of the data and the conflict, by automated agents. The system makes decentralized conflict resolution possible while maintaining single document database semantics.
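What makes the choice deterministic is that it depends only on the set of conflicting revisions, never on the order in which a replica received them. The sketch below uses one plausible rule as an assumption: revisions are identified by strings of the form `N-hash`, live revisions beat deleted ones, a higher `N` (longer edit history) beats a lower one, and ties are broken by comparing the hash part as a string. Treat the exact rule as illustrative rather than as a specification of CouchDB's behaviour.

```javascript
// Split "3-abc" into its numeric position and its hash part.
function splitRev(rev) {
  const dash = rev.indexOf("-");
  return [Number(rev.slice(0, dash)), rev.slice(dash + 1)];
}

// revisions: array of { rev: "N-hash", deleted?: boolean }.
// Returns the winner and the remaining (losing) conflict revisions.
function pickWinner(revisions) {
  const ranked = revisions.slice().sort((a, b) => {
    // Live revisions rank ahead of deleted ones.
    if (Boolean(a.deleted) !== Boolean(b.deleted)) return a.deleted ? 1 : -1;
    const [posA, hashA] = splitRev(a.rev);
    const [posB, hashB] = splitRev(b.rev);
    // Longer history first, then the higher hash string.
    if (posA !== posB) return posB - posA;
    return hashA < hashB ? 1 : hashA > hashB ? -1 : 0;
  });
  return { winner: ranked[0], conflicts: ranked.slice(1) };
}
```

Two replicas that hold the same conflicting revisions, received in different orders, compute the same winner without having to communicate, and the losing revisions stay available for later resolution.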
Conflict management continues to work even if multiple disconnected users or agents attempt to resolve the same conflicts. If resolved conflicts result in more conflicts, the system accommodates them in the same manner, determining the same winner on each machine and maintaining single document semantics.
See also
Replication and conflict model
1.1.6.2. Applications
Using just the basic replication model, many traditionally single server database applications can be made distributed with almost no extra work. CouchDB replication is designed to be immediately useful for basic database applications, while also being extendable for more elaborate and full-featured uses.
With very little database work, it is possible to build a distributed document management application with granular security and full revision histories. Updates to documents can be implemented to exploit incremental field and blob replication, where replicated updates are nearly as efficient and incremental as the actual edit differences (“diffs”).
The CouchDB replication model can be modified for other distributed update models. If the storage engine is enhanced to allow multi-document update transactions, it is possible to perform Subversion-like “all or nothing” atomic commits when replicating with an upstream server, such that any single document conflict or validation failure will cause the entire update to fail. Like Subversion, conflicts would be resolved by doing a “pull” replication to force the conflicts locally, then merging and re-replicating to the upstream server.
1.1.7. Implementation
CouchDB is built on the Erlang OTP platform, a functional, concurrent programming language and development platform. Erlang was developed for real-time telecom applications with an extreme emphasis on reliability and availability.
Both in syntax and semantics, Erlang is very different from conventional programming languages like C or Java. Erlang uses lightweight “processes” and message passing for concurrency; it has no shared-state threading, and all data is immutable. The robust, concurrent nature of Erlang is ideal for a database server.
CouchDB is designed for lock-free concurrency, in the conceptual model and the actual Erlang implementation. Reducing bottlenecks and avoiding locks keeps the entire system working predictably under heavy loads. CouchDB can accommodate many clients replicating changes, opening and updating documents, and querying views whose indexes are simultaneously being refreshed for other clients, without needing locks.
For higher availability and more concurrent users, CouchDB is designed for “shared nothing” clustering. In a “shared nothing” cluster, each machine is independent and replicates data with its cluster mates, allowing individual server failures with zero downtime. And because consistency scans and fix-ups aren’t needed on restart, if the entire cluster fails (due to a power outage in a datacenter, for example), the entire CouchDB distributed system becomes immediately available after a restart.
CouchDB is built from the start with a consistent vision of a distributed document database system. Unlike cumbersome attempts to bolt distributed features on top of the same legacy models and databases, it is the result of careful ground-up design, engineering and integration. The document, view, security and replication models, the special purpose query language, the efficient and robust disk layout and the concurrent and reliable nature of the Erlang platform are all carefully integrated for a reliable and efficient system.