1.6. The Core API
This document explores CouchDB in minute detail. It shows all the nitty-gritty and clever bits. We show you best practices and guide you around common pitfalls.
We start out by revisiting the basic operations we ran in the previous document, Getting Started, looking behind the scenes. We also show what Fauxton needs to do behind its user interface to give us the nice features we saw earlier.
This document is both an introduction to the core CouchDB API and a reference. If you can’t remember how to run a particular request or why some parameters are needed, you can always come back here and look things up (we are probably the heaviest users of this document).
While explaining the API bits and pieces, we sometimes need to take a larger detour to explain the reasoning for a particular request. This is a good opportunity for us to tell you why CouchDB works the way it does.
The API can be subdivided into the following sections: Server, Databases, Documents, and Replication. We’ll explore them individually.
1.6.1. Server
This one is basic and simple. It can serve as a sanity check to see if CouchDB is running at all. It can also act as a safety guard for libraries that require a certain version of CouchDB. We’re using the curl utility again:
- curl http://127.0.0.1:5984/
CouchDB replies, all excited to get going:
- {
- "couchdb": "Welcome",
- "version": "2.0.0",
- "vendor": {
- "name": "The Apache Software Foundation"
- }
- }
You get back a JSON string that, if parsed into a native object or data structure of your programming language, gives you access to the welcome string and version information.
This is not terribly useful, but it illustrates nicely the way CouchDB behaves. You send an HTTP request and you receive a JSON string in the HTTP response as a result.
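As a rough sketch of the version guard mentioned above, a client library might parse the welcome document and compare the version before doing anything else. The function name `parse_welcome` and the minimum version are illustrative assumptions, and the body is the reply shown above rather than a live response. Version strings with non-numeric parts are not handled here.

```python
import json


def parse_welcome(body: str) -> tuple[str, tuple[int, ...]]:
    """Return the welcome string and the version as a tuple of integers."""
    info = json.loads(body)
    # "2.0.0" becomes (2, 0, 0), so versions compare numerically, not as text.
    version = tuple(int(part) for part in info["version"].split("."))
    return info["couchdb"], version


# The reply shown above, written on one line.
SAMPLE_REPLY = (
    '{"couchdb":"Welcome","version":"2.0.0",'
    '"vendor":{"name":"The Apache Software Foundation"}}'
)

welcome, version = parse_welcome(SAMPLE_REPLY)
MINIMUM_VERSION = (2, 0, 0)  # hypothetical requirement of a client library
version_ok = version >= MINIMUM_VERSION
```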
1.6.2. Databases
Now let’s do something a little more useful: create databases. Strictly speaking, CouchDB is a database management system (DMS). That means it can hold multiple databases. A database is a bucket that holds “related data”. We’ll explore later what that means exactly. In practice, the terminology is overlapping – often people refer to a DMS as “a database” and also a database within the DMS as “a database.” We might follow that slight oddity, so don’t get confused by it. In general, it should be clear from the context if we are talking about the whole of CouchDB or a single database within CouchDB.
Now let’s make one! We want to store our favorite music albums, and we creatively give our database the name albums. Note that we’re now using the -X option again to tell curl to send a PUT request instead of the default GET request:
- curl -X PUT http://127.0.0.1:5984/albums
CouchDB replies:
- {"ok":true}
That’s it. You created a database and CouchDB told you that all went well. What happens if you try to create a database that already exists? Let’s try to create that database again:
- curl -X PUT http://127.0.0.1:5984/albums
CouchDB replies:
- {"error":"file_exists","reason":"The database could not be created, the file already exists."}
We get back an error. This is pretty convenient. We also learn a little bit about how CouchDB works. CouchDB stores each database in a single file. Very simple.
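A program that creates databases needs to tell these two replies apart. The following is a minimal sketch that looks only at the two JSON bodies shown above; the function name `interpret_create_reply` and its return values are made up for illustration.

```python
import json


def interpret_create_reply(body: str) -> str:
    """Classify the JSON body CouchDB sent back for a database PUT."""
    reply = json.loads(body)
    if reply.get("ok") is True:
        return "created"
    if reply.get("error") == "file_exists":
        return "already exists"
    # Anything else is unexpected here; surface it instead of guessing.
    raise ValueError(f"unexpected reply: {reply!r}")


first_attempt = interpret_create_reply('{"ok":true}')
second_attempt = interpret_create_reply(
    '{"error":"file_exists","reason":"The database could not be created, '
    'the file already exists."}'
)
```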
Let’s create another database, this time with curl’s -v (for “verbose”) option. The verbose option tells curl to show us not only the essentials – the HTTP response body – but all the underlying request and response details:
- curl -vX PUT http://127.0.0.1:5984/albums-backup
curl elaborates:
- * About to connect() to 127.0.0.1 port 5984 (#0)
- * Trying 127.0.0.1... connected
- * Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
- > PUT /albums-backup HTTP/1.1
- > User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
- > Host: 127.0.0.1:5984
- > Accept: */*
- >
- < HTTP/1.1 201 Created
- < Server: CouchDB (Erlang/OTP)
- < Date: Sun, 05 Jul 2009 22:48:28 GMT
- < Content-Type: text/plain;charset=utf-8
- < Content-Length: 12
- < Cache-Control: must-revalidate
- <
- {"ok":true}
- * Connection #0 to host 127.0.0.1 left intact
- * Closing connection #0
What a mouthful. Let’s step through this line by line to understand what’s going on and find out what’s important. Once you’ve seen this output a few times, you’ll be able to spot the important bits more easily.
- * About to connect() to 127.0.0.1 port 5984 (#0)
This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. Not at all important, except when debugging networking issues.
- * Trying 127.0.0.1... connected
- * Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
curl tells us it successfully connected to CouchDB. Again, not important if you aren’t trying to find problems with your network.
The following lines are prefixed with > and < characters. The > means the line was sent to CouchDB verbatim (without the actual >). The < means the line was sent back to curl by CouchDB.
- > PUT /albums-backup HTTP/1.1
This initiates an HTTP request. Its method is PUT, the URI is /albums-backup, and the HTTP version is HTTP/1.1. There is also HTTP/1.0, which is simpler in some cases, but for all practical purposes you should be using HTTP/1.1.
Next, we see a number of request headers. These are used to provide additional details about the request to CouchDB.
- > User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
The User-Agent header tells CouchDB which piece of client software is doing the HTTP request. We don’t learn anything new: it’s curl. This header is often useful in web development when there are known errors in client implementations that a server might want to prepare the response for. It also helps to determine which platform a user is on. This information can be used for technical and statistical reasons. For CouchDB, the User-Agent header is irrelevant.
- > Host: 127.0.0.1:5984
The Host header is required by HTTP 1.1. It tells the server the hostname that came with the request.
- > Accept: */*
The Accept header tells CouchDB that curl accepts any media type. We’ll look into why this is useful a little later.
- >
An empty line denotes that the request headers are now finished and the rest of the request contains data we’re sending to the server. In this case, we’re not sending any data, so the rest of the curl output is dedicated to the HTTP response.
- < HTTP/1.1 201 Created
The first line of CouchDB’s HTTP response includes the HTTP version information (again, to acknowledge that the requested version could be processed), an HTTP status code, and a status code message. Different requests trigger different response codes. There’s a whole range of them telling the client (curl in our case) what effect the request had on the server. Or, if an error occurred, what kind of error. RFC 2616 (the HTTP 1.1 specification) defines clear behavior for response codes. CouchDB fully follows the RFC.
The 201 Created status code tells the client that the resource the request was made against was successfully created. No surprise here, but if you remember that we got an error message when we tried to create this database twice, you now know that this response could include a different response code. Acting upon responses based on response codes is a common practice. For example, all response codes of 400 Bad Request or larger tell you that some error occurred. If you want to shortcut your logic and immediately deal with the error, you could just check for a response code >= 400.
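The status-code shortcut described above can be sketched in a few lines. The helper names `parse_status_line` and `is_error` are illustrative, not part of any CouchDB client library.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a line such as 'HTTP/1.1 201 Created' into its three parts."""
    # The reason phrase may contain spaces, so split at most twice.
    http_version, code, reason = line.split(" ", 2)
    return http_version, int(code), reason


def is_error(status_code: int) -> bool:
    """Treat every response code of 400 or larger as an error."""
    return status_code >= 400


http_version, status_code, reason = parse_status_line("HTTP/1.1 201 Created")
```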
- < Server: CouchDB (Erlang/OTP)
The Server header is good for diagnostics. It tells us which CouchDB version and which underlying Erlang version we are talking to. In general, you can ignore this header, but it is good to know it’s there if you need it.
- < Date: Sun, 05 Jul 2009 22:48:28 GMT
The Date header tells you the time of the server. Since client and server time are not necessarily synchronized, this header is purely informational. You shouldn’t build any critical application logic on top of this!
- < Content-Type: text/plain;charset=utf-8
The Content-Type header tells you which MIME type the HTTP response body is and its encoding. We already know CouchDB returns JSON strings. The appropriate Content-Type header is application/json. Why do we see text/plain? This is where pragmatism wins over purity. Sending an application/json Content-Type header will make a browser offer you the returned JSON for download instead of just displaying it. Since it is extremely useful to be able to test CouchDB from a browser, CouchDB sends a text/plain content type, so all browsers will display the JSON as text.
Note
There are some extensions that make your browser JSON-aware, but they are not installed by default. For more information, look at the popular JSONView extension, available for both Firefox and Chrome.
Do you remember the Accept request header and how it is set to */* to express interest in any MIME type? If you send Accept: application/json in your request, CouchDB knows that you can deal with a pure JSON response with the proper Content-Type header and will use it instead of text/plain.
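For readers working from a program rather than curl, here is a small sketch of setting that Accept header with Python's standard `urllib.request` module. The helper name `json_request` is an assumption for illustration, and the request is only constructed, not sent, so no running server is needed.

```python
import urllib.request


def json_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for an application/json response."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


request = json_request("http://127.0.0.1:5984/albums")
# urllib.request.urlopen(request) would actually send it to a running CouchDB.
```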
- < Content-Length: 12
The Content-Length header simply tells us how many bytes the response body has.
- < Cache-Control: must-revalidate
This Cache-Control header tells you, or any proxy server between CouchDB and you, not to reuse a cached copy of this response without first revalidating it with CouchDB.
- <
This empty line tells us we’re done with the response headers and what follows now is the response body.
- {"ok":true}
We’ve seen this before.
- * Connection #0 to host 127.0.0.1 left intact
- * Closing connection #0
The last two lines are curl telling us that it kept the TCP connection it opened in the beginning open for a moment, but then closed it after it received the entire response.
Throughout the documents, we’ll show more requests with the -v option, but we’ll omit some of the headers we’ve seen here and include only those that are important for the particular request.
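To recap the walkthrough, the three line prefixes in curl's verbose output can be separated mechanically. This is only a sketch for reading saved transcripts; the function name `split_transcript` is made up, and the transcript is a shortened copy of the output above.

```python
def split_transcript(text: str) -> dict[str, list[str]]:
    """Group verbose curl output by prefix: '>' sent, '<' received, '*' notes."""
    groups: dict[str, list[str]] = {">": [], "<": [], "*": []}
    for line in text.splitlines():
        marker, rest = line[:1], line[1:].strip()
        # Blank separator lines and the unprefixed response body are skipped.
        if marker in groups and rest:
            groups[marker].append(rest)
    return groups


TRANSCRIPT = """\
* Trying 127.0.0.1... connected
> PUT /albums-backup HTTP/1.1
> Host: 127.0.0.1:5984
> Accept: */*
>
< HTTP/1.1 201 Created
< Content-Type: text/plain;charset=utf-8
<
{"ok":true}
"""

groups = split_transcript(TRANSCRIPT)
```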
Creating databases is all fine, but how do we get rid of one? Easy – just change the HTTP method:
- curl -vX DELETE http://127.0.0.1:5984/albums-backup
This deletes a CouchDB database. The request will remove the file that the database contents are stored in. There is no “Are you sure?” safety net or any “Empty the trash” magic you’ve got to do to delete a database. Use this command with care. Your data will be deleted without a chance to bring it back easily if you don’t have a backup copy.
This section went knee-deep into HTTP and set the stage for discussing the rest of the core CouchDB API. Next stop: documents.
1.6.3. Documents
Documents are CouchDB’s central data structure. The idea behind a document is, unsurprisingly, that of a real-world document – a sheet of paper such as an invoice, a recipe, or a business card. We already learned that CouchDB uses the JSON format to store documents. Let’s see how this storing works at the lowest level.
Each document in CouchDB has an ID. This ID is unique per database. You are free to choose any string to be the ID, but for best results we recommend a UUID (or GUID), i.e., a Universally (or Globally) Unique IDentifier. UUIDs are random numbers that have such a low collision probability that everybody can make thousands of UUIDs a minute for millions of years without ever creating a duplicate. This is a great way to ensure two independent people cannot create two different documents with the same ID. Why should you care what somebody else is doing? For one, that somebody else could be you at a later time or on a different computer; secondly, CouchDB replication lets you share documents with others and using UUIDs ensures that it all works. But more on that later; let’s make some documents:
- curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters"}'
CouchDB replies:
- {"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"1-2902191555"}
The curl command appears complex, but let’s break it down. First, -X PUT tells curl to make a PUT request. It is followed by the URL that specifies your CouchDB IP address and port. The resource part of the URL, /albums/6e1295ed6c29495e54cc05947f18c8af, specifies the location of a document inside our albums database. The wild collection of numbers and characters is a UUID. This UUID is your document’s ID. Finally, the -d flag tells curl to use the following string as the body for the PUT request. The string is a simple JSON structure including title and artist attributes with their respective values.
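The same pieces (method, URL, body) map directly onto a request object in most HTTP libraries. Below is a hedged sketch using Python's standard `urllib.request`; the helper `put_document_request` is hypothetical, and setting an explicit application/json Content-Type is a choice made here, not something the curl command above does. The request is built but not sent.

```python
import json
import urllib.request


def put_document_request(base_url: str, database: str, doc_id: str,
                         document: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores one document."""
    url = f"{base_url}/{database}/{doc_id}"
    body = json.dumps(document).encode("utf-8")  # plays the role of curl's -d
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",  # plays the role of curl's -X PUT
        headers={"Content-Type": "application/json"},
    )


request = put_document_request(
    "http://127.0.0.1:5984",
    "albums",
    "6e1295ed6c29495e54cc05947f18c8af",
    {"title": "There is Nothing Left to Lose", "artist": "Foo Fighters"},
)
```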
Note
If you don’t have a UUID handy, you can ask CouchDB to give you one (in fact, that is what we did just now without showing you). Simply send a GET /_uuids request:
- curl -X GET http://127.0.0.1:5984/_uuids
CouchDB replies:
- {"uuids":["6e1295ed6c29495e54cc05947f18c8af"]}
Voilà, a UUID. If you need more than one, you can pass in the ?count=10 HTTP parameter to request 10 UUIDs, or really, any number you need.
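Since any string can serve as a document ID, a client can also generate IDs itself. This sketch uses Python's standard `uuid` module to produce random 32-character hexadecimal strings in the same shape as the one above. It is a client-side alternative, not a description of how CouchDB generates the values returned by /_uuids.

```python
import uuid


def new_document_id() -> str:
    """Return a random UUID as 32 lowercase hexadecimal characters."""
    # uuid4() is random; .hex drops the dashes of the usual UUID notation.
    return uuid.uuid4().hex


doc_id = new_document_id()
```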
To double-check that CouchDB isn’t lying about having saved your document (it usually doesn’t), try to retrieve it by sending a GET request:
- curl -X GET http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af
We hope you see a pattern here. Everything in CouchDB has an address, a URI, and you use the different HTTP methods to operate on these URIs.
CouchDB replies:
- {"_id":"6e1295ed6c29495e54cc05947f18c8af","_rev":"1-2902191555","title":"There is Nothing Left to Lose","artist":"Foo Fighters"}
This looks a lot like the document you asked CouchDB to save, which is good. But you should notice that CouchDB added two fields to your JSON structure. The first is _id, which holds the UUID we asked CouchDB to save our document under. We always know the ID of a document if it is included, which is very convenient.
The second field is _rev. It stands for revision.
1.6.3.1. Revisions
If you want to change a document in CouchDB, you don’t tell it to go and find a field in a specific document and insert a new value. Instead, you load the full document out of CouchDB, make your changes in the JSON structure (or object, when you are doing actual programming), and save the entire new revision (or version) of that document back into CouchDB. Each revision is identified by a new _rev value.
If you want to update or delete a document, CouchDB expects you to include the _rev field of the revision you wish to change. When CouchDB accepts the change, it will generate a new revision number. This mechanism ensures that, in case somebody else made a change without you knowing before you got to request the document update, CouchDB will not accept your update because you are likely to overwrite data you didn’t know existed. Or simplified: whoever saves a change to a document first, wins. Let’s see what happens if we don’t provide a _rev field (which is equivalent to providing an outdated value):
- curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \
- -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997"}'
CouchDB replies:
- {"error":"conflict","reason":"Document update conflict."}
If you see this, add the latest revision number of your document to the JSON structure:
- curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \
- -d '{"_rev":"1-2902191555","title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997"}'
Now you see why it was handy that CouchDB returned that _rev when we made the initial request. CouchDB replies:
- {"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"2-8aff9ee9d06671fa89c99d20a4b3ae"}
CouchDB accepted your write and also generated a new revision number. The revision number is the MD5 hash of the transport representation of a document with an N- prefix denoting the number of times a document got updated. This is useful for replication. See Replication and conflict model for more information.
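The "first writer wins" rule can be modelled in a few lines. The class below is a toy in-memory stand-in, not CouchDB's implementation: the name `ToyDatabase` is invented, and hashing a sorted JSON dump is an assumed simplification of the real revision hash. Only the check on _rev mirrors the behavior shown above.

```python
import hashlib
import json


class ToyDatabase:
    """In-memory model of revision checking on document writes."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def put(self, doc_id: str, doc: dict) -> dict:
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # A missing or outdated _rev is rejected, as in the replies above.
        if doc.get("_rev") != current_rev:
            return {"error": "conflict", "reason": "Document update conflict."}
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        body = {key: value for key, value in doc.items() if key != "_rev"}
        # Assumption: a stand-in hash; CouchDB hashes a different representation.
        digest = hashlib.md5(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = {**body, "_id": doc_id, "_rev": new_rev}
        return {"ok": True, "id": doc_id, "rev": new_rev}


db = ToyDatabase()
created = db.put("album", {"title": "There is Nothing Left to Lose"})
rejected = db.put("album", {"title": "There is Nothing Left to Lose", "year": "1997"})
accepted = db.put("album", {"_rev": created["rev"],
                            "title": "There is Nothing Left to Lose",
                            "year": "1997"})
```

The second write fails because it carries no _rev, and the third succeeds because it names the revision it intends to replace.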
There are multiple reasons why CouchDB uses this revision system, which is also called Multi-Version Concurrency Control (MVCC). They all work hand-in-hand, and this is a good opportunity to explain some of them.
One of the aspects of the HTTP protocol that CouchDB uses is that it is stateless. What does that mean? When talking to CouchDB you need to make requests. Making a request includes opening a network connection to CouchDB, exchanging bytes, and closing the connection. This is done every time you make a request. Other protocols allow you to open a connection, exchange bytes, keep the connection open, exchange more bytes later – maybe depending on the bytes you exchanged at the beginning – and eventually close the connection. Holding a connection open for later use requires the server to do extra work. One common pattern is that for the lifetime of a connection, the client has a consistent and static view of the data on the server. Managing huge amounts of parallel connections is a significant amount of work. HTTP connections are usually short-lived, and making the same guarantees is a lot easier. As a result, CouchDB can handle many more concurrent connections.
Another reason CouchDB uses MVCC is that this model is simpler conceptually and, as a consequence, easier to program. CouchDB uses less code to make this work, and less code is always good because the ratio of defects per lines of code is static.
The revision system also has positive effects on replication and storage mechanisms, but we’ll explore these later in the documents.
Warning
The terms version and revision might sound familiar (if you are programming without version control, stop reading this guide right now and start learning one of the popular systems). Using new versions for document changes works a lot like version control, but there’s an important difference: CouchDB does not guarantee that older versions are kept around.
1.6.3.2. Documents in Detail
Now let’s have a closer look at our document creation requests with the curl -v flag that was helpful when we explored the database API earlier. This is also a good opportunity to create more documents that we can use in later examples.
We’ll add some more of our favorite music albums. Get a fresh UUID from the /_uuids resource. If you don’t remember how that works, you can look it up a few pages back.
- curl -vX PUT http://127.0.0.1:5984/albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 \
- -d '{"title":"Blackened Sky","artist":"Biffy Clyro","year":2002}'
Note
By the way, if you happen to know more information about your favorite albums, don’t hesitate to add more properties. And don’t worry about not knowing all the information for all the albums. CouchDB’s schema-less documents can contain whatever you know. After all, you should relax and not worry about data.
Now with the -v option, CouchDB’s reply (with only the important bits shown) looks like this:
- > PUT /albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 HTTP/1.1
- >
- < HTTP/1.1 201 Created
- < Location: http://127.0.0.1:5984/albums/70b50bfa0a4b3aed1f8aff9e92dc16a0
- < ETag: "1-e89c99d29d06671fa0a4b3ae8aff9e"
- <
- {"ok":true,"id":"70b50bfa0a4b3aed1f8aff9e92dc16a0","rev":"1-e89c99d29d06671fa0a4b3ae8aff9e"}
We’re getting back the 201 Created HTTP status code in the response headers, as we saw earlier when we created a database. The Location header gives us a full URL to our newly created document. And there’s a new header. An ETag in HTTP-speak identifies a specific version of a resource. In this case, it identifies a specific version (the first one) of our new document. Sound familiar? Yes, conceptually, an ETag is the same as a CouchDB document revision number, and it shouldn’t come as a surprise that CouchDB uses revision numbers for ETags. ETags are useful for caching infrastructures.
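To illustrate why ETags help with caching, here is a sketch of the client side of a conditional GET: the client sends the revision it already holds in an If-None-Match header and treats a 304 Not Modified answer as "my copy is still current". The helper names are hypothetical, and the sketch assumes the server honors If-None-Match for document requests in the standard HTTP way.

```python
def conditional_headers(cached_rev: str) -> dict[str, str]:
    """Headers for a GET that should only return a body if the doc changed."""
    # ETags are quoted strings in HTTP, as in the ETag header shown above.
    return {"If-None-Match": f'"{cached_rev}"'}


def can_reuse_cache(status_code: int) -> bool:
    """304 Not Modified means the cached copy is still the current revision."""
    return status_code == 304


headers = conditional_headers("1-e89c99d29d06671fa0a4b3ae8aff9e")
```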
1.6.3.3. Attachments
CouchDB documents can have attachments just like an email message can have attachments. An attachment is identified by a name and includes its MIME type (or Content-Type) and the number of bytes the attachment contains. Attachments can be any data. It is easiest to think about attachments as files attached to a document. These files can be text, images, Word documents, music, or movie files. Let’s make one.
Attachments get their own URL where you can upload data. Say we want to add the album artwork to the 6e1295ed6c29495e54cc05947f18c8af document (“There is Nothing Left to Lose”), and let’s also say the artwork is in a file artwork.jpg in the current directory:
- curl -vX PUT 'http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-8aff9ee9d06671fa89c99d20a4b3ae' \
- --data-binary @artwork.jpg -H "Content-Type:image/jpg"
Note
The --data-binary @ option tells curl to read a file’s contents into the HTTP request body. We’re using the -H option to tell CouchDB that we’re uploading a JPEG file. CouchDB will keep this information around and will send the appropriate header when this attachment is requested; in case of an image like this, a browser will render the image instead of offering you the data for download. This will come in handy later. Note that you need to provide the current revision number of the document you’re attaching the artwork to, just as if you would update the document. Because, after all, attaching some data is changing the document.
You should now see your artwork image if you point your browser to http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg
If you request the document again, you’ll see a new member:
- curl http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af
CouchDB replies:
- {
- "_id": "6e1295ed6c29495e54cc05947f18c8af",
- "_rev": "3-131533518",
- "title": "There is Nothing Left to Lose",
- "artist": "Foo Fighters",
- "year": "1997",
- "_attachments": {
- "artwork.jpg": {
- "stub": true,
- "content_type": "image/jpg",
- "length": 52450
- }
- }
- }
_attachments is a list of keys and values where the values are JSON objects containing the attachment metadata. stub=true tells us that this entry is just the metadata. If we use the ?attachments=true HTTP option when requesting this document, we’d get a Base64 encoded string containing the attachment data.
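A client has to handle both shapes of an attachment entry. The sketch below returns the decoded bytes when the entry carries inline Base64 content and None when it is only a stub. The helper name is made up, and the sketch assumes the inline content sits in a member called data; check a real response before relying on that name.

```python
import base64
from typing import Optional


def attachment_bytes(document: dict, name: str) -> Optional[bytes]:
    """Return the decoded attachment, or None if only metadata is present."""
    entry = document.get("_attachments", {}).get(name)
    if entry is None or entry.get("stub"):
        return None
    # Assumption: inline content is Base64 text under the "data" member.
    return base64.b64decode(entry["data"])


stub_doc = {"_attachments": {"artwork.jpg": {
    "stub": True, "content_type": "image/jpg", "length": 52450}}}
full_doc = {"_attachments": {"artwork.jpg": {
    "content_type": "image/jpg",
    "data": base64.b64encode(b"not really a JPEG").decode("ascii")}}}
```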
We’ll have a look at more document request options later as we explore more features of CouchDB, such as replication, which is the next topic.
1.6.4. Replication
CouchDB replication is a mechanism to synchronize databases. Much like rsync synchronizes two directories locally or over a network, replication synchronizes two databases locally or remotely.
In a simple POST request, you tell CouchDB the source and the target of a replication and CouchDB will figure out which documents and new document revisions are on source that are not yet on target, and will proceed to move the missing documents and revisions over.
We’ll take an in-depth look at replication in the document Introduction to Replication; in this document, we’ll just show you how to use it.
First, we’ll create a target database. Note that CouchDB won’t automatically create a target database for you, and will return a replication failure if the target doesn’t exist (likewise for the source, but that mistake isn’t as easy to make):
- curl -X PUT http://127.0.0.1:5984/albums-replica
Now we can use the database albums-replica as a replication target:
- curl -vX POST http://127.0.0.1:5984/_replicate \
- -d '{"source":"albums","target":"albums-replica"}' \
- -H "Content-Type: application/json"
Note
CouchDB supports the option "create_target":true placed in the JSON POSTed to the _replicate URL. It implicitly creates the target database if it doesn’t exist.
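Building the JSON body by hand invites quoting mistakes, so a program would normally serialize it. A minimal sketch, with a hypothetical helper name, that covers the source, target, and create_target members mentioned above:

```python
import json


def replication_body(source: str, target: str,
                     create_target: bool = False) -> str:
    """Serialize the JSON body for a POST to /_replicate."""
    body: dict = {"source": source, "target": target}
    if create_target:
        # Only include the option when it is wanted.
        body["create_target"] = True
    return json.dumps(body)


plain = replication_body("albums", "albums-replica")
creating = replication_body("albums", "albums-replica", create_target=True)
```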
CouchDB replies (this time we formatted the output so you can read it more easily):
- {
- "history": [
- {
- "start_last_seq": 0,
- "missing_found": 2,
- "docs_read": 2,
- "end_last_seq": 5,
- "missing_checked": 2,
- "docs_written": 2,
- "doc_write_failures": 0,
- "end_time": "Sat, 11 Jul 2009 17:36:21 GMT",
- "start_time": "Sat, 11 Jul 2009 17:36:20 GMT"
- }
- ],
- "source_last_seq": 5,
- "session_id": "924e75e914392343de89c99d29d06671",
- "ok": true
- }
CouchDB maintains a session history of replications. The response for a replication request contains the history entry for this replication session. It is also worth noting that the request for replication will stay open until replication finishes. If you have a lot of documents, it’ll take a while until they are all replicated and you won’t get back the replication response until all documents are replicated. It is important to note that replication replicates the database only as it was at the point in time when replication was started. So, any additions, modifications, or deletions subsequent to the start of replication will not be replicated.
We’ll punt on the details again – the "ok": true at the end tells us all went well. If you now have a look at the albums-replica database, you should see all the documents that you created in the albums database. Neat, eh?
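A script that triggers replication may want a stricter check than eyeballing "ok". The sketch below additionally requires that no history entry reports write failures and that every document read was written. Those extra criteria are an assumption about what "went well" should mean, not a rule stated by CouchDB, and the sample reply is abbreviated from the one above.

```python
import json


def replication_succeeded(reply_text: str) -> bool:
    """Check "ok" plus the per-session counters in the history entries."""
    reply = json.loads(reply_text)
    if not reply.get("ok"):
        return False
    return all(
        entry.get("doc_write_failures", 0) == 0
        and entry.get("docs_written") == entry.get("docs_read")
        for entry in reply.get("history", [])
    )


SAMPLE_REPLY = json.dumps({
    "history": [{
        "missing_found": 2, "docs_read": 2, "missing_checked": 2,
        "docs_written": 2, "doc_write_failures": 0,
    }],
    "source_last_seq": 5,
    "ok": True,
})

went_well = replication_succeeded(SAMPLE_REPLY)
```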
What you just did is called local replication in CouchDB terms. You created a local copy of a database. This is useful for backups or to keep snapshots of a specific state of your data around for later. You might want to do this if you are developing your applications but want to be able to roll back to a stable version of your code and data.
There are more types of replication useful in other situations. The source and target members of our replication request are actually links (like in HTML) and so far we’ve seen links relative to the server we’re working on (hence local). You can also specify a remote database as the target:
- curl -vX POST http://127.0.0.1:5984/_replicate \
- -d '{"source":"albums","target":"http://example.org:5984/albums-replica"}' \
- -H "Content-Type:application/json"
Using a local source and a remote target database is called push replication. We’re pushing changes to a remote server.
Note
Since we don’t have a second CouchDB server around just yet, we’ll just use the absolute address of our single server, but you should be able to infer from this that you can put any remote server in there.
This is great for sharing local changes with remote servers or buddies next door.
You can also use a remote source and a local target to do a pull replication. This is great for getting the latest changes from a server that is used by others:
- curl -vX POST http://127.0.0.1:5984/_replicate \
- -d '{"source":"http://example.org:5984/albums-replica","target":"albums"}' \
- -H "Content-Type:application/json"
Finally, you can run remote replication, which is mostly useful for management operations:
- curl -vX POST http://127.0.0.1:5984/_replicate \
- -d '{"source":"http://example.org:5984/albums","target":"http://example.org:5984/albums-replica"}' \
- -H "Content-Type: application/json"
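The four kinds of replication differ only in whether the source and the target are names relative to the local server or absolute URLs. That distinction can be summarized in a small sketch; the function names are illustrative, and treating anything that starts with http:// or https:// as remote is an assumption made for this example.

```python
def is_remote(database: str) -> bool:
    """Assume absolute http(s) URLs are remote and bare names are local."""
    return database.startswith(("http://", "https://"))


def replication_kind(source: str, target: str) -> str:
    """Name the replication type for a source/target pair."""
    kinds = {
        (False, False): "local",
        (False, True): "push",
        (True, False): "pull",
        (True, True): "remote",
    }
    return kinds[(is_remote(source), is_remote(target))]


kind = replication_kind("albums", "http://example.org:5984/albums-replica")
```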
Note
CouchDB and REST
CouchDB prides itself on having a RESTful API, but these replication requests don’t look very RESTy to the trained eye. What’s up with that? While CouchDB’s core database, document, and attachment API are RESTful, not all of CouchDB’s API is. The replication API is one example. There are more, as we’ll see later in the documents.
Why are there RESTful and non-RESTful APIs mixed up here? Have the developers been too lazy to go REST all the way? Remember, REST is an architectural style that lends itself to certain architectures (such as the CouchDB document API). But it is not a one-size-fits-all. Triggering an event like replication does not make a whole lot of sense in the REST world. It is more like a traditional remote procedure call. And there is nothing wrong with this.
We very much believe in the “use the right tool for the job” philosophy, and REST does not fit every job. For support, we refer to Leonard Richardson and Sam Ruby who wrote RESTful Web Services (O’Reilly), as they share our view.
1.6.5. Wrapping Up
This is still not the full CouchDB API, but we discussed the essentials in great detail. We’re going to fill in the blanks as we go. For now, we believe you’re ready to start building CouchDB applications.