Read Isolation, Consistency, and Recency
Isolation Guarantees
Read Uncommitted
Depending on the read concern, clients can see the results of writes before the writes are durable:

- Regardless of a write's write concern, other clients using "local" or "available" read concern can see the result of a write operation before the write operation is acknowledged to the issuing client.
- Clients using "local" or "available" read concern can read data which may be subsequently rolled back during replica set failovers.
For operations in a multi-document transaction, when a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. That is, a transaction will not commit some of its changes while rolling back others.

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.
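The all-or-nothing visibility rule above can be sketched with a small in-memory model (plain Python, not MongoDB driver code; `SketchTransaction` is an invented name for illustration):

```python
# Toy model of transaction visibility: writes staged inside a transaction
# are invisible to outside readers until commit, then all become visible
# together. None are applied early, none are committed partially.

class SketchTransaction:
    def __init__(self, store):
        self.store = store          # the shared document store (a dict)
        self.staged = {}            # changes buffered until commit

    def write(self, key, value):
        self.staged[key] = value    # visible only inside the transaction

    def read(self, key):
        # the transaction sees its own staged writes first
        return self.staged.get(key, self.store.get(key))

    def commit(self):
        # all staged changes become visible at once
        self.store.update(self.staged)
        self.staged = {}

store = {"a": 1}
txn = SketchTransaction(store)
txn.write("a", 2)
txn.write("b", 3)

inside_view = txn.read("a")                        # 2: txn sees its own write
outside_before = (store.get("a"), store.get("b"))  # (1, None): old data
txn.commit()
outside_after = (store.get("a"), store.get("b"))   # (2, 3): both visible

print(inside_view, outside_before, outside_after)
```

This deliberately omits the sharded case described above, where even a committed transaction's writes can become visible on different shards at different moments for "local" reads.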
Read uncommitted is the default isolation level and applies to mongod standalone instances as well as to replica sets and sharded clusters.
Read Uncommitted And Single Document Atomicity
Write operations are atomic with respect to a single document; i.e. if a write is updating multiple fields in the document, a read operation will never see the document with only some of the fields updated. However, although a client may not see a partially updated document, read uncommitted means that concurrent read operations may still see the updated document before the changes are made durable.
With a standalone mongod instance, a set of read and write operations to a single document is serializable. With a replica set, a set of read and write operations to a single document is serializable only in the absence of a rollback.
Read Uncommitted And Multiple Document Write
When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic.

When performing multi-document write operations, whether through a single write operation or multiple write operations, other operations may interleave.
For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions:

- In version 4.0, MongoDB supports multi-document transactions on replica sets.
- In version 4.2, MongoDB introduces distributed transactions, which adds support for multi-document transactions on sharded clusters and incorporates the existing support for multi-document transactions on replica sets.
For details regarding transactions in MongoDB, see the Transactions page.
Important
In most cases, a multi-document transaction incurs a greater performance cost than single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.

For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.
Without isolating the multi-document write operations, MongoDB exhibits the following behavior:
- Non-point-in-time read operations. Suppose a read operation begins at time t1 and starts reading documents. A write operation then commits an update to one of the documents at some later time t2. The reader may see the updated version of the document, and therefore does not see a point-in-time snapshot of the data.
- Non-serializable operations. Suppose a read operation reads a document d1 at time t1 and a write operation updates d1 at some later time t3. This introduces a read-write dependency such that, if the operations were to be serialized, the read operation must precede the write operation. But also suppose that the write operation updates document d2 at time t2 and the read operation subsequently reads d2 at some later time t4. This introduces a write-read dependency which would instead require the read operation to come after the write operation in a serializable schedule. There is a dependency cycle which makes serializability impossible.
- Reads may miss matching documents that are updated during the course of the read operation.
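The dependency cycle in the non-serializable example can be made concrete with a toy cycle check (hypothetical "R" and "W" labels for the read and write operations, stdlib Python only):

```python
# R must precede W (R read d1 before W updated it at t3), but R must also
# follow W (R read d2 at t4, after W updated it at t2). Any candidate
# serial schedule therefore contains a dependency cycle.

def has_cycle(edges):
    # simple DFS cycle detection over a dependency graph {node: [successors]}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, []):
            if color.get(m) == GRAY:          # back edge: cycle found
                return True
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

# R -> W: read-write dependency (R read d1 before W's update)
# W -> R: write-read dependency (R read d2 after W's update)
deps = {"R": ["W"], "W": ["R"]}
print(has_cycle(deps))  # True: no serial order of R and W exists
```

With only one of the two dependencies (e.g. `{"R": ["W"], "W": []}`), the graph is acyclic and the schedule is serializable.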
Cursor Snapshot
MongoDB cursors can return the same document more than once in some situations. As a cursor returns documents, other operations may interleave with the query. If some of these operations change the indexed field on the index used by the query, then the cursor will return the same document more than once.

If your collection has a field or fields that are never modified, you can use a unique index on this field or these fields so that the query will return each document no more than once. Query with hint() to explicitly force the query to use that index.
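A rough stdlib-only simulation of this behavior, treating the index as a sorted list of (key, _id) pairs (a deliberate simplification of an index scan, not MongoDB code):

```python
# A cursor scanning index entries in key order can return the same document
# twice if a concurrent update moves that document's index key ahead of the
# cursor's current position.
import bisect

index = [(10, 1), (20, 2)]          # (score, _id) entries in key order
returned = []
pos_key = (float("-inf"), 0)        # cursor position in the key space

# cursor step 1: return the first entry past the current position
entry = next(e for e in index if e > pos_key)
returned.append(entry[1])
pos_key = entry

# concurrent update: doc 1's score changes 10 -> 30, which relocates its
# index entry past the cursor's position
index.remove((10, 1))
bisect.insort(index, (30, 1))

# the cursor resumes from its position; doc 1 now lies ahead and reappears
for e in sorted(index):
    if e > pos_key:
        returned.append(e[1])
        pos_key = e

print(returned)  # [1, 2, 1]: doc 1 is returned twice
```

Scanning on a never-modified (ideally unique-indexed) field removes the possibility of an entry moving mid-scan, which is the rationale for the hint() advice above.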
Monotonic Writes
MongoDB provides monotonic write guarantees, by default, for standalone mongod instances and replica sets.

For monotonic writes and sharded clusters, see Causal Consistency.
Real Time Order
New in version 3.4.
For read and write operations on the primary, issuing read operations with "linearizable" read concern and write operations with "majority" write concern enables multiple threads to perform reads and writes on a single document as if a single thread performed these operations in real time; that is, the corresponding schedule for these reads and writes is considered linearizable.
See also
Causal Consistency
New in version 3.6.
If an operation logically depends on a preceding operation, there is a causal relationship between the operations. For example, a write operation that deletes all documents based on a specified condition and a subsequent read operation that verifies the delete operation have a causal relationship.

With causally consistent sessions, MongoDB executes causal operations in an order that respects their causal relationships, and clients observe results that are consistent with the causal relationships.
Client Sessions and Causal Consistency Guarantees
To provide causal consistency, MongoDB 3.6 enables causal consistency in client sessions. A causally consistent session denotes that the associated sequence of read operations with "majority" read concern and write operations with "majority" write concern have a causal relationship that is reflected by their ordering. Applications must ensure that only one thread at a time executes these operations in a client session.
For causally related operations:
- A client starts a client session.
Important
Client sessions only guarantee causal consistency for:

- Read operations with "majority" read concern; i.e. the return data has been acknowledged by a majority of the replica set members and is durable.
- Write operations with "majority" write concern; i.e. the write operations that request acknowledgement that the operation has been applied to a majority of the replica set's voting members.

For more information on causal consistency and various read and write concerns, see Causal Consistency and Read and Write Concerns.
- As the client issues a sequence of read operations with "majority" read concern and write operations (with "majority" write concern), the client includes the session information with each operation.
- For each read operation with "majority" read concern and write operation with "majority" write concern associated with the session, MongoDB returns the operation time and the cluster time, even if the operation errors. The client session keeps track of the operation time and the cluster time.
Note
MongoDB does not return the operation time and the cluster time for unacknowledged (w: 0) write operations. Unacknowledged writes do not imply any causal relationship.
Although MongoDB returns the operation time and the cluster time for read operations and acknowledged write operations in a client session, only the read operations with "majority" read concern and write operations with "majority" write concern can guarantee causal consistency. For details, see Causal Consistency and Read and Write Concerns.
- The associated client session tracks these two time fields.
Note
Operations can be causally consistent across different sessions. MongoDB drivers and the mongo shell provide the methods to advance the operation time and the cluster time for a client session. So, a client can advance the cluster time and the operation time of one client session to be consistent with the operations of another client session.
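The operation-time bookkeeping described above can be sketched with a toy model (`SketchSession` and `SketchSecondary` are invented names; real drivers and servers implement this with cluster-wide logical clocks, and a real secondary would block rather than signal):

```python
# Each session tracks the highest logical operation time it has observed.
# Advancing a second session to the first session's time forces its reads
# to reflect everything the first session wrote.

class SketchSession:
    def __init__(self):
        self.operation_time = 0     # highest logical time seen by this session

    def advance_operation_time(self, t):
        self.operation_time = max(self.operation_time, t)   # never move backwards

class SketchSecondary:
    def __init__(self):
        self.applied = []           # replicated (time, value) writes
        self.applied_time = 0

    def replicate(self, t, value):
        self.applied.append((t, value))
        self.applied_time = t

    def read(self, session):
        # a causally consistent read must reflect everything up to the
        # session's operation time; a real server would wait, we just signal
        if self.applied_time < session.operation_time:
            return None             # "not caught up yet; would block"
        return [v for _, v in self.applied]

s1 = SketchSession()
s1.advance_operation_time(2)        # s1's writes committed at times 1 and 2

secondary = SketchSecondary()
secondary.replicate(1, "write1")    # only write1 has replicated so far

s2 = SketchSession()
r1 = secondary.read(s2)             # ['write1']: fresh session, may be stale

s2.advance_operation_time(s1.operation_time)
r2 = secondary.read(s2)             # None: must wait for write2

secondary.replicate(2, "write2")
r3 = secondary.read(s2)             # ['write1', 'write2']
print(r1, r2, r3)
```

This mirrors the driver calls used in the examples below (`advance_operation_time` / `advanceOperationTime`), where the second session is advanced to the first session's times before reading.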
Causal Consistency Guarantees
The following table lists the causal consistency guarantees provided by causally consistent sessions for read operations with "majority" read concern and write operations with "majority" write concern.
Guarantees | Description |
---|---|
Read your writes | Read operations reflect the results of write operations that precede them. |
Monotonic reads | Read operations do not return results that correspond to an earlier state of the data than a preceding read operation. For example, if in a session: write1 precedes write2, read1 precedes read2, and read1 returns results that reflect write2, then read2 cannot return results of write1. |
Monotonic writes | Write operations that must precede other writes are executed before those other writes. For example, if write1 must precede write2 in a session, the state of the data at the time of write2 must reflect the state of the data post write1. Other writes can interleave between write1 and write2, but write2 cannot occur before write1. |
Writes follow reads | Write operations that must occur after read operations are executed after those read operations. That is, the state of the data at the time of the write must incorporate the state of the data of the preceding read operations. |
Read Preference
These guarantees hold across all members of the MongoDB deployment. For example, if, in a causally consistent session, you issue a write with "majority" write concern followed by a read that reads from a secondary (i.e. read preference secondary) with "majority" read concern, the read operation will reflect the state of the database after the write operation.
Isolation
Operations within a causally consistent session are not isolated from operations outside the session. If a concurrent write operation interleaves between the session's write and read operations, the session's read operation may return results that reflect a write operation that occurred after the session's write operation.
Feature Compatibility Version
The featureCompatibilityVersion (fCV) must be set to "3.6" or greater. To check the fCV, run the following command:

```javascript
db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )
```

For more information, see View FeatureCompatibilityVersion and setFeatureCompatibilityVersion.
MongoDB Drivers
Clients require MongoDB drivers updated for MongoDB 3.6 or later:
- Java 3.6+
- Python 3.6+
- C 1.9+
- C# 2.5+
- Node 3.0+
- Ruby 2.5+
- Perl 2.0+
- PHPC 1.4+
- Scala 2.2+
Examples
Important
Causally consistent sessions can only guarantee causal consistency for reads with "majority" read concern and writes with "majority" write concern.
Consider a collection items that maintains the current and historical data for various items. Only the historical data has a non-null end date. If the sku value for an item changes, the document with the old sku value needs to be updated with the end date, after which the new document is inserted with the current sku value. The client can use a causally consistent session to ensure that the update occurs before the insert.
Python

```python
with client.start_session(causal_consistency=True) as s1:
    current_date = datetime.datetime.today()
    items = client.get_database(
        'test', read_concern=ReadConcern('majority'),
        write_concern=WriteConcern('majority', wtimeout=1000)).items
    items.update_one(
        {'sku': "111", 'end': None},
        {'$set': {'end': current_date}}, session=s1)
    items.insert_one(
        {'sku': "nuts-111", 'name': "Pecans",
         'start': current_date}, session=s1)
```

Java (Sync)

```java
// Example 1: Use a causally consistent session to ensure that the update occurs before the insert.
ClientSession session1 = client.startSession(ClientSessionOptions.builder().causallyConsistent(true).build());
Date currentDate = new Date();
MongoCollection<Document> items = client.getDatabase("test")
        .withReadConcern(ReadConcern.MAJORITY)
        .withWriteConcern(WriteConcern.MAJORITY.withWTimeout(1000, TimeUnit.MILLISECONDS))
        .getCollection("items");

items.updateOne(session1, eq("sku", "111"), set("end", currentDate));

Document document = new Document("sku", "nuts-111")
        .append("name", "Pecans")
        .append("start", currentDate);
items.insertOne(session1, document);
```

PHP

```php
$items = $client->selectDatabase(
    'test',
    [
        'readConcern' => new \MongoDB\Driver\ReadConcern(\MongoDB\Driver\ReadConcern::MAJORITY),
        'writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY, 1000),
    ]
)->items;

$s1 = $client->startSession(
    [ 'causalConsistency' => true ]
);

$currentDate = new \MongoDB\BSON\UTCDateTime();

$items->updateOne(
    [ 'sku' => '111', 'end' => [ '$exists' => false ] ],
    [ '$set' => [ 'end' => $currentDate ] ],
    [ 'session' => $s1 ]
);

$items->insertOne(
    [ 'sku' => '111-nuts', 'name' => 'Pecans', 'start' => $currentDate ],
    [ 'session' => $s1 ]
);
```

Motor

```python
async with await client.start_session(causal_consistency=True) as s1:
    current_date = datetime.datetime.today()
    items = client.get_database(
        'test', read_concern=ReadConcern('majority'),
        write_concern=WriteConcern('majority', wtimeout=1000)).items
    await items.update_one(
        {'sku': "111", 'end': None},
        {'$set': {'end': current_date}}, session=s1)
    await items.insert_one(
        {'sku': "nuts-111", 'name': "Pecans",
         'start': current_date}, session=s1)
```

C

```c
/* Use a causally-consistent session to run some operations. */

wc = mongoc_write_concern_new ();
mongoc_write_concern_set_wmajority (wc, 1000);
mongoc_collection_set_write_concern (coll, wc);

rc = mongoc_read_concern_new ();
mongoc_read_concern_set_level (rc, MONGOC_READ_CONCERN_LEVEL_MAJORITY);
mongoc_collection_set_read_concern (coll, rc);

session_opts = mongoc_session_opts_new ();
mongoc_session_opts_set_causal_consistency (session_opts, true);

session1 = mongoc_client_start_session (client, session_opts, &error);
if (!session1) {
   fprintf (stderr, "couldn't start session: %s\n", error.message);
   goto cleanup;
}

/* Run an update_one with our causally-consistent session. */
update_opts = bson_new ();
res = mongoc_client_session_append (session1, update_opts, &error);
if (!res) {
   fprintf (stderr, "couldn't add session to opts: %s\n", error.message);
   goto cleanup;
}

query = BCON_NEW ("sku", "111");
update = BCON_NEW ("$set", "{", "end",
                   BCON_DATE_TIME (bson_get_monotonic_time ()), "}");
res = mongoc_collection_update_one (coll,
                                    query,
                                    update,
                                    update_opts,
                                    NULL, /* reply */
                                    &error);
if (!res) {
   fprintf (stderr, "update failed: %s\n", error.message);
   goto cleanup;
}

/* Run an insert with our causally-consistent session */
insert_opts = bson_new ();
res = mongoc_client_session_append (session1, insert_opts, &error);
if (!res) {
   fprintf (stderr, "couldn't add session to opts: %s\n", error.message);
   goto cleanup;
}

insert = BCON_NEW ("sku", "nuts-111", "name", "Pecans",
                   "start", BCON_DATE_TIME (bson_get_monotonic_time ()));
res = mongoc_collection_insert_one (coll, insert, insert_opts, NULL, &error);
if (!res) {
   fprintf (stderr, "insert failed: %s\n", error.message);
   goto cleanup;
}
```

C#

```csharp
using (var session1 = client.StartSession(new ClientSessionOptions { CausalConsistency = true }))
{
    var currentDate = DateTime.UtcNow.Date;
    var items = client.GetDatabase("test", new MongoDatabaseSettings
        {
            ReadConcern = ReadConcern.Majority,
            WriteConcern = new WriteConcern(
                WriteConcern.WMode.Majority,
                TimeSpan.FromMilliseconds(1000))
        })
        .GetCollection<BsonDocument>("items");

    items.UpdateOne(session1,
        Builders<BsonDocument>.Filter.And(
            Builders<BsonDocument>.Filter.Eq("sku", "111"),
            Builders<BsonDocument>.Filter.Eq("end", BsonNull.Value)),
        Builders<BsonDocument>.Update.Set("end", currentDate));

    items.InsertOne(session1, new BsonDocument
    {
        {"sku", "nuts-111"},
        {"name", "Pecans"},
        {"start", currentDate}
    });
}
```

Perl

```perl
my $s1 = $conn->start_session({ causalConsistency => 1 });

$items = $conn->get_database(
    "test", {
        read_concern => { level => 'majority' },
        write_concern => { w => 'majority', wtimeout => 10000 },
    }
)->get_collection("items");

my $current_date = DateTime->now;   # assumes the DateTime module is loaded

$items->update_one(
    {
        sku => 111,
        end => undef
    },
    {
        '$set' => { end => $current_date }
    },
    {
        session => $s1
    }
);

$items->insert_one(
    {
        sku => "nuts-111",
        name => "Pecans",
        start => $current_date
    },
    {
        session => $s1
    }
);
```
If another client needs to read all current sku values, you can advance the cluster time and the operation time to that of the other session to ensure that this client is causally consistent with the other session and read after the two writes:
Python

```python
with client.start_session(causal_consistency=True) as s2:
    s2.advance_cluster_time(s1.cluster_time)
    s2.advance_operation_time(s1.operation_time)
    items = client.get_database(
        'test', read_preference=ReadPreference.SECONDARY,
        read_concern=ReadConcern('majority'),
        write_concern=WriteConcern('majority', wtimeout=1000)).items
    for item in items.find({'end': None}, session=s2):
        print(item)
```

Java (Sync)

```java
// Example 2: Advance the cluster time and the operation time to that of the other session to ensure that
// this client is causally consistent with the other session and read after the two writes.
ClientSession session2 = client.startSession(ClientSessionOptions.builder().causallyConsistent(true).build());
session2.advanceClusterTime(session1.getClusterTime());
session2.advanceOperationTime(session1.getOperationTime());

items = client.getDatabase("test")
        .withReadPreference(ReadPreference.secondary())
        .withReadConcern(ReadConcern.MAJORITY)
        .withWriteConcern(WriteConcern.MAJORITY.withWTimeout(1000, TimeUnit.MILLISECONDS))
        .getCollection("items");

for (Document item: items.find(session2, eq("end", BsonNull.VALUE))) {
    System.out.println(item);
}
```

PHP

```php
$s2 = $client->startSession(
    [ 'causalConsistency' => true ]
);
$s2->advanceClusterTime($s1->getClusterTime());
$s2->advanceOperationTime($s1->getOperationTime());

$items = $client->selectDatabase(
    'test',
    [
        'readPreference' => new \MongoDB\Driver\ReadPreference(\MongoDB\Driver\ReadPreference::RP_SECONDARY),
        'readConcern' => new \MongoDB\Driver\ReadConcern(\MongoDB\Driver\ReadConcern::MAJORITY),
        'writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY, 1000),
    ]
)->items;

$result = $items->find(
    [ 'end' => [ '$exists' => false ] ],
    [ 'session' => $s2 ]
);

foreach ($result as $item) {
    var_dump($item);
}
```

Motor

```python
async with await client.start_session(causal_consistency=True) as s2:
    s2.advance_cluster_time(s1.cluster_time)
    s2.advance_operation_time(s1.operation_time)
    items = client.get_database(
        'test', read_preference=ReadPreference.SECONDARY,
        read_concern=ReadConcern('majority'),
        write_concern=WriteConcern('majority', wtimeout=1000)).items
    async for item in items.find({'end': None}, session=s2):
        print(item)
```

C

```c
/* Make a new session, session2, and make it causally consistent
 * with session1, so that session2 will read session1's writes. */
session2 = mongoc_client_start_session (client, session_opts, &error);
if (!session2) {
   fprintf (stderr, "couldn't start session: %s\n", error.message);
   goto cleanup;
}

/* Set the cluster time for session2 to session1's cluster time */
cluster_time = mongoc_client_session_get_cluster_time (session1);
mongoc_client_session_advance_cluster_time (session2, cluster_time);

/* Set the operation time for session2 to session1's operation time */
mongoc_client_session_get_operation_time (session1, &timestamp, &increment);
mongoc_client_session_advance_operation_time (session2,
                                              timestamp,
                                              increment);

/* Run a find on session2, which should now find all writes done
 * inside of session1 */
find_opts = bson_new ();
res = mongoc_client_session_append (session2, find_opts, &error);
if (!res) {
   fprintf (stderr, "couldn't add session to opts: %s\n", error.message);
   goto cleanup;
}

find_query = BCON_NEW ("end", BCON_NULL);
read_prefs = mongoc_read_prefs_new (MONGOC_READ_SECONDARY);
cursor = mongoc_collection_find_with_opts (coll,
                                           find_query,
                                           find_opts,
                                           read_prefs);

while (mongoc_cursor_next (cursor, &result)) {
   json = bson_as_json (result, NULL);
   fprintf (stdout, "Document: %s\n", json);
   bson_free (json);
}

if (mongoc_cursor_error (cursor, &error)) {
   fprintf (stderr, "cursor failure: %s\n", error.message);
   goto cleanup;
}
```

C#

```csharp
using (var session2 = client.StartSession(new ClientSessionOptions { CausalConsistency = true }))
{
    session2.AdvanceClusterTime(session1.ClusterTime);
    session2.AdvanceOperationTime(session1.OperationTime);

    var items = client.GetDatabase("test", new MongoDatabaseSettings
        {
            ReadPreference = ReadPreference.Secondary,
            ReadConcern = ReadConcern.Majority,
            WriteConcern = new WriteConcern(WriteConcern.WMode.Majority, TimeSpan.FromMilliseconds(1000))
        })
        .GetCollection<BsonDocument>("items");

    var filter = Builders<BsonDocument>.Filter.Eq("end", BsonNull.Value);
    foreach (var item in items.Find(session2, filter).ToEnumerable())
    {
        // process item
    }
}
```

Perl

```perl
my $s2 = $conn->start_session({ causalConsistency => 1 });
$s2->advance_cluster_time( $s1->cluster_time );
$s2->advance_operation_time( $s1->operation_time );

$items = $conn->get_database(
    "test", {
        read_preference => 'secondary',
        read_concern => { level => 'majority' },
        write_concern => { w => 'majority', wtimeout => 10000 },
    }
)->get_collection("items");

$cursor = $items->find( { end => undef }, { session => $s2 } );
for my $item ( $cursor->all ) {
    say join(" ", %$item);
}
```
Limitations
The following operations that build in-memory structures are not causally consistent:
Operation | Notes |
---|---|
collStats | |
$collStats with latencyStats option | |
$currentOp | Returns an error if the operation is associated with a causally consistent client session. |
createIndexes | |
dbHash | Starting in MongoDB 4.2 |
dbStats | |
getMore | Returns an error if the operation is associated with a causally consistent client session. |
$indexStats | |
mapReduce | Starting in MongoDB 4.2 |
ping | Returns an error if the operation is associated with a causally consistent client session. |
serverStatus | Returns an error if the operation is associated with a causally consistent client session. |
validate | Starting in MongoDB 4.2 |