- Frequently Asked Questions
- Is PyMongo thread-safe?
- Is PyMongo fork-safe?
- How does connection pooling work in PyMongo?
- Does PyMongo support Python 3?
- Does PyMongo support asynchronous frameworks like Gevent, asyncio, Tornado, or Twisted?
- Why does PyMongo add an _id field to all of my documents?
- Key order in subdocuments – why does my query work in the shell but not PyMongo?
- What does CursorNotFound cursor id not valid at server mean?
- How do I change the timeout value for cursors?
- How can I store decimal.Decimal instances?
- I’m saving 9.99 but when I query my document contains 9.9900000000000002 - what’s going on here?
- Can you add attribute style access for documents?
- What is the correct way to handle time zones with PyMongo?
- How can I save a datetime.date instance?
- When I query for a document by ObjectId in my web application I get no result
- How can I use PyMongo from Django?
- Does PyMongo work with mod_wsgi?
- Does PyMongo work with PythonAnywhere?
- How can I use something like Python’s json module to encode my documents to JSON?
- Why do I get OverflowError decoding dates stored by another language’s driver?
- Using PyMongo with Multiprocessing
Frequently Asked Questions
Is PyMongo thread-safe?
PyMongo is thread-safe and provides built-in connection pooling for threaded applications.
Is PyMongo fork-safe?
PyMongo is not fork-safe. Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to the inherent incompatibilities between fork(), threads, and locks described below. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.
MongoClient spawns multiple threads to run background tasks such as monitoring connected servers. These threads share state that is protected by instances of Lock, which are themselves not fork-safe. The driver is therefore subject to the same limitations as any other multithreaded code that uses Lock (and mutexes in general). One of these limitations is that the locks become useless after fork(). During the fork, all locks are copied over to the child process in the same state as they were in the parent: if they were locked, the copied locks are also locked. The child created by fork() only has one thread, so any locks that were taken out by other threads in the parent will never be released in the child. The next time the child process attempts to acquire one of these locks, deadlock occurs.
For a long but interesting read about the problems of Python locks in multithreaded contexts with fork(), see http://bugs.python.org/issue6721.
How does connection pooling work in PyMongo?
Every MongoClient instance has a built-in connection pool per server in your MongoDB topology. These pools open sockets on demand to support the number of concurrent MongoDB operations that your multi-threaded application requires. There is no thread-affinity for sockets.
The size of each connection pool is capped at maxPoolSize, which defaults to 100. If there are maxPoolSize connections to a server and all are in use, the next request to that server will wait until one of the connections becomes available.
The client instance opens one additional socket per server in your MongoDB topology for monitoring the server’s state.
For example, a client connected to a 3-node replica set opens 3 monitoring sockets. It also opens as many sockets as needed to support a multi-threaded application’s concurrent operations on each server, up to maxPoolSize. With a maxPoolSize of 100, if the application only uses the primary (the default), then only the primary connection pool grows and the total number of connections is at most 103. If the application uses a ReadPreference to query the secondaries, their pools also grow and the total number of connections can reach 303.
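The arithmetic behind the 103 and 303 figures can be sketched as a small function. This is only an illustration of the counting rule described above; max_connections and its parameter names are hypothetical and are not part of PyMongo.

```python
def max_connections(num_servers, max_pool_size, servers_with_traffic):
    """Upper bound on the sockets one client opens, per the rule above.

    Hypothetical helper for illustration only; not a PyMongo function.
    """
    # One monitoring socket per server in the topology.
    monitoring = num_servers
    # Each server that receives application operations can grow its pool
    # up to max_pool_size.
    pooled = servers_with_traffic * max_pool_size
    return monitoring + pooled


# 3-node replica set, maxPoolSize=100, only the primary is used.
print(max_connections(3, 100, 1))  # 103
# Same replica set, but secondaries are queried as well.
print(max_connections(3, 100, 3))  # 303
```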
It is possible to set the minimum number of concurrent connections to each server with minPoolSize, which defaults to 0. The connection pool will be initialized with this number of sockets. If sockets are closed due to any network errors, causing the total number of sockets (both in use and idle) to drop below the minimum, more sockets are opened until the minimum is reached.
The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced can be set with maxIdleTimeMS, which defaults to None (no limit).
The default configuration for a MongoClient works for most applications:
- client = MongoClient(host, port)
Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient.
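One way to follow the "once per process" advice is a small cache keyed by process ID, so that a forked child never reuses the parent's client. The sketch below is an assumption about how an application might organize this, not something PyMongo provides: get_client and _clients are hypothetical names, and the factory argument stands in for something like lambda: MongoClient(host, port). The helper is not synchronized, so it is assumed to be called once during startup in each process, before worker threads begin.

```python
import os

# Hypothetical per-process cache: maps a process ID to the client that
# was created inside that process.
_clients = {}


def get_client(factory):
    """Return this process's client, creating it on first use.

    `factory` is any zero-argument callable that builds a client.
    A forked child inherits the parent's dict entry, but that entry is
    stored under the parent's PID, so the child builds its own client.
    """
    pid = os.getpid()
    if pid not in _clients:
        _clients[pid] = factory()
    return _clients[pid]
```

Request handlers would then call get_client(...) instead of constructing a new client for each request.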
To support extremely high numbers of concurrent MongoDB operations within one process, increase maxPoolSize:
- client = MongoClient(host, port, maxPoolSize=200)
… or make it unbounded:
- client = MongoClient(host, port, maxPoolSize=None)
Once the pool reaches its maximum size, additional threads have to wait for sockets to become available. PyMongo does not limit the number of threads that can wait for sockets to become available and it is the application’s responsibility to limit the size of its thread pool to bound queuing during a load spike. Threads are allowed to wait for any length of time unless waitQueueTimeoutMS is defined:
- client = MongoClient(host, port, waitQueueTimeoutMS=100)
A thread that waits more than 100ms (in this example) for a socket raises ConnectionFailure. Use this option if it is more important to bound the duration of operations during a load spike than it is to complete every operation.
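The waiting behavior can be modeled with a bounded semaphore from the standard library. This is a simplified sketch of the idea only and is not how PyMongo's pool is implemented: BoundedPool, PoolTimeout, checkout, and checkin are hypothetical names, and PoolTimeout merely stands in for the ConnectionFailure mentioned above.

```python
import threading


class PoolTimeout(Exception):
    """Hypothetical stand-in for the error raised when the wait times out."""


class BoundedPool:
    """Toy model of a capped pool with an optional wait-queue timeout."""

    def __init__(self, max_size, wait_timeout_ms=None):
        self._slots = threading.BoundedSemaphore(max_size)
        # None means "wait for any length of time".
        self._timeout = (
            None if wait_timeout_ms is None else wait_timeout_ms / 1000.0
        )

    def checkout(self):
        # Block until a slot is free, or give up after the timeout.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolTimeout("no socket became available in time")

    def checkin(self):
        self._slots.release()


pool = BoundedPool(max_size=1, wait_timeout_ms=100)
pool.checkout()          # takes the only slot
try:
    pool.checkout()      # waits about 100ms, then fails
except PoolTimeout as exc:
    print("timed out:", exc)
pool.checkin()
```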
When close() is called by any thread, all idle sockets are closed, and all sockets that are in use will be closed as they are returned to the pool.
Does PyMongo support Python 3?
PyMongo supports CPython 3.4+ and PyPy3.5+. See the Python 3 FAQ for details.
Does PyMongo support asynchronous frameworks like Gevent, asyncio, Tornado, or Twisted?
PyMongo fully supports Gevent.
To use MongoDB with asyncio or Tornado, see the Motor project.
For Twisted, see TxMongo. Its stated mission is to keep feature parity with PyMongo.
Why does PyMongo add an _id field to all of my documents?
When a document is inserted into MongoDB using insert_one(), insert_many(), or bulk_write(), and that document does not include an _id field, PyMongo automatically adds one for you, set to an instance of ObjectId. For example:
- >>> my_doc = {'x': 1}
- >>> collection.insert_one(my_doc)
- <pymongo.results.InsertOneResult object at 0x7f3fc25bd640>
- >>> my_doc
- {'x': 1, '_id': ObjectId('560db337fba522189f171720')}
Users often discover this behavior when calling insert_many() with a list of references to a single document, which raises BulkWriteError. Several Python idioms lead to this pitfall:
- >>> doc = {}
- >>> collection.insert_many(doc for _ in range(10))
- Traceback (most recent call last):
- ...
- pymongo.errors.BulkWriteError: batch op errors occurred
- >>> doc
- {'_id': ObjectId('560f171cfba52279f0b0da0c')}
- >>> docs = [{}]
- >>> collection.insert_many(docs * 10)
- Traceback (most recent call last):
- ...
- pymongo.errors.BulkWriteError: batch op errors occurred
- >>> docs
- [{'_id': ObjectId('560f1933fba52279f0b0da0e')}]
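Both idioms above produce many references to one dict, so the _id added for the first insert is seen by every later one. A minimal sketch of the difference, using plain dicts and no database (the variable names are illustrative only):

```python
# Three references to the SAME dict object.
shared = [{}] * 3
# Three separate dict objects.
distinct = [{} for _ in range(3)]

# Simulate what happens when an _id is added to the first document.
shared[0]['_id'] = 1
distinct[0]['_id'] = 1

print(shared)    # [{'_id': 1}, {'_id': 1}, {'_id': 1}]
print(distinct)  # [{'_id': 1}, {}, {}]
```

Building the list with a comprehension that creates a new dict per iteration gives each document its own _id.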
PyMongo adds an _id field in this manner for a few reasons:
- All MongoDB documents are required to have an _id field.
- If PyMongo were to insert a document without an _id, MongoDB would add one itself, but it would not report the value back to PyMongo.
- Copying the document to insert before adding the _id field would be prohibitively expensive for most high write volume applications.
If you don’t want PyMongo to add an _id to your documents, insert only documents that already have an _id field, added by your application.
Key order in subdocuments – why does my query work in the shell but not PyMongo?
The key-value pairs in a BSON document can have any order (except that _id is always first). The mongo shell preserves key order when reading and writing data. Observe that “b” comes before “a” when we create the document and when it is displayed:
- > // mongo shell.
- > db.collection.insert( { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } } )
- WriteResult({ "nInserted" : 1 })
- > db.collection.find()
- { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }
PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the “a” key first is the same, to Python, as one with “b” first:
- >>> print({'a': 1.0, 'b': 1.0})
- {'a': 1.0, 'b': 1.0}
- >>> print({'b': 1.0, 'a': 1.0})
- {'a': 1.0, 'b': 1.0}
Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, “a” is shown before “b”:
- >>> print(collection.find_one())
- {u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}
To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection, configured to use SON instead of dict:
- >>> from bson import CodecOptions, SON
- >>> opts = CodecOptions(document_class=SON)
- >>> opts
- CodecOptions(document_class=<class 'bson.son.SON'>,
- tz_aware=False,
- uuid_representation=PYTHON_LEGACY,
- unicode_decode_error_handler='strict',
- tzinfo=None, type_registry=TypeRegistry(type_codecs=[],
- fallback_encoder=None))
- >>> collection_son = collection.with_options(codec_options=opts)
Now, documents and subdocuments in query results are represented with SON objects:
- >>> print(collection_son.find_one())
- SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])
The subdocument’s actual storage layout is now visible: “b” is before “a”.
Because a dict’s key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:
- >>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
- True
Swapping the key order in your query makes no difference:
- >>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
- True
… because, as we saw above, Python considers the two dicts the same.
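The difference between order-insensitive and order-sensitive comparison can be seen with the standard library alone. In this sketch collections.OrderedDict is used only as a stand-in for an order-preserving mapping such as SON; it is an analogy for how the server compares subdocuments, not PyMongo code.

```python
from collections import OrderedDict

a_first = {'a': 1.0, 'b': 1.0}
b_first = {'b': 1.0, 'a': 1.0}
# Plain dicts compare equal regardless of the order the keys were written.
print(a_first == b_first)  # True

ordered_a = OrderedDict([('a', 1.0), ('b', 1.0)])
ordered_b = OrderedDict([('b', 1.0), ('a', 1.0)])
# Two order-preserving mappings compare equal only if the order matches,
# which mirrors how MongoDB compares whole subdocuments.
print(ordered_a == ordered_b)  # False
```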
There are two solutions. First, you can match the subdocument field-by-field:
- >>> collection.find_one({'subdocument.a': 1.0,
- ... 'subdocument.b': 1.0})
- {u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}
The query matches any subdocument with an “a” of 1.0 and a “b” of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. Additionally, this query now matches subdocuments with additional keys besides “a” and “b”, whereas the previous query required an exact match.
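If an application builds such field-by-field queries often, the dotted keys can be generated from an ordinary dict. The helper below is a hypothetical convenience, not part of PyMongo, and it flattens only one level of nesting.

```python
def dotted_query(field, subdocument):
    """Turn {'a': 1, 'b': 2} under `field` into dotted-key query terms.

    Hypothetical helper for illustration; handles one level of nesting.
    """
    return {
        '%s.%s' % (field, key): value
        for key, value in subdocument.items()
    }


query = dotted_query('subdocument', {'a': 1.0, 'b': 1.0})
print(query)  # {'subdocument.a': 1.0, 'subdocument.b': 1.0}
```

The resulting dict can be passed to find_one() in place of the hand-written query shown above.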
The second solution is to use a SON to specify the key order:
- >>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
- >>> collection.find_one(query)
- {u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}
The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.
See also
MongoDB Manual entry on subdocument matching.
What does CursorNotFound cursor id not valid at server mean?
Cursors in MongoDB can time out on the server if they’ve been open for a long time without any operations being performed on them. This can lead to a CursorNotFound exception being raised when attempting to iterate the cursor.
How do I change the timeout value for cursors?
MongoDB doesn’t support custom timeouts for cursors, but cursor timeouts can be turned off entirely. Pass no_cursor_timeout=True to find().
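A cursor opened with no_cursor_timeout=True is not cleaned up by the server's idle timeout, so it is worth closing it explicitly. The sketch below shows one way to do that; for_each_document and handler are hypothetical names, and collection is assumed to be a PyMongo Collection (or any object with a compatible find() method).

```python
def for_each_document(collection, handler, query=None):
    """Iterate a query without a server-side cursor timeout.

    Hypothetical helper: `collection` is assumed to expose PyMongo's
    find() method, and `handler` is called once per document.
    """
    cursor = collection.find(query or {}, no_cursor_timeout=True)
    try:
        for doc in cursor:
            handler(doc)
    finally:
        # Always close the cursor, since the server will not time it out.
        cursor.close()
```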
How can I store decimal.Decimal instances?
PyMongo >= 3.4 supports the Decimal128 BSON type introduced in MongoDB 3.4. See decimal128 for more information.
MongoDB <= 3.2 only supports IEEE 754 floating point numbers - the same as the Python float type. The only way PyMongo could store Decimal instances to these versions of MongoDB would be to convert them to this standard, so you’d really only be storing floats anyway - we force users to do this conversion explicitly so that they are aware that it is happening.
I’m saving 9.99 but when I query my document contains 9.9900000000000002 - what’s going on here?
The database representation is 9.99 as an IEEE floating point (which is common to MongoDB and Python as well as most other modern languages). The problem is that 9.99 cannot be represented exactly with a double precision floating point - some versions of Python show this when they display the value:
- >>> 9.99
- 9.9900000000000002
The result that you get when you save 9.99 with PyMongo is exactly the same as the result you’d get saving it with the JavaScript shell or any of the other languages (and as the data you’re working with when you type 9.99 into a Python program).
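The same effect can be observed without a database. This sketch uses only the standard library to show that the double nearest to 9.99 is not exactly 9.99, and that the long form appears when 17 significant digits are requested:

```python
from decimal import Decimal

# Decimal(float) converts the stored binary double exactly, digit for digit.
exact = Decimal(9.99)
print(exact == Decimal("9.99"))  # False: the double is only an approximation

# Asking for 17 significant digits reveals the approximation.
print("%.17g" % 9.99)  # 9.9900000000000002

# Recent Python versions print the shortest string that round-trips.
print(repr(9.99))  # 9.99
```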
Can you add attribute style access for documents?
This request has come up a number of times but we’ve decided not to implement anything like this. The relevant Jira case has some information about the decision, but here is a brief summary:
- This will pollute the attribute namespace for documents, so could lead to subtle bugs / confusing errors when using a key with the same name as a dictionary method.
- The only reason we even use SON objects instead of regular dictionaries is to maintain key ordering, since the server requires this for certain operations. So we’re hesitant to needlessly complicate SON (at some point it’s hypothetically possible we might want to revert back to using dictionaries alone, without breaking backwards compatibility for everyone).
- It’s easy (and Pythonic) for new users to deal with documents, since they behave just like dictionaries. If we start changing their behavior it adds a barrier to entry for new users - another class to learn.
What is the correct way to handle time zones with PyMongo?
See Datetimes and Timezones for examples on how to handle datetime objects correctly.
How can I save a datetime.date instance?
PyMongo doesn’t support saving datetime.date instances, since there is no BSON type for dates without times. Rather than having the driver enforce a convention for converting datetime.date instances to datetime.datetime instances for you, any conversion should be performed in your client code.
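One possible client-side convention is to store a date as the datetime at midnight on that day. The choice of midnight is an assumption made for this sketch, not something PyMongo prescribes, and date_to_datetime is a hypothetical helper name.

```python
from datetime import date, datetime, time


def date_to_datetime(d):
    """Convert a date to a naive datetime at 00:00:00 on the same day.

    Hypothetical helper; midnight is one convention among several.
    """
    return datetime.combine(d, time.min)


stored = date_to_datetime(date(2012, 11, 26))
print(stored)         # 2012-11-26 00:00:00
print(stored.date())  # 2012-11-26, recovering the original date
```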
When I query for a document by ObjectId in my web application I get no result
It’s common in web applications to encode documents’ ObjectIds in URLs, like:
- "/posts/50b3bda58a02fb9a84d8991e"
Your web framework will pass the ObjectId portion of the URL to your request handler as a string, so it must be converted to ObjectId before it is passed to find_one(). It is a common mistake to forget to do this conversion. Here’s how to do it correctly in Flask (other web frameworks are similar):
- from pymongo import MongoClient
- from bson.objectid import ObjectId
- from flask import Flask, render_template
- client = MongoClient()
- app = Flask(__name__)
- @app.route("/posts/<_id>")
- def show_post(_id):
-     # NOTE!: converting _id from string to ObjectId before passing to find_one
-     post = client.db.posts.find_one({'_id': ObjectId(_id)})
-     return render_template('post.html', post=post)
- if __name__ == "__main__":
-     app.run()
How can I use PyMongo from Django?
Django is a popular Python web framework. Django includes an ORM, django.db. Currently, there’s no official MongoDB backend for Django.
django-mongodb-engine is an unofficial MongoDB backend that supports Django aggregations, (atomic) updates, embedded objects, Map/Reduce and GridFS. It allows you to use most of Django’s built-in features, including the ORM, admin, authentication, site and session frameworks and caching.
However, it’s easy to use MongoDB (and PyMongo) from Django without using a Django backend. Certain features of Django that require django.db (admin, authentication and sessions) will not work using just MongoDB, but most of what Django provides can still be used.
One project which should make working with MongoDB and Django easier is mango. Mango is a set of MongoDB backends for Django sessions and authentication (bypassing django.db entirely).
Does PyMongo work with mod_wsgi?
Yes. See the configuration guide for PyMongo and mod_wsgi.
Does PyMongo work with PythonAnywhere?
No. PyMongo creates Python threads which PythonAnywhere does not support. For more information see PYTHON-1495.
How can I use something like Python’s json module to encode my documents to JSON?
json_util is PyMongo’s built-in, flexible tool for using Python’s json module with BSON documents and MongoDB Extended JSON. The json module won’t work out of the box with all documents from PyMongo as PyMongo supports some special types (like ObjectId and DBRef) that are not supported in JSON.
python-bsonjs is a fast BSON to MongoDB Extended JSON converter built on top of libbson. python-bsonjs does not depend on PyMongo and can offer a nice performance improvement over json_util. python-bsonjs works best with PyMongo when using RawBSONDocument.
Why do I get OverflowError decoding dates stored by another language’s driver?
PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.
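The error can be reproduced with the standard library alone. A BSON datetime is stored as a count of milliseconds since the Unix epoch, so the sketch below converts such a count by hand; millis_to_datetime is a hypothetical helper that only approximates what a decoder has to do and is not PyMongo's actual decoding code.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def millis_to_datetime(millis):
    """Convert milliseconds since the epoch to a naive datetime.

    Hypothetical helper; raises OverflowError outside datetime's range.
    """
    return EPOCH + timedelta(milliseconds=millis)


print(millis_to_datetime(0))  # 1970-01-01 00:00:00

# The first millisecond of the year 10000, one past datetime.MAXYEAR.
year_10000_millis = 253402300800000
try:
    millis_to_datetime(year_10000_millis)
except OverflowError as exc:
    print("OverflowError:", exc)
```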
There are a few ways to work around this issue. One option is to filter out documents with values outside of the range supported by datetime.datetime:
- >>> from datetime import datetime
- >>> coll = client.test.dates
- >>> cur = coll.find({'dt': {'$gte': datetime.min, '$lte': datetime.max}})
Another option, assuming you don’t need the datetime field, is to filter out just that field:
- >>> cur = coll.find({}, projection={'dt': False})
Using PyMongo with Multiprocessing
On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. For example:
- import multiprocessing
- import pymongo
- # Each process creates its own instance of MongoClient.
- def func():
-     db = pymongo.MongoClient().mydb
-     # Do something with db.
- proc = multiprocessing.Process(target=func)
- proc.start()
Never do this:
- client = pymongo.MongoClient()
- # Each child process attempts to copy a global MongoClient
- # created in the parent process. Never do this.
- def func():
-     db = client.mydb
-     # Do something with db.
- proc = multiprocessing.Process(target=func)
- proc.start()
Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.