gridfs – Tools for working with GridFS
GridFS is a specification for storing large objects in MongoDB. The gridfs package is an implementation of GridFS on top of pymongo, exposing a file-like interface.
class gridfs.GridFS(database, collection='fs', disable_md5=False)
Create a new instance of GridFS.
Raises TypeError if database is not an instance of Database.
Parameters:
- database: database to use
- collection (optional): root collection to use
- disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.
Changed in version 3.1: Indexes are only ensured on the first write to the DB.
Changed in version 3.0: database must use an acknowledged write_concern.
delete(file_id, session=None)
Delete a file from GridFS by "_id".
Deletes all data belonging to the file with "_id": file_id.
Warning
Any processes/threads reading from the file while this method is executing will likely see an invalid/corrupt file. Care should be taken to avoid concurrent reads to a file while it is being deleted.
Note
Deletes of non-existent files are considered successful since the end result is the same: no file with that _id remains.
Parameters:
- _file_id_: <code>"_id"</code> of the file to delete
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
Changed in version 3.1: delete no longer ensures indexes.
exists(document_or_id=None, session=None, **kwargs)
Check if a file exists in this instance of GridFS.
The file to check for can be specified by the value of its _id key, or by passing in a query document. A query document can be passed in as a dictionary, or by using keyword arguments. Thus, the following three calls are equivalent:
    >>> fs.exists(file_id)
    >>> fs.exists({"_id": file_id})
    >>> fs.exists(_id=file_id)
As are the following two calls:
    >>> fs.exists({"filename": "mike.txt"})
    >>> fs.exists(filename="mike.txt")
And the following two:
    >>> fs.exists({"foo": {"$gt": 12}})
    >>> fs.exists(foo={"$gt": 12})
Returns True if a matching file exists, False otherwise. Calls to exists() will not automatically create appropriate indexes; application developers should be sure to create indexes if needed and as appropriate.
Parameters:
- _document_or_id_ (optional): query document, or _id of the document to check for
- _session_ (optional): a <code>ClientSession</code>
- _**kwargs_ (optional): keyword arguments are used as a query document, if they’re present.
Changed in version 3.6: Added session parameter.
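One way to picture the equivalent call styles of exists() is that every form collapses into a single query document. The sketch below is an illustration under that assumption, not pymongo's actual implementation; `to_query` is a hypothetical helper name.

```python
from collections.abc import Mapping


def to_query(document_or_id=None, **kwargs):
    """Fold the accepted argument styles into a single query document.

    Hypothetical helper for illustration only; it is not part of gridfs.
    """
    if kwargs:
        # Keyword arguments are used as the query document when present.
        return dict(kwargs)
    if isinstance(document_or_id, Mapping):
        # A query document passed positionally is used as-is.
        return dict(document_or_id)
    # Anything else is treated as the value of the file's "_id".
    return {"_id": document_or_id}


file_id = 42  # stand-in for a real file id
assert to_query(file_id) == to_query({"_id": file_id}) == to_query(_id=file_id)
```

The same folding explains why the dictionary and keyword forms of the filename and foo queries are interchangeable.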
find(*args, **kwargs)
Query GridFS for files.
Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control. For example:
    for grid_out in fs.find({"filename": "lisa.txt"},
                            no_cursor_timeout=True):
        data = grid_out.read()
would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.
As another example, the call:
    most_recent_three = fs.find().sort("uploadDate", -1).limit(3)
would return a cursor to the three most recently uploaded files in GridFS.
Follows a similar interface to find() in Collection.
If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.
Parameters:
- _filter_ (optional): a SON object specifying elements which must be present for a document to be included in the result set
- _skip_ (optional): the number of files to omit (from the start of the result set) when returning the results
- _limit_ (optional): the maximum number of results to return
- _no_cursor_timeout_ (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
- _sort_ (optional): a list of (key, direction) pairs specifying the sort order for this query. See <code>sort()</code> for details.
Raises TypeError if any of the arguments are of improper type. Returns an instance of GridOutCursor corresponding to this query.
Changed in version 3.0: Removed the read_preference, tag_sets, and secondary_acceptable_latency_ms options.
New in version 2.7.
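Because a cursor opened with no_cursor_timeout=True has to be closed by the caller, it can help to wrap the iteration in try/finally. The following is a minimal sketch; `read_all_versions` is a hypothetical helper, and `fs` is assumed to be a GridFS instance created elsewhere.

```python
def read_all_versions(fs, filename):
    """Return the contents of every stored version of ``filename``.

    ``fs`` is assumed to be a gridfs.GridFS instance; this helper is
    hypothetical and not part of the library.
    """
    cursor = fs.find({"filename": filename}, no_cursor_timeout=True)
    try:
        return [grid_out.read() for grid_out in cursor]
    finally:
        # A cursor opened with no_cursor_timeout=True does not time out
        # on the server, so close it explicitly even if a read fails.
        cursor.close()
```

Reading every version into memory is only reasonable for small files; for large ones the loop body would process each GridOut in turn instead of collecting the results.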
find_one(filter=None, session=None, *args, **kwargs)
Get a single file from gridfs.
All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single GridOut, or None if no matching file is found. For example:
    file = fs.find_one({"filename": "lisa.txt"})
Parameters:
- _filter_ (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for <code>"_id"</code> in the file collection.
- _*args_ (optional): any additional positional arguments are the same as the arguments to [<code>find()</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFS.find).
- _session_ (optional): a <code>ClientSession</code>
- _**kwargs_ (optional): any additional keyword arguments are the same as the arguments to [<code>find()</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFS.find).
Changed in version 3.6: Added session parameter.
get(file_id, session=None)
Get a file from GridFS by "_id".
Returns an instance of GridOut, which provides a file-like interface for reading.
Parameters:
- _file_id_: <code>"_id"</code> of the file to get
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
get_last_version(filename=None, session=None, **kwargs)
Get the most recent version of a file in GridFS by "filename" or metadata fields.
Equivalent to calling get_version() with the default version (-1).
Parameters:
- _filename_: <code>"filename"</code> of the file to get, or _None_
- _session_ (optional): a <code>ClientSession</code>
- _**kwargs_ (optional): find files by custom metadata.
Changed in version 3.6: Added session parameter.
get_version(filename=None, version=-1, session=None, **kwargs)
Get a file from GridFS by "filename" or metadata fields.
Returns a version of the file in GridFS whose filename matches filename and whose metadata fields match the supplied keyword arguments, as an instance of GridOut.
Version numbering is a convenience atop the GridFS API provided by MongoDB. If more than one file matches the query (either by filename alone, by metadata fields, or by a combination of both), then version -1 will be the most recently uploaded matching file, -2 the second most recently uploaded, etc. Version 0 will be the first version uploaded, 1 the second version, etc. So if three versions have been uploaded, then version 0 is the same as version -3, version 1 is the same as version -2, and version 2 is the same as version -1.
Raises NoFile if no such version of that file exists.
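The version numbering above behaves like Python sequence indexing over the matching files in upload order. As a small sketch of that rule (the helper name `version_to_index` is hypothetical, and LookupError merely stands in for NoFile):

```python
def version_to_index(version, total):
    """Map a version number to a 0-based position in upload order.

    Hypothetical helper; ``total`` is the number of matching files.
    """
    if not -total <= version < total:
        # Stand-in for the NoFile error that get_version() raises.
        raise LookupError("no version %d among %d file(s)" % (version, total))
    return version % total


# With three uploads, 0/-3, 1/-2 and 2/-1 name the same files.
assert [version_to_index(v, 3) for v in (0, 1, 2)] == [0, 1, 2]
assert [version_to_index(v, 3) for v in (-3, -2, -1)] == [0, 1, 2]
```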
Parameters:
- _filename_: <code>"filename"</code> of the file to get, or _None_
- _version_ (optional): version of the file to get (defaults to -1, the most recent version uploaded)
- _session_ (optional): a <code>ClientSession</code>
- _**kwargs_ (optional): find files by custom metadata.
Changed in version 3.6: Added session parameter.
Changed in version 3.1: get_version no longer ensures indexes.
list(session=None)
List the names of all files stored in this instance of GridFS.
Parameters:
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
Changed in version 3.1: list no longer ensures indexes.
new_file(**kwargs)
Create a new file in GridFS.
Returns a new GridIn instance to which data can be written. Any keyword arguments will be passed through to GridIn().
If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.
Parameters:
- _**kwargs_ (optional): keyword arguments for file creation
put(data, **kwargs)
Put data in GridFS as a new file.
Equivalent to doing:
    f = new_file(**kwargs)
    try:
        f.write(data)
    finally:
        f.close()
data can be either an instance of str (bytes in python 3) or a file-like object providing a read() method. If an encoding keyword argument is passed, data can also be a unicode (str in python 3) instance, which will be encoded as encoding before being written. Any keyword arguments will be passed through to the created file - see GridIn() for possible arguments. Returns the "_id" of the created file.
If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.
Parameters:
- _data_: data to be written as a file.
- _**kwargs_ (optional): keyword arguments for file creation
Changed in version 3.0: w=0 writes to GridFS are now prohibited.
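As a brief illustration of the encoding keyword described for put(), the sketch below stores a text string rather than bytes. `store_text` is a hypothetical helper name, and `fs` is assumed to be a GridFS instance created elsewhere.

```python
def store_text(fs, text, filename, encoding="utf-8"):
    """Store a text string as a GridFS file and return its "_id".

    ``fs`` is assumed to be a gridfs.GridFS instance; the helper name is
    hypothetical.
    """
    # Passing ``encoding`` lets put() accept text data, which is encoded
    # before being written. ``filename`` is forwarded to the created
    # file as a keyword argument.
    return fs.put(text, encoding=encoding, filename=filename)
```

Without the encoding keyword, the same call would need the data as bytes or as a file-like object providing read().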
class gridfs.GridFSBucket(db, bucket_name='fs', chunk_size_bytes=261120, write_concern=None, read_preference=None, disable_md5=False)
Create a new instance of GridFSBucket.
Raises TypeError if db is not an instance of Database.
Raises ConfigurationError if write_concern is not acknowledged.
Parameters:
- db: database to use.
- bucket_name (optional): The name of the bucket. Defaults to ‘fs’.
- chunk_size_bytes (optional): The chunk size in bytes. Defaults to 255KB (261120 bytes).
- write_concern (optional): The WriteConcern to use. If None (the default) db.write_concern is used.
- read_preference (optional): The read preference to use. If None (the default) db.read_preference is used.
- disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.
New in version 3.1.
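GridFS stores each file as a sequence of chunks of at most chunk_size_bytes each, so the chunk size determines how many chunk documents a file occupies. The arithmetic can be sketched as follows; `chunk_count` is a hypothetical helper, not a gridfs function.

```python
DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024  # 261120, the constructor default


def chunk_count(length, chunk_size_bytes=DEFAULT_CHUNK_SIZE_BYTES):
    """Number of chunks needed to hold ``length`` bytes (hypothetical helper)."""
    if length < 0 or chunk_size_bytes <= 0:
        raise ValueError("length must be >= 0 and chunk size must be > 0")
    # Ceiling division: a final partial chunk still counts as one chunk.
    return -(-length // chunk_size_bytes)


assert DEFAULT_CHUNK_SIZE_BYTES == 261120
assert chunk_count(261120) == 1
assert chunk_count(261121) == 2
```

By the same arithmetic, a 21-byte payload uploaded with chunk_size_bytes=4 would be split into six chunks.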
delete(file_id, session=None)
Given a file_id, delete this stored file’s files collection document and associated chunks from a GridFS bucket.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to delete
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    fs.delete(file_id)
Raises NoFile if no file with file_id exists.
Parameters:
- _file_id_: The _id of the file to be deleted.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
download_to_stream(file_id, destination, session=None)
Downloads the contents of the stored file specified by file_id and writes the contents to destination.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to read
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    # Get file to write to
    file = open('myfile', 'wb+')
    fs.download_to_stream(file_id, file)
    file.seek(0)
    contents = file.read()
Raises NoFile if no file with file_id exists.
Parameters:
- _file_id_: The _id of the file to be downloaded.
- _destination_: a file-like object implementing <code>write()</code>.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
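Since destination only needs to implement write(), an in-memory buffer can stand in for a file on disk. A minimal sketch, assuming `bucket` is a GridFSBucket created elsewhere; `download_bytes` is a hypothetical helper name.

```python
import io


def download_bytes(bucket, file_id):
    """Return the stored file's contents as a bytes object.

    ``bucket`` is assumed to be a gridfs.GridFSBucket; hypothetical helper.
    """
    buffer = io.BytesIO()  # any object implementing write() works here
    bucket.download_to_stream(file_id, buffer)
    return buffer.getvalue()
```

This keeps the whole file in memory, so it suits small files; larger ones are better written to a real file object as in the example above.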
download_to_stream_by_name(filename, destination, revision=-1, session=None)
Write the contents of filename (with optional revision) to destination.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get file to write to
    file = open('myfile', 'wb')
    fs.download_to_stream_by_name("test_file", file)
Raises NoFile if no such version of that file exists.
Raises ValueError if filename is not a string.
Parameters:
- _filename_: The name of the file to read from.
- _destination_: A file-like object that implements <code>write()</code>.
- _revision_ (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
- _session_ (optional): a <code>ClientSession</code>
Note:
Revision numbers are defined as follows:
- 0 = the original stored file
- 1 = the first revision
- 2 = the second revision
- etc…
- -2 = the second most recent revision
- -1 = the most recent revision
Changed in version 3.6: Added session parameter.
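The revision numbering can be read as a sort on uploadDate plus a number of documents to skip. The sketch below shows one way to express that rule on an in-memory list; both function names are hypothetical, and this is not presented as the driver's internal code.

```python
ASCENDING, DESCENDING = 1, -1  # same values pymongo uses for sort order


def revision_to_skip_and_order(revision):
    """Translate a revision number into (skip, sort direction) on uploadDate."""
    if revision < 0:
        # -1 is the newest file: sort newest first and skip |revision| - 1.
        return abs(revision) - 1, DESCENDING
    # 0 is the original file: sort oldest first and skip ``revision`` files.
    return revision, ASCENDING


def pick_revision(upload_dates, revision):
    """Apply the rule to an in-memory list of upload dates."""
    skip, order = revision_to_skip_and_order(revision)
    ordered = sorted(upload_dates, reverse=(order == DESCENDING))
    return ordered[skip]


dates = [2019, 2017, 2018]  # stand-ins for uploadDate values
assert pick_revision(dates, 0) == 2017   # the original stored file
assert pick_revision(dates, -1) == 2019  # the most recent revision
assert pick_revision(dates, 1) == pick_revision(dates, -2) == 2018
```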
find(*args, **kwargs)
Find and return the files collection documents that match filter.
Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control.
For example:
    for grid_data in fs.find({"filename": "lisa.txt"},
                             no_cursor_timeout=True):
        data = grid_data.read()
would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.
As another example, the call:
    most_recent_three = fs.find().sort("uploadDate", -1).limit(3)
would return a cursor to the three most recently uploaded files in GridFS.
Follows a similar interface to find() in Collection.
If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.
Parameters:
- _filter_: Search query.
- _batch_size_ (optional): The number of documents to return per batch.
- _limit_ (optional): The maximum number of documents to return.
- _no_cursor_timeout_ (optional): The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to True to prevent that.
- _skip_ (optional): The number of documents to skip before returning.
- _sort_ (optional): The order by which to sort results. Defaults to None.
open_download_stream(file_id, session=None)
Opens a Stream from which the application can read the contents of the stored file specified by file_id.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # get _id of file to read.
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    grid_out = fs.open_download_stream(file_id)
    contents = grid_out.read()
Returns an instance of GridOut.
Raises NoFile if no file with file_id exists.
Parameters:
- _file_id_: The _id of the file to be downloaded.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
open_download_stream_by_name(filename, revision=-1, session=None)
Opens a Stream from which the application can read the contents of filename and optional revision.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    grid_out = fs.open_download_stream_by_name("test_file")
    contents = grid_out.read()
Returns an instance of GridOut.
Raises NoFile if no such version of that file exists.
Raises ValueError if filename is not a string.
Parameters:
- _filename_: The name of the file to read from.
- _revision_ (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
- _session_ (optional): a <code>ClientSession</code>
Note:
Revision numbers are defined as follows:
- 0 = the original stored file
- 1 = the first revision
- 2 = the second revision
- etc…
- -2 = the second most recent revision
- -1 = the most recent revision
Changed in version 3.6: Added session parameter.
open_upload_stream(filename, chunk_size_bytes=None, metadata=None, session=None)
Opens a Stream that the application can write the contents of the file to.
The user must specify the filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    grid_in = fs.open_upload_stream(
        "test_file", chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})
    grid_in.write(b"data I want to store!")
    grid_in.close()  # uploaded on close
Returns an instance of GridIn.
Raises NoFile if no such version of that file exists.
Raises ValueError if filename is not a string.
Parameters:
- _filename_: The name of the file to upload.
- _chunk_size_bytes_ (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
- _metadata_ (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
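Because the upload is only finalized when the stream is closed, code that writes in several steps may want to guarantee the close. A minimal sketch, assuming `bucket` is a GridFSBucket created elsewhere; `upload_pieces` is a hypothetical helper name.

```python
def upload_pieces(bucket, filename, pieces, metadata=None):
    """Write an iterable of bytes objects as one file and return its _id.

    ``bucket`` is assumed to be a gridfs.GridFSBucket; hypothetical helper.
    """
    grid_in = bucket.open_upload_stream(filename, metadata=metadata)
    try:
        for piece in pieces:
            grid_in.write(piece)
    finally:
        # The file only becomes complete once the stream is closed.
        grid_in.close()
    return grid_in._id
```

Note that closing in a finally block also finalizes whatever was written before an error, so callers that need all-or-nothing behavior would have to clean up the partial file themselves.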
open_upload_stream_with_id(file_id, filename, chunk_size_bytes=None, metadata=None, session=None)
Opens a Stream that the application can write the contents of the file to.
The user must specify the file id and filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size.
For example:
    from bson import ObjectId
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    grid_in = fs.open_upload_stream_with_id(
        ObjectId(),
        "test_file",
        chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})
    grid_in.write(b"data I want to store!")
    grid_in.close()  # uploaded on close
Returns an instance of GridIn.
Raises NoFile if no such version of that file exists.
Raises ValueError if filename is not a string.
Parameters:
- _file_id_: The id to use for this file. The id must not have already been used for another file.
- _filename_: The name of the file to upload.
- _chunk_size_bytes_ (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
- _metadata_ (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
rename(file_id, new_filename, session=None)
Renames the stored file with the specified file_id.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to rename
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    fs.rename(file_id, "new_test_name")
Raises NoFile if no file with file_id exists.
Parameters:
- _file_id_: The _id of the file to be renamed.
- _new_filename_: The new name of the file.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
upload_from_stream(filename, source, chunk_size_bytes=None, metadata=None, session=None)
Uploads a user file to a GridFS bucket.
Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object.
For example:
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    file_id = fs.upload_from_stream(
        "test_file",
        b"data I want to store!",
        chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})
Returns the _id of the uploaded file.
Raises NoFile if no such version of that file exists.
Raises ValueError if filename is not a string.
Parameters:
- _filename_: The name of the file to upload.
- _source_: The source stream of the content to be uploaded. Must be a file-like object that implements <code>read()</code> or a string.
- _chunk_size_bytes_ (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
- _metadata_ (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
upload_from_stream_with_id(file_id, filename, source, chunk_size_bytes=None, metadata=None, session=None)
Uploads a user file to a GridFS bucket with a custom file id.
Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object.
For example:
    from bson import ObjectId
    from gridfs import GridFSBucket
    from pymongo import MongoClient
    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    fs.upload_from_stream_with_id(
        ObjectId(),
        "test_file",
        b"data I want to store!",
        chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})
Raises NoFile if no such version of that file exists.
Raises ValueError if filename is not a string.
Parameters:
- _file_id_: The id to use for this file. The id must not have already been used for another file.
- _filename_: The name of the file to upload.
- _source_: The source stream of the content to be uploaded. Must be a file-like object that implements <code>read()</code> or a string.
- _chunk_size_bytes_ (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
- _metadata_ (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- _session_ (optional): a <code>ClientSession</code>
Changed in version 3.6: Added session parameter.
Sub-modules: