Collations
See also: the API docs for collation.
Collations were introduced in MongoDB version 3.4. A collation is a set of rules for comparing strings according to the conventions of a particular language, such as Spanish or German. If no collation is specified, the server sorts strings based on a binary comparison. Many languages have specific ordering rules, and collations allow users to build applications that adhere to those language-specific comparison rules.
In French, for example, the last accent in a given word determines the sorting order. The correct sorting order for the following four words in French is:
cote < côte < coté < côté
Specifying a French collation allows users to sort string fields using the French sort order.
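The difference between binary ordering and the French ordering can be sketched without a server. The snippet below is only an illustration of the rule described above, not how MongoDB or ICU implements collation. The helper french_sort_key is hypothetical: it assumes that comparing base letters first, and then accents from the end of the word backwards, is a close enough approximation for these four words.

```python
import unicodedata


def french_sort_key(word):
    """Hypothetical helper: approximate French accent ordering.

    Base letters are compared first; ties are broken by accents,
    compared from the last character of the word to the first.
    """
    base_letters = []
    accents = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # Attach the combining mark to the preceding base letter.
            accents[-1] += ch
        else:
            base_letters.append(ch)
            accents.append("")
    return ("".join(base_letters), accents[::-1])


words = ["cote", "côte", "coté", "côté"]

# Binary (code point) comparison, similar to sorting without a collation.
binary_order = sorted(words)

# Accent-aware comparison using the simplified French rule.
french_order = sorted(words, key=french_sort_key)

print(binary_order)
print(french_order)
```

With the binary comparison, coté sorts before côte. The key-based sort reproduces the French sequence shown above.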
Usage
Users can specify a collation for a collection, an index, or a CRUD command.
Collation Parameters:
Collations can be specified with the Collation model or with plain Python dictionaries. The structure is the same:
Collation(locale=<string>,
          caseLevel=<bool>,
          caseFirst=<string>,
          strength=<int>,
          numericOrdering=<bool>,
          alternate=<string>,
          maxVariable=<string>,
          backwards=<bool>)
The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US to represent US English or fr_CA to represent Canadian French.
For a complete description of the available parameters, see the MongoDB manual.
Assign a Default Collation to a Collection
The following example demonstrates how to create a new collection called contacts and assign a default collation with the fr_CA locale. This operation ensures that all queries that are run against the contacts collection use the fr_CA collation unless another collation is explicitly specified:
from pymongo import MongoClient
from pymongo.collation import Collation

db = MongoClient().test
# Create 'contacts' with fr_CA as its default collation.
collection = db.create_collection('contacts',
                                  collation=Collation(locale='fr_CA'))
Assign a Default Collation to an Index
When creating a new index, you can specify a default collation.
The following example shows how to create an index on the name field of the contacts collection, with the unique parameter enabled and a default collation with locale set to fr_CA:
from pymongo import MongoClient
from pymongo.collation import Collation

contacts = MongoClient().test.contacts
# Unique index on 'name' that compares values with the fr_CA collation.
contacts.create_index('name',
                      unique=True,
                      collation=Collation(locale='fr_CA'))
Specify a Collation for a Query
Individual queries can specify a collation to use when sorting results. The following example demonstrates a query that runs on the contacts collection in database test. It matches on documents that contain New York in the city field, and sorts on the name field with the fr_CA collation:
from pymongo import MongoClient
from pymongo.collation import Collation

collection = MongoClient().test.contacts
# Filter on city, then sort by name using the fr_CA collation.
docs = collection.find({'city': 'New York'}).sort('name').collation(
    Collation(locale='fr_CA'))
Other Query Types
You can use collations to control document matching rules for several different types of queries. The update and delete methods (update_one(), update_many(), delete_one(), etc.) all support collation, and query filters can use a collation for any of the languages and variants available to the locale parameter.
The following example uses a collation with strength set to SECONDARY, which compares base characters and accents but ignores differences such as case. All documents in the contacts collection whose first_name field matches jürgen (case-insensitively) are updated:
from pymongo import MongoClient
from pymongo.collation import Collation, CollationStrength

contacts = MongoClient().test.contacts
# SECONDARY strength: accents matter, case does not.
result = contacts.update_many(
    {'first_name': 'jürgen'},
    {'$set': {'verified': 1}},
    collation=Collation(locale='de',
                        strength=CollationStrength.SECONDARY))
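To make the effect of SECONDARY strength more concrete, the following pure-Python sketch approximates which first_name values the filter above would treat as equal to jürgen. It is a rough stand-in, not the server's ICU comparison: the helper secondary_key is hypothetical, and it assumes that folding case while keeping accents is an adequate approximation for these sample names.

```python
import unicodedata


def secondary_key(text):
    """Hypothetical helper: fold case but keep accents."""
    return unicodedata.normalize("NFD", text.casefold())


target = secondary_key("jürgen")

# Sample values invented for illustration.
candidates = ["Jürgen", "JÜRGEN", "jurgen", "Jurgen"]

# Names that differ from the target only by case are kept;
# names that drop the umlaut are not.
matches = [name for name in candidates if secondary_key(name) == target]
print(matches)
```

Only the two spellings that keep the umlaut are selected, which mirrors the case-insensitive but accent-sensitive behavior described above.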