Collations

See also: the API docs for collation.

Collations were introduced in MongoDB 3.4. A collation is a set of rules for comparing strings according to the conventions of a particular language, such as Spanish or German. If no collation is specified, the server sorts strings based on a binary comparison. Many languages have their own ordering rules, and collations let users build applications that adhere to those language-specific comparison rules.

In French, for example, the last accent in a given word determines the sorting order. The correct sorting order for the following four words in French is:

  cote < côte < coté < côté

Specifying a French collation allows users to sort string fields using the French sort order.
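
To make the rule concrete, here is a rough pure-Python sketch of the "last accent decides" ordering. It only approximates what an ICU collation does on the server, and the helper name french_sort_key is hypothetical (it is not part of PyMongo or MongoDB):

```python
import unicodedata


def french_sort_key(word):
    """Hypothetical helper: approximate French accent ordering for one word."""
    base_letters = []
    accents = []
    # NFD splits a letter such as 'ô' into 'o' plus a combining circumflex.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch  # attach the accent to its base letter
        else:
            base_letters.append(ch.lower())
            accents.append("")
    # Compare base letters first; break ties on accents read right to left.
    return ("".join(base_letters), accents[::-1])


words = ["côté", "coté", "côte", "cote"]
french_order = sorted(words, key=french_sort_key)
print(" < ".join(french_order))
```

Sorting the four words with this key reproduces the French order listed above. In a real application the comparison should be left to the server by specifying a collation, since ICU handles many cases this sketch ignores.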

Usage

Users can specify a collation for a collection, an index, or a CRUD command.

Collation Parameters:

Collations can be specified with the Collation model or with plain Python dictionaries. The structure is the same:

  Collation(locale=<string>,
            caseLevel=<bool>,
            caseFirst=<string>,
            strength=<int>,
            numericOrdering=<bool>,
            alternate=<string>,
            maxVariable=<string>,
            backwards=<bool>)

The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US to represent US English or fr_CA to represent Canadian French.

For a complete description of the available parameters, see the MongoDB manual.
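
As an illustration of the dictionary form, the sketch below builds a plain Python dictionary that uses the same keys as the parameters listed above. The helper collation_document and the constant COLLATION_OPTIONS are hypothetical names introduced here for illustration; they are not part of PyMongo:

```python
# Option names copied from the parameter list above; 'locale' is handled separately.
COLLATION_OPTIONS = {
    "caseLevel", "caseFirst", "strength", "numericOrdering",
    "alternate", "maxVariable", "backwards",
}


def collation_document(locale, **options):
    """Hypothetical helper: build a plain-dict collation with only the set options."""
    unknown = set(options) - COLLATION_OPTIONS
    if unknown:
        raise ValueError("unknown collation options: %s" % sorted(unknown))
    document = {"locale": locale}  # locale is the only required key
    document.update({k: v for k, v in options.items() if v is not None})
    return document


french_canadian = collation_document("fr_CA", strength=2, backwards=True)
print(french_canadian)
```

A dictionary built this way would be passed wherever a collation is accepted, for example as collation=french_canadian.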

Assign a Default Collation to a Collection

The following example demonstrates how to create a new collection called contacts and assign a default collation with the fr_CA locale. This operation ensures that all queries that are run against the contacts collection use the fr_CA collation unless another collation is explicitly specified:

  from pymongo import MongoClient
  from pymongo.collation import Collation

  # Create 'contacts' with fr_CA as its default collation.
  db = MongoClient().test
  collection = db.create_collection('contacts',
                                    collation=Collation(locale='fr_CA'))

Assign a Default Collation to an Index

When creating a new index, you can specify a default collation.

The following example shows how to create an index on the name field of the contacts collection, with the unique parameter enabled and a default collation with locale set to fr_CA:

  from pymongo import MongoClient
  from pymongo.collation import Collation

  contacts = MongoClient().test.contacts
  # Unique index on 'name' that compares values using the fr_CA collation.
  contacts.create_index('name',
                        unique=True,
                        collation=Collation(locale='fr_CA'))

Specify a Collation for a Query

Individual queries can specify a collation to use when sorting results. The following example runs a query on the contacts collection in the test database. It matches documents whose city field is New York and sorts them on the name field with the fr_CA collation:

  from pymongo import MongoClient
  from pymongo.collation import Collation

  collection = MongoClient().test.contacts
  # Sort the matching documents by 'name' under the fr_CA collation.
  docs = collection.find({'city': 'New York'}).sort('name').collation(
      Collation(locale='fr_CA'))

Other Query Types

You can also use collations to control how documents are matched in several other types of queries. The update and delete methods (update_one(), update_many(), delete_one(), etc.) all support collation, so a query filter can follow the comparison rules of any language or variant available to the locale parameter.

The following example uses a collation with strength set to SECONDARY, which compares base characters and their accents but ignores other differences such as case. All documents in the contacts collection with jürgen (case-insensitive) in the first_name field are updated:

  from pymongo import MongoClient
  from pymongo.collation import Collation, CollationStrength

  contacts = MongoClient().test.contacts
  # SECONDARY strength: 'jürgen' also matches 'Jürgen' and 'JÜRGEN'.
  result = contacts.update_many(
      {'first_name': 'jürgen'},
      {'$set': {'verified': 1}},
      collation=Collation(locale='de',
                          strength=CollationStrength.SECONDARY))
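
To see which values such a filter would match, the following pure-Python sketch mimics a SECONDARY-strength comparison by folding case while keeping accents. It is only a rough approximation of the server's ICU behavior, and the helper names secondary_key and matches_secondary are hypothetical:

```python
import unicodedata


def secondary_key(text):
    """Hypothetical helper: ignore case but keep accents."""
    # Fold case first, then decompose so accents are compared consistently.
    return unicodedata.normalize("NFD", text.casefold())


def matches_secondary(left, right):
    """Return True if the two strings differ at most by case."""
    return secondary_key(left) == secondary_key(right)


candidates = ["jürgen", "Jürgen", "JÜRGEN", "jurgen", "Jurgen"]
matched = [name for name in candidates if matches_secondary(name, "jürgen")]
print(matched)
```

Under this approximation the three spellings that keep the umlaut match regardless of case, while jurgen and Jurgen do not, because the accent still counts at SECONDARY strength.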