Shard Keys

The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key is either an indexed field or indexed compound fields that exist in every document in the collection.

MongoDB partitions data in the collection using ranges of shard key values. Each range defines a non-overlapping range of shard key values and is associated with a chunk.

MongoDB attempts to distribute chunks evenly among the shards in the cluster. The shard key has a direct relationship to the effectiveness of chunk distribution. See Choosing a Shard Key.
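As a rough illustration of range-based partitioning, the sketch below models chunks as half-open ranges over a numeric shard key and looks up the chunk that owns a given value. The chunk names, the boundaries, and the `findChunk` helper are hypothetical; this is a toy model, not MongoDB's routing code.

```javascript
// Hypothetical chunk table: each chunk owns the half-open range [min, max).
// -Infinity and Infinity stand in for the lowest and highest possible key values.
const chunks = [
  { name: "chunk A", min: -Infinity, max: 25 },
  { name: "chunk B", min: 25, max: 75 },
  { name: "chunk C", min: 75, max: Infinity },
];

// Return the single chunk whose range contains the shard key value.
// Because the ranges do not overlap, at most one chunk can match.
function findChunk(value) {
  return chunks.find((c) => value >= c.min && value < c.max);
}

console.log(findChunk(10).name);  // chunk A
console.log(findChunk(25).name);  // chunk B (lower bound is inclusive)
console.log(findChunk(500).name); // chunk C
```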

Important

Once you shard a collection, the selection of the shard key is immutable; i.e. you cannot select a different shard key for that collection.
Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. For details on updating the shard key, see Change a Document’s Shard Key Value.

Before MongoDB 4.2, a document’s shard key field value is immutable.

Shard Key Specification

To shard a collection, you must specify the target collection and the shard key to the sh.shardCollection() method:

sh.shardCollection( namespace, key )

The namespace parameter consists of a string <database>.<collection> specifying the full namespace of the target collection.
The key parameter consists of a document containing a field and the index traversal direction for that field.
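A minimal sketch of the two parameters follows. The database `records`, the collection `people`, and the field `zipcode` are hypothetical names chosen for illustration, and the sketch assumes sharding has already been enabled for the database. The guard lets the snippet load outside the mongo shell, where the `sh` helper does not exist.

```javascript
// Hypothetical namespace and shard key, for illustration only.
const namespace = "records.people"; // "<database>.<collection>"
const key = { zipcode: 1 };         // field name and index traversal direction

// `sh` is only defined inside the mongo shell; connect to a mongos first.
if (typeof sh !== "undefined") {
  sh.shardCollection(namespace, key);
}
```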

For instructions specific to sharding a collection using the hashed sharding strategy, see Shard a Collection using Hashed Sharding.

For instructions specific to sharding a collection using the ranged sharding strategy, see Shard a Collection using Ranged Sharding.

Change a Document’s Shard Key Value

Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. To update a document’s shard key value, use the following operations:

Command	Method
`update` with `multi: false`	`db.collection.replaceOne()`, `db.collection.updateOne()`, `db.collection.update()` with `multi: false`
`findAndModify`	`db.collection.findOneAndReplace()`, `db.collection.findOneAndUpdate()`, `db.collection.findAndModify()`
	`db.collection.bulkWrite()`, `Bulk.find.updateOne()`

For the bulk operations:

If the shard key modification results in moving the document to another shard, you cannot specify more than one shard key modification in the bulk operation; i.e. the batch size must be 1.
If the shard key modification does not result in moving the document to another shard, you can specify multiple shard key modifications in the bulk operation.
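As a sketch of one such operation, the snippet below changes the shard key value of a single document with `db.collection.updateOne()`. The collection `people`, its assumed shard key `{ zipcode: 1 }`, and the document values are hypothetical. It also assumes the operation runs against a mongos, either inside a transaction or as a retryable write, and that the filter carries an equality match on the full shard key; check the linked page for the exact requirements of your version.

```javascript
// Hypothetical document in a collection assumed to be sharded on { zipcode: 1 }.
const filter = { _id: 1001, zipcode: "10001" }; // equality match on the full shard key
const update = { $set: { zipcode: "94105" } };  // new shard key value

// `db` is only defined inside the mongo shell; connect to a mongos first.
if (typeof db !== "undefined") {
  db.people.updateOne(filter, update);
}
```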

Shard Key Indexes

All sharded collections must have an index that supports the shard key; i.e. the index can be an index on the shard key or a compound index where the shard key is a prefix of the index.

If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exist.
If the collection is not empty, you must create the index first before using sh.shardCollection().

If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
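The prefix rule can be sketched as a small check over key pattern documents. The `indexSupportsShardKey` helper is hypothetical and compares only field order and values; any further rules MongoDB applies to supporting indexes are not modeled here.

```javascript
// Return true when the shard key fields are the leading fields of the
// index key pattern, in the same order and with the same values.
function indexSupportsShardKey(indexKey, shardKey) {
  const indexFields = Object.keys(indexKey);
  const shardFields = Object.keys(shardKey);
  if (shardFields.length > indexFields.length) return false;
  return shardFields.every(
    (field, i) => indexFields[i] === field && indexKey[field] === shardKey[field]
  );
}

console.log(indexSupportsShardKey({ x: 1 }, { x: 1 }));       // true: index on the shard key
console.log(indexSupportsShardKey({ x: 1, y: 1 }, { x: 1 })); // true: shard key is a prefix
console.log(indexSupportsShardKey({ y: 1, x: 1 }, { x: 1 })); // false: x is not the leading field
```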

Unique Indexes

You cannot specify a unique constraint on a hashed index.

For a ranged sharded collection, only the following indexes can be unique:

the index on the shard key
a compound index where the shard key is a prefix
the default _id index; however, the _id index only enforces the uniqueness constraint per shard if the _id field is not the shard key or the prefix of the shard key.

Uniqueness and the _id Index

If the _id field is not the shard key or the prefix of the shard key, the _id index only enforces the uniqueness constraint per shard and not across shards.

For example, consider a sharded collection (with shard key {x: 1}) that spans two shards A and B. Because the _id key is not part of the shard key, the collection could have a document with _id value 1 in shard A and another document with _id value 1 in shard B.

If the _id field is neither the shard key nor the prefix of the shard key, MongoDB expects applications to enforce the uniqueness of the _id values across the shards.
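The two-shard example can be mimicked with a toy model in which each shard checks _id uniqueness only against its own documents. The `Shard` class, the `routeAndInsert` helper, and the split point of 50 are hypothetical and exist only to show why a duplicate _id can go unnoticed across shards.

```javascript
// Each hypothetical shard tracks only the _id values it holds itself,
// which is how a per-shard unique index behaves.
class Shard {
  constructor(name) {
    this.name = name;
    this.ids = new Set();
  }
  insert(doc) {
    if (this.ids.has(doc._id)) {
      throw new Error(`duplicate _id ${doc._id} on shard ${this.name}`);
    }
    this.ids.add(doc._id);
  }
}

const shardA = new Shard("A");
const shardB = new Shard("B");

// Toy router for shard key { x: 1 }: x < 50 goes to A, everything else to B.
function routeAndInsert(doc) {
  const target = doc.x < 50 ? shardA : shardB;
  target.insert(doc);
  return target.name;
}

console.log(routeAndInsert({ _id: 1, x: 10 })); // A
console.log(routeAndInsert({ _id: 1, x: 90 })); // B: same _id, accepted on the other shard
```

A third insert with `_id: 1` and `x: 20` would be rejected, because shard A already holds that _id.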

The unique index constraints mean that:

For a to-be-sharded collection, you cannot shard the collection if the collection has other unique indexes.
For an already-sharded collection, you cannot create unique indexes on other fields.

Through the use of the unique index on the shard key, MongoDB can enforce uniqueness on the shard key values. MongoDB enforces uniqueness on the entire key combination, and not on individual components of the shard key. To enforce uniqueness on the shard key values, pass the unique parameter as true to the sh.shardCollection() method:

If the collection is empty, sh.shardCollection() creates the unique index on the shard key if such an index does not already exist.
If the collection is not empty, you must create the index first before using sh.shardCollection().

Although you can have a unique compound index where the shard key is a prefix, if using the unique parameter, the collection must have a unique index that is on the shard key.
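A minimal sketch of passing the unique parameter follows. The namespace `records.accounts` and the field `email` are hypothetical, and the sketch assumes the third positional argument of sh.shardCollection() is the boolean unique flag. The guard lets the snippet load outside the mongo shell.

```javascript
// Hypothetical namespace and shard key, for illustration only.
const uniqueNamespace = "records.accounts";
const uniqueKey = { email: 1 };
const unique = true; // ask MongoDB to enforce uniqueness on the shard key values

// `sh` is only defined inside the mongo shell; connect to a mongos first.
// For a non-empty collection, the unique index on { email: 1 } must already exist.
if (typeof sh !== "undefined") {
  sh.shardCollection(uniqueNamespace, uniqueKey, unique);
}
```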

Choosing a Shard Key

The choice of shard key affects the creation and distribution of the chunks across the available shards. This affects the overall efficiency and performance of operations within the sharded cluster.

The shard key affects the performance and efficiency of the sharding strategy used by the sharded cluster.

The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster.

At minimum, consider the consequences of the cardinality, frequency, and rate of change of a potential shard key.

Restrictions

For restrictions on shard key, see Shard Key Limitations.

Collection Size

When sharding a collection that is not empty, the shard key can constrain the maximum supported collection size for the initial sharding operation only. See Sharding Existing Collection Data Size.

Important

A sharded collection can grow to any size after successful sharding.

Shard Key Cardinality

The cardinality of a shard key determines the maximum number of chunks the balancer can create. A low-cardinality shard key can reduce or remove the effectiveness of horizontal scaling in the cluster.

A unique shard key value can exist on no more than a single chunk at any given time. If a shard key has a cardinality of 4, then there can be no more than 4 chunks within the sharded cluster, each storing one unique shard key value. This constrains the number of effective shards in the cluster to 4 as well - adding additional shards would not provide any benefit.

The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:

The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.

A shard key with high cardinality does not guarantee even distribution of data across the sharded cluster, though it does better facilitate horizontal scaling. The frequency and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.

If your data model requires sharding on a key that has low cardinality, consider using a compound index that includes a field with higher relative cardinality.
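The arithmetic behind this limit can be sketched as follows. The region values, the shard count, and the `maxChunks` helper are made up for illustration.

```javascript
// Upper bound on the number of chunks: one chunk per distinct shard key value.
function maxChunks(shardKeyValues) {
  return new Set(shardKeyValues).size;
}

// Hypothetical shard key values with a cardinality of 4.
const regionValues = ["north", "south", "east", "west", "north", "east"];
const shardCount = 6;

const chunkLimit = maxChunks(regionValues);
const effectiveShards = Math.min(shardCount, chunkLimit);

console.log(chunkLimit);      // 4
console.log(effectiveShards); // 4: the two extra shards would hold no chunks
```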

Shard Key Frequency

Consider a set representing the range of shard key values. The frequency of the shard key represents how often a given value occurs in the data. If the majority of documents contain only a subset of those values, then the chunks storing those documents become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks as they cannot be split any further. This reduces or removes the effectiveness of horizontal scaling within the cluster.

The following image illustrates a sharded cluster using the field X as the shard key. If a subset of values for X occurs with high frequency, the distribution of inserts may look similar to the following:

A shard key with low frequency does not guarantee even distribution of data across the sharded cluster. The cardinality and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.

If your data model requires sharding on a key that has high frequency values, consider using a compound index that includes a unique or low frequency value.
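One way to spot a high-frequency value before choosing a shard key is to count occurrences in a sample of the data. The sketch below uses made-up country codes and a hypothetical `frequencies` helper; it only illustrates the idea of frequency and is not a MongoDB feature.

```javascript
// Count how often each candidate shard key value occurs.
function frequencies(values) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample: one value dominates the data set.
const countryValues = ["US", "US", "US", "US", "US", "US", "US", "CA", "MX", "BR"];
const countryCounts = frequencies(countryValues);
const usShare = countryCounts.get("US") / countryValues.length;

console.log(countryCounts.get("US")); // 7
console.log(usShare);                 // 0.7: most documents fall into the chunk holding "US"
```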

Monotonically Changing Shard Keys

A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.

This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.

If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.

The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:

If the shard key value were monotonically decreasing, then all inserts would route to Chunk A instead.

A shard key that does not change monotonically does not guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.

If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.
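The effect can be reproduced with a toy model of three chunks over a numeric shard key. The chunk names, the boundaries, and the `chunkFor` helper are hypothetical, -Infinity and Infinity stand in for minKey and maxKey, and chunk splitting and balancing are deliberately not modeled.

```javascript
// Hypothetical chunks, each owning the half-open range [min, max).
const chunkRanges = [
  { name: "chunk A", min: -Infinity, max: 100 },
  { name: "chunk B", min: 100, max: 200 },
  { name: "chunk C", min: 200, max: Infinity },
];

function chunkFor(value) {
  return chunkRanges.find((c) => value >= c.min && value < c.max).name;
}

// Existing data is assumed to cover keys below 300, so a monotonically
// increasing key produces 300, 301, 302, ... for new documents.
const insertCounts = { "chunk A": 0, "chunk B": 0, "chunk C": 0 };
for (let key = 300; key < 310; key++) {
  insertCounts[chunkFor(key)] += 1;
}

// All 10 inserts land in the chunk whose upper bound is the maxKey stand-in.
console.log(insertCounts["chunk C"]); // 10
```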
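A hashed shard key is requested by giving the field the value "hashed" in the key document. In the sketch below the namespace `records.orders` and the field `orderId` are hypothetical, and sharding is assumed to be enabled for the database already. The guard lets the snippet load outside the mongo shell.

```javascript
// Hypothetical namespace and monotonically increasing field, for illustration only.
const hashedNamespace = "records.orders";
const hashedKey = { orderId: "hashed" }; // shard on the hash of orderId, not its raw value

// `sh` is only defined inside the mongo shell; connect to a mongos first.
if (typeof sh !== "undefined") {
  sh.shardCollection(hashedNamespace, hashedKey);
}
```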