Shard Keys
The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key iseither an indexed field or indexed compoundfields that exists in every document in the collection.
MongoDB partitions data in the collection using rangesof shard key values. Each range defines a non-overlapping range of shard keyvalues and is associated with a chunk.
MongoDB attempts to distribute chunks evenly among the shards in the cluster.The shard key has a direct relationship to the effectiveness of chunkdistribution. See Choosing a Shard Key.
Important
Once you shard a collection, the selection of the shard key isimmutable; i.e. you cannot select a different shard key for thatcollection.
Starting in MongoDB 4.2, you can update a document’s shard key valueunless the shard key field is the immutable
_id
field. For detailson updating the shard key, see Change a Document’s Shard Key Value.
Before MongoDB 4.2, a document’s shard key field value is immutable.
Shard Key Specification
To shard a collection, you must specify the target collection and theshard key to the sh.shardCollection()
method:
- sh.shardCollection( namespace, key )
- The
namespace
parameter consists of a string<database>.<collection>
specifying the full namespace ofthe target collection. - The
key
parameter consists of a document containing a field andthe index traversal direction for that field.
Important
Once you shard a collection, the selection of the shard key isimmutable; i.e. you cannot select a different shard key for thatcollection.
Starting in MongoDB 4.2, you can update a document’s shard key valueunless the shard key field is the immutable
_id
field. For detailson updating the shard key, see Change a Document’s Shard Key Value.
Before MongoDB 4.2, a document’s shard key field value is immutable.
For instructions specific to sharding a collection using thehashed sharding strategy, seeShard a Collection using Hashed Sharding
For instructions specific to sharding a collection using theranged sharding strategy, seeShard a Collection using Ranged Sharding.
Change a Document’s Shard Key Value
Starting in MongoDB 4.2, you can update a document’s shard key valueunless the shard key field is the immutable _id
field. To update,use the following operations to update a document’s shard key value:
Command | Method |
---|---|
update with multi: false | db.collection.replaceOne() db.collection.updateOne() db.collection.update() with multi: false |
findAndModify | db.collection.findOneAndReplace() db.collection.findOneAndUpdate() db.collection.findAndModify() |
db.collection.bulkWrite() Bulk.find.updateOne() If the shard key modification results in moving the document toanother shard, you cannot specify more than one shard keymodification in the bulk operation; i.e. batch size of1.If the shard key modification does not result in moving thedocument to another shard, you can specify multiple shardkey modification in the bulk operation. |
Shard Key Indexes
All sharded collections must have an index that supports theshard key; i.e. the index can be an index on the shard key or acompound index where the shard key is a prefix of the index.
- If the collection is empty,
sh.shardCollection()
createsthe index on the shard key if such an index does not already exists. - If the collection is not empty, you must create the index firstbefore using
sh.shardCollection()
.
If you drop the last valid index for the shard key, recover byrecreating an index on just the shard key.
Unique Indexes
You cannot specify a unique constraint on a hashed index.
For a ranged sharded collection, only the following indexes can beunique:
the index on the shard key
a compound index where the shard key is a prefix
the default
_id
index; however, the_id
index onlyenforces the uniqueness constraint per shard if the_id
fieldis not the shard key or the prefix of the shard key.
Uniqueness and the _id
Index
If the _id
field is not the shard key or the prefix of theshard key, _id
index only enforces the uniqueness constraintper shard and not across shards.
For example, consider a sharded collection (with shard key {x:1}
) that spans two shards A and B. Because the _id
key isnot part of the shard key, the collection could have a documentwith _id
value 1
in shard A and another document with_id
value 1
in shard B.
If the _id
field is not the shard key nor the prefix of theshard key, MongoDB expects applications to enforce the uniquenessof the _id
values across the shards.
The unique index constraints mean that:
- For a to-be-sharded collection, you cannot shard the collection ifthe collection has other unique indexes.
- For an already-sharded collection, you cannot create unique indexeson other fields.
Through the use of the unique index on the shard key, MongoDB can_enforce uniqueness on the shard key values. MongoDB enforces uniquenesson the _entire key combination, and not individual components of theshard key. To enforce uniqueness on the shard key values, pass theunique
parameter as true
to the sh.shardCollection()
method:
- If the collection is empty,
sh.shardCollection()
creates the unique index on theshard key if such an index does not already exist. - If the collection is not empty, you must create the index firstbefore using
sh.shardCollection()
.
Although you can have a unique compound index where the shardkey is a prefix, if using unique
parameter, the collection must have a unique index that is on the shardkey.
Choosing a Shard Key
The choice of shard key affects the creation and distribution ofthe chunks across the available shards. This affects the overall efficiency and performance ofoperations within the sharded cluster.
The shard key affects the performance and efficiency of thesharding strategy used by the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughoutthe cluster.
At minimum, consider the consequences of thecardinality, frequency, andrate of change of a potential shard key.
Restrictions
For restrictions on shard key, see Shard Key Limitations.
Collection Size
When sharding a collection that is not empty, the shard key canconstrain the maximum supported collection size for the initialsharding operation only. SeeSharding Existing Collection Data Size
.
Important
A sharded collection can grow to any size after successful sharding.
Shard Key Cardinality
The cardinality of a shard key determines the maximum number of chunksthe balancer can create. This can reduce or remove the effectiveness ofhorizontal scaling in the cluster.
A unique shard key value can exist on no more than a single chunk at anygiven time. If a shard key has a cardinality of 4
, then there canbe no more than 4
chunks within the sharded cluster, each storingone unique shard key value. This constrains the number of effectiveshards in the cluster to 4
as well - adding additional shards wouldnot provide any benefit.
The following image illustrates a sharded cluster using the fieldX
as the shard key. If X
has low cardinality, the distribution ofinserts may look similar to the following:
The cluster in this example would not scale horizontally, as incoming writeswould only route to a subset of shards.
A shard key with high cardinality does not guarantee even distribution of dataacross the sharded cluster, though it does better facilitate horizontalscaling. The frequency and rate ofchange of the shard key also contributes to datadistribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has low cardinality,consider using a compound index using a field thathas higher relative cardinality.
Shard Key Frequency
Consider a set representing the range of shard key values - the frequency
of the shard key represents how often a given value occurs in the data. If themajority of documents contain only a subset of those values, then the chunksstoring those documents become a bottleneck within the cluster. Furthermore,as those chunks grow, they may become indivisible chunksas they cannot be split any further. This reduces or removes the effectivenessof horizontal scaling within the cluster.
The following image illustrates a sharded cluster using the field X
as theshard key. If a subset of values for X
occur with high frequency, thedistribution of inserts may look similar to the following:
A shard key with low frequency does not guarantee even distribution of dataacross the sharded cluster. The cardinality andrate of change of the shard key also contributesto data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has high frequencyvalues, consider using a compound index using a unique orlow frequency value.
Monotonically Changing Shard Keys
A shard key on a value that increases or decreases monotonically is morelikely to distribute inserts to a single shard within the cluster.
This occurs because every cluster has a chunk that captures a range with anupper bound of maxKey. maxKey
alwayscompares as higher than all other values. Similarly, there is a chunk thatcaptures a range with a lower bound of minKey.minKey
always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to thechunk with maxKey
as the upper bound. If the shard key value is alwaysdecreasing, all new inserts are routed to the chunk with minKey
as thelower bound. The shard containing that chunk becomes the bottleneck for writeoperations.
The following image illustrates a sharded cluster using the field X
as the shard key. If the values for X
are monotonically increasing, thedistribution of inserts may look similar to the following:
If the shard key value was monotonically decreasing, then all inserts wouldroute to Chunk A
instead.
A shard key that does not change monotonically does not guarantee evendistribution of data across the sharded cluster. Thecardinality andfrequency of the shard key also contributes todata distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that changesmonotonically, consider using Hashed Sharding.