Shard Keys
The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key is either an indexed field or indexed compound fields that exist in every document in the collection.
MongoDB partitions data in the collection using ranges of shard key values. Each range defines a non-overlapping range of shard key values and is associated with a chunk.
MongoDB attempts to distribute chunks evenly among the shards in the cluster. The shard key has a direct relationship to the effectiveness of chunk distribution. See Choosing a Shard Key.
IMPORTANT
Once you shard a collection, the shard key and the shard key values are immutable; i.e.
- You cannot select a different shard key for that collection.
- You cannot update the values of the shard key fields.
Shard Key Specification
To shard a collection, you must specify the target collection and the shard key to the sh.shardCollection() method:
sh.shardCollection(namespace, key)
- The namespace parameter consists of a string <database>.<collection> specifying the full namespace of the target collection.
- The key parameter consists of a document containing a field and the index traversal direction for that field.
For instructions specific to sharding a collection using the hashed sharding strategy, see Shard a Collection using Hashed Sharding.
For instructions specific to sharding a collection using the ranged sharding strategy, see Shard a Collection using Ranged Sharding.
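As a minimal sketch of the two parameters described above, the following shows what a namespace string and a few key documents might look like. The database, collection, and field names (records, people, zipcode, lastName) are hypothetical. The sh.shardCollection() call itself is left as a comment because it needs a shell session connected to a sharded cluster.

```javascript
// Hypothetical namespace in the form "<database>.<collection>".
const namespace = "records.people";

// Ranged shard key on a single field (1 = ascending).
const rangedKey = { zipcode: 1 };

// Ranged shard key on compound fields.
const compoundKey = { zipcode: 1, lastName: 1 };

// Hashed shard key on a single field.
const hashedKey = { _id: "hashed" };

// In a shell connected to the cluster, the call would take this shape:
//   sh.shardCollection(namespace, rangedKey)
```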
Shard Key Indexes
All sharded collections must have an index that supports the shard key; i.e. the index can be an index on the shard key or a compound index where the shard key is a prefix of the index.
- If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exist.
- If the collection is not empty, you must create the index first before using sh.shardCollection().
If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
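The prefix rule above can be sketched as a small check. This is a simplified illustration, not MongoDB's own validation: the function name indexSupportsShardKey is hypothetical, and it compares field names and their order only, ignoring index type, direction, and other index options.

```javascript
// Returns true when the fields of shardKey, in order, are the leading
// fields of indexKey (that is, the shard key is a prefix of the index).
// Simplification: only field names and order are compared.
function indexSupportsShardKey(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  if (shardFields.length > indexFields.length) return false;
  return shardFields.every((field, i) => indexFields[i] === field);
}

// Shard key { zipcode: 1 } is a prefix of the compound index.
console.log(indexSupportsShardKey({ zipcode: 1 }, { zipcode: 1, lastName: 1 })); // true

// Here zipcode is not the leading field, so the index does not qualify.
console.log(indexSupportsShardKey({ zipcode: 1 }, { lastName: 1, zipcode: 1 })); // false
```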
Unique Indexes
For a sharded collection, only the _id field index and the index on the shard key or a compound index where the shard key is a prefix can be unique:
- You cannot shard a collection that has unique indexes on other fields.
- You cannot create unique indexes on other fields for a sharded collection.
Through the use of the unique index on the shard key, MongoDB can enforce uniqueness on the shard key values. MongoDB enforces uniqueness on the entire key combination, and not individual components of the shard key. To enforce uniqueness on the shard key values, pass the unique parameter as true to the sh.shardCollection() method:
- If the collection is empty, sh.shardCollection() creates the unique index on the shard key if such an index does not already exist.
- If the collection is not empty, you must create the index first before using sh.shardCollection().
Although you can have a unique compound index where the shard key is a prefix, if you use the unique parameter, the collection must have a unique index that is on the shard key.
You cannot specify a unique constraint on a hashed index.
Choosing a Shard Key
The choice of shard key affects how the sharded cluster balancer creates and distributes chunks across the available shards. This affects the overall efficiency and performance of operations within the sharded cluster.
The shard key affects the performance and efficiency of the sharding strategy used by the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster.
At minimum, consider the consequences of the cardinality, frequency, and rate of change of a potential shard key.
Restrictions
For restrictions on shard keys, see Shard Key Limitations.
Collection Size
When sharding a collection that is not empty, the shard key can constrain the maximum supported collection size for the initial sharding operation only. See Sharding Existing Collection Data Size.
IMPORTANT
A sharded collection can grow to any size after successful sharding.
Shard Key Cardinality
The cardinality of a shard key determines the maximum number of chunks the balancer can create. Low cardinality can reduce or remove the effectiveness of horizontal scaling in the cluster.
A unique shard key value can exist on no more than a single chunk at any given time. If a shard key has a cardinality of 4, then there can be no more than 4 chunks within the sharded cluster, each storing one unique shard key value. This constrains the number of effective shards in the cluster to 4 as well; adding additional shards would not provide any benefit.
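To make the bound concrete, the following sketch counts the distinct shard key values in a small in-memory sample. The helper name shardKeyCardinality and the sample documents are hypothetical; in practice the documents would come from the collection being evaluated.

```javascript
// Counts the distinct values of `field` across an array of documents.
function shardKeyCardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Hypothetical sample with four distinct values for X: a, b, c, d.
const sampleDocs = [
  { X: "a" }, { X: "b" }, { X: "a" },
  { X: "c" }, { X: "d" }, { X: "b" },
];

const cardinality = shardKeyCardinality(sampleDocs, "X");

// Each distinct value lives on at most one chunk, so the number of
// chunks holding data cannot exceed the cardinality.
const maxChunks = cardinality;
console.log(maxChunks); // 4
```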
The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:
The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.
A shard key with high cardinality does not guarantee even distribution of data across the sharded cluster, though it does better facilitate horizontal scaling. The frequency and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has low cardinality, consider using a compound index using a field that has higher relative cardinality.
Shard Key Frequency
Consider a set representing the range of shard key values. The frequency of the shard key represents how often a given value occurs in the data. If the majority of documents contain only a subset of those values, then the chunks storing those documents become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks as they cannot be split any further. This reduces or removes the effectiveness of horizontal scaling within the cluster.
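One rough way to inspect frequency is to tally how often each shard key value occurs in a sample. The sketch below does this in memory; the helper name shardKeyFrequency and the sample inserts are hypothetical.

```javascript
// Counts how often each value of `field` occurs across the documents.
function shardKeyFrequency(documents, field) {
  const counts = new Map();
  for (const doc of documents) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample in which the value "a" dominates.
const sampleInserts = [
  { X: "a" }, { X: "a" }, { X: "a" },
  { X: "b" }, { X: "a" }, { X: "c" },
];

const frequency = shardKeyFrequency(sampleInserts, "X");
console.log(frequency.get("a")); // 4
console.log(frequency.get("b")); // 1
```

A value that accounts for most of the sample, like "a" here, points to a chunk that is likely to become a bottleneck.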
The following image illustrates a sharded cluster using the field X as the shard key. If a subset of values for X occurs with high frequency, the distribution of inserts may look similar to the following:
A shard key with low frequency does not guarantee even distribution of data across the sharded cluster. The cardinality and rate of change of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has high frequency values, consider using a compound index using a unique or low frequency value.
Monotonically Changing Shard Keys
A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.
This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.
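The routing behavior described above can be simulated with a toy model. In this sketch, the three chunks and their split points are hypothetical, -Infinity and Infinity stand in for minKey and maxKey, and each range is assumed to include its lower bound and exclude its upper bound.

```javascript
// Three hypothetical chunks over shard key X, each covering [min, max).
const chunks = [
  { name: "Chunk A", min: -Infinity, max: 10 },
  { name: "Chunk B", min: 10, max: 20 },
  { name: "Chunk C", min: 20, max: Infinity },
];

// Returns the name of the chunk whose range contains the value.
function routeToChunk(value) {
  return chunks.find((c) => value >= c.min && value < c.max).name;
}

// Monotonically increasing values, starting above the last split point.
const increasing = [20, 21, 22, 23, 24].map((v) => routeToChunk(v));

// Monotonically decreasing values, starting below the first split point.
const decreasing = [9, 8, 7, 6, 5].map((v) => routeToChunk(v));

console.log(increasing); // every entry is "Chunk C"
console.log(decreasing); // every entry is "Chunk A"
```

In both runs, every insert lands on the single chunk at the edge of the key space, so the shard holding that chunk takes all of the writes.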
The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:
If the shard key value were monotonically decreasing, then all inserts would route to Chunk A instead.
A shard key that does not change monotonically does not guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.