Segmenting Data by Location
In sharded clusters, you can create zones of sharded data basedon the shard key. You can associate each zone with one or more shardsin the cluster. A shard can associate with any number of zones. In a balancedcluster, MongoDB migrates chunks covered by a zone only tothose shards associated with the zone.
Tip
Changed in version 4.0.3: By defining the zones and the zone ranges before sharding an emptyor a non-existing collection, the shard collection operation createschunks for the defined zone ranges as well as any additional chunksto cover the entire range of the shard key values and performs aninitial chunk distribution based on the zone ranges. This initialcreation and distribution of chunks allows for faster setup of zonedsharding. After the initial distribution, the balancer manages thechunk distribution going forward.
See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.
This tutorial uses Zones to segment data based ongeographic area.
The following are some example use cases for segmenting data by geographicarea:
- An application that requires segmenting user data based on geographic country
- A database that requires resource allocation based on geographic country
The following diagram illustrates a sharded cluster that uses geographic basedzones to manage and satisfy data segmentation requirements.
Scenario
A financial chat application logs messages, tracking the country of theoriginating user. The application stores the logs in the chat
databaseunder the messages
collection. The chats contain information that must besegmented by country to have servers local to the country serve read and writerequests for the country’s users. A group of countries can be assignedsame zone in order to share resources.
The application currently has users in the US, UK, and Germany. Thecountry
field represents the user’s country based on itsISO 3166-1 Alpha-2two-character country codes.
The following documents represent a partial view of three chat messages:
- {
- "_id" : ObjectId("56f08c447fe58b2e96f595fa"),
- "country" : "US",
- "userid" : 123,
- "message" : "Hello there",
- ...,
- }
- {
- "_id" : ObjectId("56f08c447fe58b2e96f595fb"),
- "country" : "UK",
- "userid" : 456,
- "message" : "Good Morning"
- ...,
- }
- {
- "_id" : ObjectId("56f08c447fe58b2e96f595fc"),
- "country" : "DE",
- "userid" : 789,
- "message" : "Guten Tag"
- ...,
- }
Shard Key
The messages
collection uses the { country : 1, userid : 1 }
compoundindex as the shard key.
The country
field in each document allows for creating a zone foreach distinct country value.
The userid
field provides a high cardinalityand low frequency component to the shard keyrelative to country
.
See Choosing a Shard Key for moregeneral instructions on selecting a shard key.
Architecture
The sharded cluster has shards in two data centers - one in Europe, andone in North America.
Zones
This application requires one zone per data center.
EU
- European data center- Shards deployed on this data center are assigned to the
EU
zone.
For each country using the EU
data center for local reads and writes,create a zone range for the EU
zone with:
- a lower bound of
{ "country" : <country>, "userid" : MinKey }
- an upper bound of
{ "country" : <country>, "userid" : MaxKey }
NA
- North American data center- Shards deployed on this data center are assigned to the
NA
zone.
For each country using the NA
data center for local reads and writes,create a zone range for the NA
zone with:
- a lower bound of
{ "country" : <country>, "userid" : MinKey }
- an upper bound of
{ "country" : <country>, "userid" : MaxKey }
Note
The MinKey
and MaxKey
values are reserved specialvalues for comparisons
Write Operations
With zones, if an inserted or updated document matches aconfigured zone, it can only be written to a shard inside of that zone.
MongoDB can write documents that do not match a configured zone to anyshard in the cluster.
Note
The behavior described above requires the cluster to be in a steady statewith no chunks violating a configured zone. See the following section onthe balancer formore information.
Read Operations
MongoDB can route queries to a specific shard if the query includes at leastthe country
field.
For example, MongoDB can attempt a targeted read operation on the following query:
- chatDB = db.getSiblingDB("chat")
- chatDB.messages.find( { "country" : "UK" , "userid" : "123" } )
Queries without the country
field perform broadcast operations.
Balancer
The balancermigrates chunks to the appropriate shard respecting anyconfigured zones. Until the migration, shards may contain chunks that violateconfigured zones. Once balancing completes, shards should onlycontain chunks whose ranges do not violate its assigned zones.
Adding or removing zones or zone ranges can result in chunk migrations.Depending on the size of your data set and the number of chunks a zone or zonerange affects, these migrations may impact cluster performance. Considerrunning your balancer during specific scheduledwindows. See Schedule the Balancing Window for a tutorial on howto set a scheduling window.
Security
For sharded clusters running with Role-Based Access Control, authenticate as a userwith at least the clusterManager
role on the admin
database.
Procedure
You must be connected to a mongos
to create zones and zone ranges.You cannot create zones or zone ranges by connecting directly to ashard.
Disable the Balancer (Optional)
To reduce performance impacts, the balancer may be disabled on the collectionto ensure no migrations take place while configuring the new zones.
Use sh.disableBalancing()
, specifying the namespace of thecollection, to stop the balancer.
- sh.disableBalancing("chat.message")
Use sh.isBalancerRunning()
to check if the balancer processis currently running. Wait until any current balancing rounds have completedbefore proceeding.
Add each shard to the appropriate zone
Add each shard in the North American data center to the NA
zone.
- sh.addShardTag(<shard name>, "NA")
Add each shard in the European data center to the EU
zone.
- sh.addShardTag(<shard name>, "EU")
You can review the zones assigned to any given shard by runningsh.status()
.
Define ranges for each zone
For shard key values where country : US
, define a shard key rangeand associate it to the NA
zone using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
- sh.addTagRange(
- "chat.messages",
- { "country" : "US", "userid" : MinKey },
- { "country" : "US", "userid" : MaxKey },
- "NA"
- )
For shard key values where country : UK
, define a shard key rangeand associate it to the EU
zone using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
- sh.addTagRange(
- "chat.messages",
- { "country" : "UK", "userid" : MinKey },
- { "country" : "UK", "userid" : MaxKey },
- "EU"
- )
For shard key values where country : DE
, define a shard key rangeand associate it to the EU
zone using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
- sh.addTagRange(
- "chat.messages",
- { "country" : "DE", "userid" : MinKey },
- { "country" : "DE", "userid" : MaxKey },
- "EU"
- )
The MinKey
and MaxKey
values are reserved specialvalue for comparisons. MinKey
always compares as lower than everyother possible value, while MaxKey
always compares as higher thanevery other possible value. The configured ranges captures every user foreach device
.
Both country : UK
and country : DE
are assigned to the EU
zone.This associates any document with either UK
or DE
as the value forcountry
to the EU data center.
Enable the Balancer (Optional)
If the balancer was disabled in previous steps, re-enable the balancer atthe completion of this procedure to rebalance the cluster.
Use sh.enableBalancing()
, specifying the namespace of thecollection, to start the balancer.
- sh.enableBalancing("chat.message")
Use sh.isBalancerRunning()
to check if the balancer processis currently running.
Review the Changes
The next time the balancer runs, itsplits chunks where necessary andmigrates chunks across theshards respecting the configured zones.
Once balancing finishes, the shards in the NA
zone should onlycontain documents with country : NA
, while shards in the EU
zoneshould only contain documents with country : UK
or country : DE
.
A document with a value for country
other than NA
, UK
, orDE
can reside on any shard in the cluster.
You can confirm the chunk distribution by running sh.status()
.
Updating Zones
The application requires the following updates:
- Documents with
country : UK
must now be associated to the newUK
data center. Any data in theEU
data center must be migrated - The chat application now supports users in Mexico. Documents with
country : MX
must be routed to theNA
data center.
Perform the following procedures to update the zone ranges.
Disable the Balancer (Optional)
To reduce performance impacts, the balancer may be disabled on the collectionto ensure no migrations take place while configuring the new zones or removingthe old ones.
Use sh.disableBalancing()
, specifying the namespace of thecollection, to stop the balancer
- sh.disableBalancing("chat.messages")
Use sh.isBalancerRunning()
to check if the balancer processis currently running. Wait until any current balancing rounds have completedbefore proceeding.
Add the new UK zone
Add each shard in the UK
data center to the UK
zone.
- sh.addShardTag("<shard name>", "UK")
You can review the zones assigned to any given shard by runningsh.status()
.
Remove the old zone range
Remove the old zone range associated to the UK
country using thesh.removeTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
- sh.removeTagRange(
- "chat.messages",
- { "country" : "UK", "userid" : MinKey },
- { "country" : "UK", "userid" : MaxKey }
- "EU"
- )
Add new zone ranges
For shard key values where country : UK
, define a shard key rangeand associate it to the UK
zone using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
- sh.addTagRange(
- "chat.message",
- { "country" : "UK", "userid" : MinKey },
- { "country" : "UK", "userid" : MaxKey },
- "UK"
- )
For shard key values where country : MX
, define a shard key rangeand associate it to the NA
zone using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
- sh.addTagRange(
- "chat.messages",
- { "country" : "MX", "userid" : MinKey },
- { "country" : "MX", "userid" : MaxKey },
- "NA"
- )
The MinKey
and MaxKey
values are reserved special valuesfor comparisons. MinKey
always compares as lower than every otherpossible value, while MaxKey
always compares as higher thanevery other possible value. This ensures the two ranges captures theentire possible value space of creation_date
.
Enable the Balancer (Optional)
If the balancer was disabled in previous steps, re-enable the balancer atthe completion of this procedure to rebalance the cluster.
Use sh.enableBalancing()
, specifying the namespace of thecollection, to start the balancer
- sh.enableBalancing("chat.messages")
Use sh.isBalancerRunning()
to check if the balancer processis currently running.
Review the changes
The next time the balancer runs, itsplits chunks where necessary andmigrates chunks across theshards respecting the configured zones.
Before balancing, the shards in the EU
zone only contained documentswhere country : DE
or country : UK
. Documents with the country :MX
could be stored on any shard in the sharded cluster.
After balancing, the shards in the EU
zone should only contain documentswhere country : DE
, while shards in the UK
zone should only containdocuments where country : UK
. Additionally, shards in the NA
zoneshould only contain documents where country : US
or country : MX
.
A document with a value for country
other than NA
, MX
, UK
,or DE
can reside on any shard in the cluster.
You can confirm the chunk distribution by running sh.status()
.
See also