- Deploy Sharded Cluster using Hashed Sharding
Deploy Sharded Cluster using Hashed Sharding
Overview
Hashed shard keys use a hashed index of asingle field as the shard key to partition data across yoursharded cluster.
Hashed sharding provides a more even data distribution across the shardedcluster at the cost of reducing Targeted Operations vs. Broadcast Operations. Withhashed sharding, documents with “close” shard key values are unlikelyto be on the same chunk or shard, and the mongos
is morelikely to perform Broadcast Operations to fulfill a givenquery.
If you already have a sharded cluster deployed, skip toShard a Collection using Hashed Sharding.
Atlas, CloudManager and OpsManager
If you are currently using or are planning to use Atlas, Cloud Manageror Ops Manager, refer to their respective manual for instructions ondeploying a sharded cluster:
- Create a Cluster (Atlas)
- Deploy a Sharded Cluster (Cloud Manager)
- Deploy a Sharded Cluster (Ops Manager).
Considerations
Hostnames and Configuration
Tip
When possible, use a logical DNS hostname instead of an ip address,particularly when configuring replica set members or sharded clustermembers. The use of logical DNS hostnames avoids configurationchanges due to ip address changes.
Operating System
This tutorial uses the mongod
and mongos
programs. Windows users should use the mongod.exe
andmongos.exe
programs instead.
IP Binding
Use the bind_ip
option to ensure that MongoDB listens forconnections from applications on configured addresses.
Starting in MongoDB 3.6, MongoDB binaries, mongod
andmongos
, bind to localhost by default. If thenet.ipv6
configuration file setting or the —ipv6
command line option is set for the binary, the binary additionally bindsto the localhost IPv6 address.
Previously, starting from MongoDB 2.6, only the binaries from theofficial MongoDB RPM (Red Hat, CentOS, Fedora Linux, and derivatives)and DEB (Debian, Ubuntu, and derivatives) packages bind to localhost bydefault.
When bound only to the localhost, these MongoDB 3.6 binaries can onlyaccept connections from clients (including the mongo
shell,other members in your deployment for replica sets and sharded clusters)that are running on the same machine. Remote clients cannot connect tothe binaries bound only to localhost.
To override and bind to other ip addresses, you can use thenet.bindIp
configuration file setting or the—bind_ip
command-line option to specify a list of hostnames or ipaddresses.
Warning
Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.
For example, the following mongod
instance binds to boththe localhost and the hostname My-Example-Associated-Hostname
, which isassociated with the ip address 198.51.100.1
:
- mongod --bind_ip localhost,My-Example-Associated-Hostname
In order to connect to this instance, remote clients must specifythe hostname or its associated ip address 198.51.100.1
:
- mongo --host My-Example-Associated-Hostname
- mongo --host 198.51.100.1
Security
This tutorial does not include the required steps for configuringInternal/Membership Authentication or Role-Based Access Control.See Deploy Sharded Cluster with Keyfile Authentication for atutorial on deploying a sharded cluster with akeyfile.
In production environments, sharded clusters should employ atminimum x.509 security for internal authenticationand client access:
- For details on using x.509 for internal authentication, seeUse x.509 Certificate for Membership Authentication.
- For details on using x.509 for client authentication, seeUse x.509 Certificates to Authenticate Clients.
Note
Enabling internal authentication also enablesRole-Based Access Control.
Deploy Sharded Cluster with Hashed Sharding
Tip
When possible, use a logical DNS hostname instead of an ip address,particularly when configuring replica set members or sharded clustermembers. The use of logical DNS hostnames avoids configurationchanges due to ip address changes.
Create the Config Server Replica Set
The following steps deploys a config server replica set.
For a production deployment, deploy a config server replica set with atleast three members. For testing purposes, you can create asingle-member replica set.
For this tutorial, the config server replica set members are associatedwith the following hosts:
Config Server Replica Set Member | Hostname |
---|---|
Member 0 | cfg1.example.net |
Member 1 | cfg2.example.net |
Member 2 | cfg3.example.net |
Start each member of the config server replica set.
When starting eachmongod
, specify themongod
settings either via a configuration file or thecommand line.
Configuration File
If using a configuration file, set:
- sharding:
- clusterRole: configsvr
- replication:
- replSetName: <replica set name>
- net:
- bindIp: localhost,<hostname(s)|ip address(es)>
sharding.clusterRole
toconfigsvr
,replication.replSetName
to the desired name of theconfig server replica set,net.bindIp
option to the hostname/ip address orcomma-delimited list of hostnames or ip addresses that remoteclients (including the other members of the config serverreplica set as well as other members of the sharded cluster)can use to connect to the instance.
Warning
Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.
- Additional settings as appropriate to your deployment, such as
storage.dbPath
andnet.port
. For moreinformation on the configuration file, see configurationoptions.
Start the mongod
with the —config
optionset to the configuration file path.
- mongod --config <path-to-config-file>
Command Line
If using the command line options, start the mongod
with the —configsvr
, —replSet
, —bind_ip
,and other options as appropriate to your deployment. For example:
Warning
Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.
- mongod --configsvr --replSet <replica set name> --dbpath <path> --bind_ip localhost,<hostname(s)|ip address(es)>
For more information on startup parameters, see themongod
reference page.
Connect to one of the config servers.
Connect a mongo
shell to one of the config servermembers.
- mongo --host <hostname> --port <port>
Initiate the replica set.
From the mongo
shell, run the rs.initiate()
method.
rs.initiate()
can take an optional replica setconfiguration document. In thereplica set configuration document, include:
- The
_id
set to the replica set name specified in eitherthereplication.replSetName
or the—replSet
option. - The
configsvr
field set totrue
for the config server replica set. - The
members
array with a document per each member of the replica set.
Important
Run rs.initiate()
on just one and only onemongod
instance for the replica set.
- rs.initiate(
- {
- _id: "<replSetName>",
- configsvr: true,
- members: [
- { _id : 0, host : "cfg1.example.net:27019" },
- { _id : 1, host : "cfg2.example.net:27019" },
- { _id : 2, host : "cfg3.example.net:27019" }
- ]
- }
- )
See Replica Set Configuration for more information onreplica set configuration documents.
Once the config server replica set (CSRS) is initiated and up, proceedto creating the shard replica sets.
Create the Shard Replica Sets
For a production deployment, use a replica set with at least threemembers for each shard. For testing purposes, you can create asingle-member replica set.
For each shard, use the following steps to create the shard replica set.
Start each member of the shard replica set.
When starting eachmongod
, specify themongod
settings either via a configuration file or thecommand line.
Configuration File
If using a configuration file, set:
- sharding:
- clusterRole: shardsvr
- replication:
- replSetName: <replSetName>
- net:
- bindIp: localhost,<ip address>
replication.replSetName
to the desired name of thereplica set,sharding.clusterRole
option toshardsvr
,net.bindIp
option to the ip or a comma-delimitedlist of ips that remote clients (including the other members ofthe config server replica set as well as other members of thesharded cluster) can use to connect to the instance.
Warning
Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.
- Additional settings as appropriate to your deployment, such as
storage.dbPath
andnet.port
. For moreinformation on the configuration file, see configurationoptions.
Start the mongod
with the —config
option set tothe configuration file path.
- mongod --config <path-to-config-file>
Command Line
If using the command line option, start the mongod
withthe —replSet
, and —shardsvr
, —bind_ip
options,and other options as appropriate to your deployment. For example:
- mongod --shardsvr --replSet <replSetname> --dbpath <path> --bind_ip localhost,<hostname(s)|ip address(es)>
For more information on startup parameters, see themongod
reference page.
Connect to one member of the shard replica set.
Connect a mongo
shell to one of the replica set members.
- mongo --host <hostname> --port <port>
Initiate the replica set.
From the mongo
shell, run the rs.initiate()
method.
rs.initiate()
can take an optional replica setconfiguration document. In thereplica set configuration document, include:
- The
_id
field set to the replica set name specified ineither thereplication.replSetName
or the—replSet
option. - The
members
array with a document per each member of thereplica set.
The following example initiates a three member replica set.
Important
Run rs.initiate()
on just one and only onemongod
instance for the replica set.
- rs.initiate(
- {
- _id : <replicaSetName>,
- members: [
- { _id : 0, host : "s1-mongo1.example.net:27018" },
- { _id : 1, host : "s1-mongo2.example.net:27018" },
- { _id : 2, host : "s1-mongo3.example.net:27018" }
- ]
- }
- )
Connect a mongos to the Sharded Cluster
Connect a mongos to the cluster
Start a mongos
using either a configuration file or acommand line parameter to specify the config servers.
Configuration File
If using a configuration file, set the sharding.configDB
tothe config server replica set name and at least one member of the replicaset in <replSetName>/<host:port>
format.
Warning
Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.
- sharding:
- configDB: <configReplSetName>/cfg1.example.net:27019,cfg2.example.net:27019
- net:
- bindIp: localhost,<hostname(s)|ip address(es)>
Start the mongos
specifying the —config
option and thepath to the configuration file.
- mongos --config <path-to-config>
For more information on the configuration file, seeconfiguration options.
Command Line
If using command line parameters start the mongos
and specifythe —configdb
, —bind_ip
,and other options as appropriate to your deployment. For example:
Warning
Before binding to a non-localhost (e.g. publicly accessible)IP address, ensure you have secured your cluster from unauthorizedaccess. For a complete list of security recommendations, seeSecurity Checklist. At minimum, considerenabling authentication andhardening network infrastructure.
- mongos --configdb <configReplSetName>/cfg1.example.net:27019,cfg2.example.net:27019 --bind_ip localhost,<hostname(s)|ip address(es)>
Include any other options as appropriate for your deployment.
Connect to the mongos.
Connect a mongo
shell to the mongos
.
- mongo --host <hostname> --port <port>
Once you have connected the mongo
shell to themongos
, continue to the next procedure to add shards tothe cluster.
Add Shards to the Cluster
In the mongo
shell connected to the mongos
, usethe sh.addShard()
method to add each shard to the cluster.
The following operation adds a single shard replica set to the cluster:
- sh.addShard( "<replSetName>/s1-mongo1.example.net:27018")
Repeat to add all shards.
Enable Sharding for a Database
From the mongo
shell connected to the mongos
, usethe sh.enableSharding()
method to enable sharding on thetarget database. Enabling sharding on a database makes it possible toshard collections within a database.
- sh.enableSharding("<database>")
Shard a Collection using Hashed Sharding
From the mongo
shell connected to the mongos
, usethe sh.shardCollection()
method to shard a collection.
Note
You must have enabled sharding for the databasewhere the collection resides. SeeEnable Sharding for a Database.
If the collection already contains data, you must create aHashed Indexes on the shard key using thedb.collection.createIndex()
method before usingshardCollection()
. [1]
If the collection is empty, MongoDB creates the index as part ofsh.shardCollection()
.
The following operation shards the target collection using thehashed sharding strategy.
- sh.shardCollection("<database>.<collection>", { <shard key> : "hashed" } )
- You must specify the full namespace of the collection and the shardkey.
- Your selection of shard key affects the efficiency of sharding, aswell as your ability to take advantage of certain sharding featuressuch as zones. See the selectionconsiderations listed in the Hashed Sharding Shard Key.
[1] | Starting in version 4.0, the mongo shell provides themethod convertShardKeyToHashed() . This method uses thesame hashing function as the hashed index and can be used to seewhat the hashed value would be for a key. |