Monitoring for MongoDB
Monitoring is a critical component of all database administration. A firm grasp of MongoDB’s reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB’s normal operational parameters will allow you to diagnose problems before they escalate to failures.
This document presents an overview of the available monitoring utilities and the reporting statistics available in MongoDB. It also introduces diagnostic strategies and suggestions for monitoring replica sets and sharded clusters.
Monitoring Strategies
MongoDB provides various methods for collecting data about the state of a running MongoDB instance:
- Starting in version 4.0, MongoDB offers free Cloud monitoring for standalones and replica sets.
- MongoDB distributes a set of utilities that provides real-time reporting of database activities.
- MongoDB provides various database commands that return statistics regarding the current database state with greater fidelity.
- MongoDB Atlas is a cloud-hosted database-as-a-service for running, monitoring, and maintaining MongoDB deployments.
- MongoDB Cloud Manager is a hosted service that monitors running MongoDB deployments to collect data and provide visualization and alerts based on that data.
- MongoDB Ops Manager is an on-premise solution available in MongoDB Enterprise Advanced that monitors running MongoDB deployments to collect data and provide visualization and alerts based on that data.
Each strategy can help answer different questions and is useful in different contexts. These methods are complementary.
MongoDB Reporting Tools
This section provides an overview of the reporting methods distributed with MongoDB. It also offers examples of the kinds of questions that each method is best suited to help you address.
Free Monitoring
New in version 4.0.
MongoDB offers free Cloud monitoring for standalones or replica sets.
By default, you can enable/disable free monitoring during runtime using db.enableFreeMonitoring() and db.disableFreeMonitoring().
Free monitoring provides up to 24 hours of data. For more details, see Free Monitoring.
Utilities
The MongoDB distribution includes a number of utilities that quickly return statistics about instances’ performance and activity. Typically, these are most useful for diagnosing issues and assessing normal operation.
mongostat
mongostat captures and returns the counts of database operations by type (e.g. insert, query, update, delete). These counts report on the load distribution on the server.
Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat manual for details.
mongotop
mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports these statistics on a per-collection basis.
Use mongotop to check if your database activity and use match your expectations. See the mongotop manual for details.
HTTP Console
Changed in version 3.6: MongoDB 3.6 removes the deprecated HTTP interface and REST API to MongoDB.
Commands
MongoDB includes a number of commands that report on the state of the database.
These data may provide a finer level of granularity than the utilities discussed above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance. The db.currentOp method is another useful tool for identifying the database instance’s in-progress operations.
serverStatus
The serverStatus command, or db.serverStatus() from the shell, returns a general overview of the status of the database, detailing disk usage, memory use, connection, journaling, and index access. The command returns quickly and does not impact MongoDB performance.
serverStatus outputs an account of the state of a MongoDB instance. This command is rarely run directly. In most cases, the data is more meaningful when aggregated, as one would see with monitoring tools including MongoDB Cloud Manager and Ops Manager. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
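As an illustration of using this output in a script, the following sketch derives per-second operation rates from two serverStatus documents taken some seconds apart. It assumes the documents carry the `uptime` (seconds) and `opcounters` fields; `opRates` is a hypothetical helper name, and the sample documents are made up so the sketch can run outside the shell.

```javascript
// Sketch: per-second operation rates from two serverStatus samples.
// `opRates` is a hypothetical helper, not a MongoDB API.
function opRates(earlier, later) {
  const seconds = later.uptime - earlier.uptime;
  if (seconds <= 0) {
    // Also triggers if the server restarted between samples.
    throw new Error("samples must be in order and at least one second apart");
  }
  const rates = {};
  for (const op of Object.keys(later.opcounters)) {
    const before = earlier.opcounters[op] || 0;
    rates[op] = (later.opcounters[op] - before) / seconds;
  }
  return rates;
}

// In the mongo shell, each sample would come from db.serverStatus().
// These stand-in documents are invented for illustration.
const sampleA = { uptime: 100, opcounters: { insert: 50, query: 200, update: 10, delete: 0 } };
const sampleB = { uptime: 110, opcounters: { insert: 150, query: 400, update: 10, delete: 5 } };

const rates = opRates(sampleA, sampleB);
console.log(rates);
```

A script along these lines could raise an alert when a rate crosses a threshold you choose; the counters are cumulative, so only the difference between samples is meaningful.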
dbStats
The dbStats command, or db.stats() from the shell, returns a document that addresses storage use and data volumes. The dbStats output reflects the amount of storage used, the quantity of data contained in the database, and object, collection, and index counters.
Use this data to monitor the state and storage capacity of a specific database. This output also allows you to compare use between databases and to determine the average document size in a database.
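The comparison described above can be sketched as follows. The helper name `summarizeDbStats` is hypothetical, and the sample documents are invented; the sketch assumes the dbStats fields `db`, `objects`, `dataSize`, `storageSize`, and `indexSize`.

```javascript
// Sketch: summarize dbStats-style documents and compare databases.
// `summarizeDbStats` is a hypothetical helper, not part of MongoDB.
function summarizeDbStats(stats) {
  return {
    db: stats.db,
    // Average document size in bytes; 0 for an empty database.
    avgDocBytes: stats.objects > 0 ? stats.dataSize / stats.objects : 0,
    // Space reported for data storage plus indexes.
    totalBytes: stats.storageSize + stats.indexSize,
  };
}

// In the mongo shell, each document would come from db.stats()
// run against a different database. These values are made up.
const samples = [
  { db: "logs", objects: 0, dataSize: 0, storageSize: 4096, indexSize: 4096 },
  { db: "app", objects: 1000, dataSize: 512000, storageSize: 262144, indexSize: 65536 },
];

// Largest database first.
const summaries = samples
  .map(summarizeDbStats)
  .sort((a, b) => b.totalBytes - a.totalBytes);
console.log(summaries);
```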
collStats
The collStats command, or db.collection.stats() from the shell, provides statistics that resemble dbStats on the collection level, including a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about its indexes.
replSetGetStatus
The replSetGetStatus command (rs.status() from the shell) returns an overview of your replica set’s status. The replSetGetStatus document details the state and configuration of the replica set and statistics about its members.
Use this data to ensure that replication is properly configured, and to check the connections between the current host and the other members of the replica set.
Hosted (SaaS) Monitoring Tools
These are monitoring tools provided as a hosted service, usually through a paid subscription.
Name | Notes |
---|---|
MongoDB Cloud Manager | MongoDB Cloud Manager is a cloud-based suite of services for managing MongoDB deployments. MongoDB Cloud Manager provides monitoring, backup, and automation functionality. For an on-premise solution, see also Ops Manager, available in MongoDB Enterprise Advanced. |
VividCortex | VividCortex provides deep insights into MongoDB production database workload and query performance – in one-second resolution. Track latency, throughput, errors, and more to ensure scalability and exceptional performance of your application on MongoDB. |
Scout | Several plugins, including MongoDB Monitoring, MongoDB Slow Queries, and MongoDB Replica Set Monitoring. |
Server Density | Dashboard for MongoDB, MongoDB specific alerts, replication failover timeline and iPhone, iPad and Android mobile apps. |
Application Performance Management | IBM has an Application Performance Management SaaS offering that includes monitoring for MongoDB and other applications and middleware. |
New Relic | New Relic offers full support for application performance management. In addition, New Relic Plugins and Insights enable you to view monitoring metrics from Cloud Manager in New Relic. |
Datadog | Infrastructure monitoring to visualize the performance of your MongoDB deployments. |
SPM Performance Monitoring | Monitoring, Anomaly Detection and Alerting. SPM monitors all key MongoDB metrics together with infrastructure incl. Docker and other application metrics, e.g. Node.js, Java, NGINX, Apache, HAProxy or Elasticsearch. SPM provides correlation of metrics and logs. |
Process Logging
During normal operation, mongod and mongos instances report a live account of all server activity and operations to either standard output or a log file. The following runtime settings control these options.
- quiet. Limits the amount of information written to the log or output.
- verbosity. Increases the amount of information written to the log or output. You can also modify the logging verbosity during runtime with the logLevel parameter or the db.setLogLevel() method in the shell.
- path. Enables logging to a file, rather than the standard output. You must specify the full path to the log file when adjusting this setting.
- logAppend. Adds information to a log file instead of overwriting the file.
Note
You can specify these configuration operations as the command line arguments to mongod or mongos.
For example:
- mongod -v --logpath /var/log/mongodb/server1.log --logappend
Starts a mongod instance in verbose mode, appending data to the log file at /var/log/mongodb/server1.log.
The following database commands also affect logging:
- getLog. Displays recent messages from the mongod process log.
- logRotate. Rotates the log files for mongod processes only. See Rotate Log Files.
Log Redaction
New in version 3.4: Available in MongoDB Enterprise only
A mongod running with security.redactClientLogData redacts messages associated with any given log event before logging, leaving only metadata, source files, or line numbers related to the event. security.redactClientLogData prevents potentially sensitive information from entering the system log at the cost of diagnostic detail.
For example, the following operation inserts a document into a mongod running without log redaction. The mongod has systemLog.component.command.verbosity set to 1:
- db.clients.insertOne( { "name" : "Joe", "PII" : "Sensitive Information" } )
This operation produces the following log event:
- 2017-06-09T13:35:23.446-0400 I COMMAND [conn1] command internal.clients
- appName: "MongoDB Shell"
- command: insert {
- insert: "clients",
- documents: [ {
- _id: ObjectId('593adc5b99001b7d119d0c97'),
- name: "Joe",
- PII: " Sensitive Information"
- } ],
- ordered: true
- }
- ...
A mongod running with security.redactClientLogData performing the same insert operation produces the following log event:
- 2017-06-09T13:45:18.599-0400 I COMMAND [conn1] command internal.clients
- appName: "MongoDB Shell"
- command: insert {
- insert: "###", documents: [ {
- _id: "###", name: "###", PII: "###"
- } ],
- ordered: "###"
- }
Use redactClientLogData in conjunction with Encryption at Rest and TLS/SSL (Transport Encryption) to assist compliance with regulatory requirements.
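To make the effect of redaction on a command document concrete, the following sketch replaces every leaf value with "###" while keeping field names and structure, which mirrors the shape of the redacted log event shown earlier. It is an illustration only and is not MongoDB's implementation of redactClientLogData; the `redact` function name is hypothetical.

```javascript
// Illustration only: mimic the shape of redacted output by replacing
// every leaf value in a command document with "###".
// This is NOT how MongoDB implements redactClientLogData.
function redact(value) {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = redact(inner); // field names survive, values do not
    }
    return out;
  }
  return "###";
}

const command = {
  insert: "clients",
  documents: [{ name: "Joe", PII: "Sensitive Information" }],
  ordered: true,
};

const redacted = redact(command);
console.log(JSON.stringify(redacted));
```

The field names remain visible, which is why redacted logs still show what kind of operation ran but not the data it touched.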
Diagnosing Performance Issues
As you develop and operate applications with MongoDB, you may want to analyze the performance of the database as well as the application. MongoDB Performance discusses some of the operational factors that can influence performance.
Replication and Monitoring
Beyond the basic monitoring requirements for any MongoDB instance, for replica sets, administrators must monitor replication lag. “Replication lag” refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary. Some small delay period may be acceptable, but significant problems emerge as replication lag grows, including:
- Growing cache pressure on the primary.
- Operations that occurred during the period of lag are not replicated to one or more secondaries. If you’re using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.
- If the replication lag exceeds the length of the operation log (oplog) then MongoDB will have to perform an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. [1] This is uncommon under normal circumstances, but if you configure the oplog to be smaller than the default, the issue can arise.
Note
The size of the oplog is only configurable during the first run using the --oplogSize argument to the mongod command, or preferably, the oplogSizeMB setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet option, mongod will create a default sized oplog.
By default, the oplog is 5 percent of total available disk space on 64-bit systems. For more information about changing the oplog size, see Change the Size of the Oplog.
Flow Control
Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value flowControlTargetLagSeconds.
By default, flow control is enabled.
Note
For flow control to engage, the replica set/sharded cluster must have: featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.
See also: Check the Replication Lag.
Replica Set Status
Replication issues are most often the result of network connectivity issues between members, or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica, use the replSetGetStatus command or the following helper in the shell:
- rs.status()
The replSetGetStatus reference provides a more in-depth overview of this output. In general, watch the value of optimeDate, and pay particular attention to the time difference between the primary and the secondary members.
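The time difference mentioned above can be computed from the status document. The following is a minimal sketch, assuming each entry in `members` has `name`, `stateStr`, and a Date-valued `optimeDate`; the helper name `replicationLagSeconds` and the sample host names are hypothetical.

```javascript
// Sketch: replication lag per secondary from an rs.status()-style document.
// `replicationLagSeconds` is a hypothetical helper, not a MongoDB API.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in status document");
  }
  const lags = {};
  for (const member of status.members) {
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Dates yields milliseconds.
      lags[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  }
  return lags;
}

// In the mongo shell, the document would come from rs.status().
// This stand-in is invented for illustration.
const sampleStatus = {
  members: [
    { name: "db1.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-01-01T00:00:30Z") },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T00:00:28Z") },
    { name: "db3.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T00:00:05Z") },
  ],
};

const lags = replicationLagSeconds(sampleStatus);
console.log(lags);
```

On an idle primary the most recent optimeDate may itself be old, so a small reported lag on a quiet replica set is not necessarily a sign of trouble.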
[1] | Starting in MongoDB 4.0, the oplog can grow past its configured size limit to avoid deleting the majority commit point. |
Free Monitoring
Note
Starting in version 4.0, MongoDB offers free monitoring for standalones and replica sets. For more information, see Free Monitoring.
Slow Application of Oplog Entries
Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms. These slow oplog entries depend only on the slow operation threshold. They do not depend on the log levels (either at the system or component level), or the profiling level, or the slow operation sample rate. The profiler does not capture slow oplog entries.
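A script can pick these messages out of the diagnostic log by matching the text pattern described above. The following sketch assumes plain-text log lines; the helper name `slowOplogDurations` is hypothetical, and the sample lines are invented (only the "applied op: ... took ...ms" wording comes from this section).

```javascript
// Sketch: extract durations from log lines containing the
// "applied op: <oplog entry> took <num>ms" text.
// `slowOplogDurations` is a hypothetical helper, not a MongoDB API.
function slowOplogDurations(lines) {
  const pattern = /applied op: (.*) took (\d+)ms/;
  const found = [];
  for (const line of lines) {
    const match = pattern.exec(line);
    if (match) {
      found.push({ entry: match[1], millis: Number(match[2]) });
    }
  }
  return found;
}

// Made-up lines: timestamps, thread names, and the oplog entry body
// are placeholders, not real server output.
const sampleLines = [
  '2020-01-01T00:00:00.000+0000 I REPL [worker] applied op: { op: "i", ns: "test.c" } took 150ms',
  "2020-01-01T00:00:01.000+0000 I NETWORK [listener] connection accepted",
];

const slowEntries = slowOplogDurations(sampleLines);
console.log(slowEntries);
```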
Sharding and Monitoring
In most cases, the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. In addition, clusters require further monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.
See also
See the Sharding documentation for more information.
Config Servers
The config database maintains a map identifying which documents are on which shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, certain sharding operations become unavailable, such as moving chunks and starting mongos instances. However, clusters remain accessible from already-running mongos instances.
Because inaccessible configuration servers can seriously impact the availability of a sharded cluster, you should monitor your configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.
MongoDB Cloud Manager and Ops Manager monitor config servers and can create notifications if a config server becomes inaccessible. See the MongoDB Cloud Manager documentation and Ops Manager documentation for more information.
Balancing and Chunk Distribution
The most effective sharded cluster deployments evenly balance chunks among the shards. To facilitate this, MongoDB has a background balancer process that distributes data to ensure that chunks are always optimally distributed among the shards.
Issue the db.printShardingStatus() or sh.status() command to the mongos by way of the mongo shell. This returns an overview of the entire cluster including the database name, and a list of the chunks.
Stale Locks
To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
- use config
- db.locks.find()
The balancing process takes a special “balancer” lock that prevents other balancing activity from transpiring. In the config database, use the following command to view the “balancer” lock.
- db.locks.find( { _id : "balancer" } )
Changed in version 3.4: Starting in 3.4, the primary of the CSRS config server holds the “balancer” lock, using a process id named “ConfigServer”. This lock is never released. To determine if the balancer is running, see Check if Balancer is Running.
Storage Node Watchdog
Note
- Starting in MongoDB 4.2, the Storage Node Watchdog is available in both the Community and MongoDB Enterprise editions.
- In earlier versions (3.2.16+, 3.4.7+, 3.6.0+, 4.0.0+), the Storage Node Watchdog is only available in MongoDB Enterprise edition.
The Storage Node Watchdog monitors the following MongoDB directories to detect filesystem unresponsiveness:
- The --dbpath directory
- The journal directory inside the --dbpath directory if journaling is enabled
- The directory of --logpath file
- The directory of --auditPath file
By default, the Storage Node Watchdog is disabled. You can only enable the Storage Node Watchdog on a mongod at startup time by setting the watchdogPeriodSeconds parameter to an integer greater than or equal to 60. However, once enabled, you can pause the Storage Node Watchdog and restart during runtime. See watchdogPeriodSeconds parameter for details.
If any of the filesystems containing the monitored directories become unresponsive, the Storage Node Watchdog terminates the mongod and exits with a status code of 61. If the mongod is the primary of a replica set, the termination initiates a failover, allowing another member to become primary.
Once a mongod has terminated, it may not be possible to cleanly restart it on the same machine.
The maximum time the Storage Node Watchdog can take to detect an unresponsive filesystem and terminate is nearly twice the value of watchdogPeriodSeconds.
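The two constraints above (a period of at least 60 seconds, and a detection time of nearly twice the period) can be turned into a quick planning check. This is a minimal sketch that treats twice the period as an upper bound; `watchdogWorstCaseSeconds` is a hypothetical helper, not a MongoDB function.

```javascript
// Sketch: upper bound on the time the Storage Node Watchdog may take to
// detect an unresponsive filesystem, given watchdogPeriodSeconds.
// `watchdogWorstCaseSeconds` is a hypothetical helper.
function watchdogWorstCaseSeconds(watchdogPeriodSeconds) {
  if (!Number.isInteger(watchdogPeriodSeconds) || watchdogPeriodSeconds < 60) {
    throw new RangeError("watchdogPeriodSeconds must be an integer >= 60 to enable the watchdog");
  }
  // The text says "nearly twice" the period, so 2x is used as a ceiling.
  return 2 * watchdogPeriodSeconds;
}

const bound = watchdogWorstCaseSeconds(60);
console.log(`detection may take up to about ${bound} seconds`);
```

Comparing this bound with your failover expectations can help decide whether a given period is acceptable for a primary.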