Aggregation Pipeline and Sharded Collections
The aggregation pipeline supports operations on sharded collections. This section describes behaviors specific to the aggregation pipeline and sharded collections.
Behavior
Changed in version 3.2.
If the pipeline starts with an exact $match on a shard key, the entire pipeline runs on the matching shard only. Previously, the pipeline would have been split, and the work of merging it would have to be done on the primary shard.
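As an illustration of what "starts with an exact $match on a shard key" means, the sketch below compares a pipeline that should qualify with one that should not. It assumes a hypothetical collection named orders that is sharded on { customerId: 1 }. The helper startsWithExactShardKeyMatch is not part of MongoDB. It is a simplified stand-in for the routing decision, written only to make the distinction concrete.

```javascript
// Assumption: a hypothetical "orders" collection sharded on { customerId: 1 }.
const shardKeyFields = ["customerId"];

// First stage is an equality $match on the shard key, so the whole
// pipeline is expected to run on the single matching shard.
const targetedPipeline = [
  { $match: { customerId: 1234 } },
  { $group: { _id: "$status", total: { $sum: "$amount" } } }
];

// A range condition on the shard key is not an exact match, so this
// pipeline may have to run on several shards.
const rangePipeline = [
  { $match: { customerId: { $gt: 100 } } },
  { $group: { _id: "$status", total: { $sum: "$amount" } } }
];

// Illustrative only: treats a literal value or { $eq: value } as "exact".
function isExactValue(cond) {
  if (cond === undefined) return false;
  if (cond === null || typeof cond !== "object" || Array.isArray(cond)) return true;
  const keys = Object.keys(cond);
  const operatorKeys = keys.filter((k) => k.startsWith("$"));
  if (operatorKeys.length === 0) return true; // embedded document literal
  return keys.length === 1 && keys[0] === "$eq";
}

// Illustrative only: true when the first stage is a $match that pins
// every shard key field to an exact value.
function startsWithExactShardKeyMatch(pipeline, fields) {
  const first = pipeline[0];
  if (!first || typeof first.$match !== "object" || first.$match === null) {
    return false;
  }
  return fields.every((f) => isExactValue(first.$match[f]));
}
```

In the shell, either pipeline would be run as db.orders.aggregate(targetedPipeline) or db.orders.aggregate(rangePipeline). The helper is only a reading aid and does not reproduce the server's actual targeting logic.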
Aggregation operations that must run on multiple shards, and that do not require running on the database’s primary shard, route their results to a random shard for merging. This avoids overloading the primary shard for that database. The $out stage and the $lookup stage require running on the database’s primary shard.
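The rule for the two stages named above can be sketched as a small check over a pipeline definition. The function requiresPrimaryShard and the collection names are hypothetical. The list of stages comes only from this section and may not hold for other releases.

```javascript
// Stages that, per this section, require the database's primary shard.
const PRIMARY_SHARD_STAGES = ["$out", "$lookup"];

// Hypothetical helper: true when any stage in the pipeline is one of
// the stages listed above.
function requiresPrimaryShard(pipeline) {
  return pipeline.some((stage) =>
    PRIMARY_SHARD_STAGES.some((name) => name in stage)
  );
}

// Can be merged on a randomly chosen shard.
const summaryPipeline = [
  { $match: { status: "shipped" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
];

// Writes its results with $out, so it needs the primary shard.
// "order_totals" is a hypothetical output collection name.
const exportPipeline = [
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $out: "order_totals" }
];
```

Here requiresPrimaryShard(summaryPipeline) evaluates to false and requiresPrimaryShard(exportPipeline) evaluates to true.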
Optimization
When splitting the aggregation pipeline into two parts, the pipeline is split to ensure that the shards perform as many stages as possible with consideration for optimization.
To see how the pipeline was split, include the explain option in the db.collection.aggregate() method.
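A minimal sketch of passing the explain option follows. The wrapper explainPipeline and the orders collection are hypothetical. The only MongoDB call involved is db.collection.aggregate() with { explain: true } as its options document, as used in the shell.

```javascript
// Hypothetical wrapper: asks the server to describe how it would run
// the pipeline instead of returning the aggregation results.
function explainPipeline(collection, pipeline) {
  return collection.aggregate(pipeline, { explain: true });
}

// Example pipeline that cannot be targeted at a single shard, so it is
// a candidate for being split into a shard part and a merge part.
const pipeline = [
  { $match: { status: "shipped" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
];

// In the shell, connected through mongos:
//   explainPipeline(db.orders, pipeline)
// which is equivalent to:
//   db.orders.aggregate(pipeline, { explain: true })
```

The layout of the returned explain document differs between releases, so inspect it rather than relying on particular field names.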
Optimizations are subject to change between releases.