Aggregation Pipeline Optimization
Aggregation pipeline operations have an optimization phase whichattempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline,include the explain
option in thedb.collection.aggregate()
method.
Optimizations are subject to change between releases.
Projection Optimization
The aggregation pipeline can determine if it requires only a subset ofthe fields in the documents to obtain the results. If so, the pipelinewill only use those required fields, reducing the amount of datapassing through the pipeline.
Pipeline Sequence Optimization
($project or $unset or $addFields or $set) + $match Sequence Optimization
For an aggregation pipeline that contains a projection stage($project
or $unset
or$addFields
or $set
) followed by a$match
stage, MongoDB moves any filters in the$match
stage that do not require values computed in theprojection stage to a new $match
stage before theprojection.
If an aggregation pipeline contains multiple projection and/or$match
stages, MongoDB performs this optimization for each$match
stage, moving each $match
filter beforeall projection stages that the filter does not depend on.
Consider a pipeline of the following stages:
- { $addFields: {
- maxTime: { $max: "$times" },
- minTime: { $min: "$times" }
- } },
- { $project: {
- _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
- avgTime: { $avg: ["$maxTime", "$minTime"] }
- } },
- { $match: {
- name: "Joe Schmoe",
- maxTime: { $lt: 20 },
- minTime: { $gt: 5 },
- avgTime: { $gt: 7 }
- } }
The optimizer breaks up the $match
stage into fourindividual filters, one for each key in the $match
querydocument. The optimizer then moves each filter before as many projectionstages as possible, creating new $match
stages as needed.Given this example, the optimizer produces the following _optimized_pipeline:
- { $match: { name: "Joe Schmoe" } },
- { $addFields: {
- maxTime: { $max: "$times" },
- minTime: { $min: "$times" }
- } },
- { $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
- { $project: {
- _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
- avgTime: { $avg: ["$maxTime", "$minTime"] }
- } },
- { $match: { avgTime: { $gt: 7 } } }
The $match
filter { avgTime: { $gt: 7 } }
depends on the$project
stage to compute the avgTime
field. The$project
stage is the last projection stage in thispipeline, so the $match
filter on avgTime
could not bemoved.
The maxTime
and minTime
fields are computed in the$addFields
stage but have no dependency on the$project
stage. The optimizer created a new$match
stage for the filters on these fields and placed itbefore the $project
stage.
The $match
filter { name: "Joe Schmoe" }
does notuse any values computed in either the $project
or$addFields
stages so it was moved to a new$match
stage before both of the projection stages.
Note
After optimization, the filter { name: "Joe Schmoe" }
is ina $match
stage at the beginning of the pipeline. This hasthe added benefit of allowing the aggregation to use an index on thename
field when initially querying the collection.See Pipeline Operators and Indexes for moreinformation.
$sort + $match Sequence Optimization
When you have a sequence with $sort
followed by a$match
, the $match
moves before the$sort
to minimize the number of objects to sort. Forexample, if the pipeline consists of the following stages:
- { $sort: { age : -1 } },
- { $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence tothe following:
- { $match: { status: 'A' } },
- { $sort: { age : -1 } }
$redact + $match Sequence Optimization
When possible, when the pipeline has the $redact
stageimmediately followed by the $match
stage, the aggregationcan sometimes add a portion of the $match
stage before the$redact
stage. If the added $match
stage is atthe start of a pipeline, the aggregation can use an index as well asquery the collection to limit the number of documents that enter thepipeline. See Pipeline Operators and Indexes formore information.
For example, if the pipeline consists of the following stages:
- { $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
- { $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add the same $match
stage before the$redact
stage:
- { $match: { year: 2014 } },
- { $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
- { $match: { year: 2014, category: { $ne: "Z" } } }
$project/$unset + $skip Sequence Optimization
New in version 3.2.
When you have a sequence with $project
or $unset
followed by$skip
, the $skip
moves before $project
. For example, ifthe pipeline consists of the following stages:
- { $sort: { age : -1 } },
- { $project: { status: 1, name: 1 } },
- { $skip: 5 }
During the optimization phase, the optimizer transforms the sequence tothe following:
- { $sort: { age : -1 } },
- { $skip: 5 },
- { $project: { status: 1, name: 1 } }
Pipeline Coalescence Optimization
When possible, the optimization phase coalesces a pipeline stage intoits predecessor. Generally, coalescence occurs after any sequencereordering optimization.
$sort + $limit Coalescence
Changed in version 4.0.
When a $sort
precedes a $limit
, the optimizercan coalesce the $limit
into the $sort
if nointervening stages modify the number of documents(e.g. $unwind
, $group
).MongoDB will not coalesce the $limit
into the$sort
if there are pipeline stages that change the number ofdocuments between the $sort
and $limit
stages..
For example, if the pipeline consists of the following stages:
- { $sort : { age : -1 } },
- { $project : { age : 1, status : 1, name : 1 } },
- { $limit: 5 }
During the optimization phase, the optimizer coalesces the sequenceto the following:
- {
- "$sort" : {
- "sortKey" : {
- "age" : -1
- },
- "limit" : NumberLong(5)
- }
- },
- { "$project" : {
- "age" : 1,
- "status" : 1,
- "name" : 1
- }
- }
This allows the sort operation to only maintain thetop n
results as it progresses, where n
is the specified limit,and MongoDB only needs to store n
items in memory[1]. See $sort Operator and Memory for moreinformation.
Sequence Optimization with $skip
If there is a $skip
stage between the $sort
and $limit
stages, MongoDB will coalesce the$limit
into the $sort
stage and increase the$limit
value by the $skip
amount. See$sort + $skip + $limit Sequence for an example.
[1] | The optimization will still apply whenallowDiskUse is true and the n items exceed theaggregation memory limit. |
$limit + $limit Coalescence
When a $limit
immediately follows another$limit
, the two stages can coalesce into a single$limit
where the limit amount is the smaller of the twoinitial limit amounts. For example, a pipeline contains the followingsequence:
- { $limit: 100 },
- { $limit: 10 }
Then the second $limit
stage can coalesce into the first$limit
stage and result in a single $limit
stage where the limit amount 10
is the minimum of the two initiallimits 100
and 10
.
- { $limit: 10 }
$skip + $skip Coalescence
When a $skip
immediately follows another $skip
,the two stages can coalesce into a single $skip
where theskip amount is the sum of the two initial skip amounts. For example, apipeline contains the following sequence:
- { $skip: 5 },
- { $skip: 2 }
Then the second $skip
stage can coalesce into the first$skip
stage and result in a single $skip
stage where the skip amount 7
is the sum of the two initiallimits 5
and 2
.
- { $skip: 7 }
$match + $match Coalescence
When a $match
immediately follows another$match
, the two stages can coalesce into a single$match
combining the conditions with an$and
. For example, a pipeline contains the followingsequence:
- { $match: { year: 2014 } },
- { $match: { status: "A" } }
Then the second $match
stage can coalesce into the first$match
stage and result in a single $match
stage
- { $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
$lookup + $unwind Coalescence
New in version 3.2.
When a $unwind
immediately follows another$lookup
, and the $unwind
operates on the as
field of the $lookup
, the optimizer can coalesce the$unwind
into the $lookup
stage. This avoidscreating large intermediate documents.
For example, a pipeline contains the following sequence:
- {
- $lookup: {
- from: "otherCollection",
- as: "resultingArray",
- localField: "x",
- foreignField: "y"
- }
- },
- { $unwind: "$resultingArray"}
The optimizer can coalesce the $unwind
stage into the$lookup
stage. If you run the aggregation with explain
option, the explain
output shows the coalesced stage:
- {
- $lookup: {
- from: "otherCollection",
- as: "resultingArray",
- localField: "x",
- foreignField: "y",
- unwinding: { preserveNullAndEmptyArrays: false }
- }
- }
Example
$sort + $skip + $limit Sequence
A pipeline contains a sequence of $sort
followed by a$skip
followed by a $limit
:
- { $sort: { age : -1 } },
- { $skip: 10 },
- { $limit: 5 }
The optimizer performs $sort + $limit Coalescence totransforms the sequence to the following:
- {
- "$sort" : {
- "sortKey" : {
- "age" : -1
- },
- "limit" : NumberLong(15)
- }
- },
- {
- "$skip" : NumberLong(10)
- }
MongoDB increases the $limit
amount with the reordering.
See also
explain
option in thedb.collection.aggregate()