Aggregation Pipeline
The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results. For example:
    db.orders.aggregate([
      { $match: { status: "A" } },
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
    ])
First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents that have status equal to "A".
Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id.
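To make the data flow concrete, the following plain JavaScript sketch imitates the two stages on an in-memory array. It is only an illustration of the logic, not how MongoDB executes the pipeline, and the sample documents are hypothetical.

```javascript
// Hypothetical sample documents standing in for the orders collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// First stage: keep only documents whose status is "A" (like $match).
const matched = orders.filter((doc) => doc.status === "A");

// Second stage: group by cust_id and sum amount (like $group with $sum).
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.amount);
}
const results = Array.from(totals, ([_id, total]) => ({ _id, total }));

console.log(results);
```

With this sample data, the sketch yields a total of 750 for A123 and 200 for B212. The order of the groups here follows Map insertion order; do not rely on any particular group order from a real $group stage.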
Pipeline
The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline with the exception of $out, $merge, and $geoNear stages. For a list of all available stages, see Aggregation Pipeline Stages.
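The point that a stage need not emit one document per input document can be sketched in plain JavaScript. The helper names below (matchStage, unwindStage, runPipeline) are hypothetical and model stages as functions over arrays; unwindStage loosely imitates the $unwind stage, which outputs one document per array element.

```javascript
// Each stage is modeled as a function from an array of documents to a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const unwindStage = (field) => (docs) =>
  docs.flatMap((doc) => doc[field].map((value) => ({ ...doc, [field]: value })));

// Run the stages in order, feeding each stage's output into the next.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical input documents.
const input = [
  { _id: 1, status: "A", items: ["pen", "ink"] },
  { _id: 2, status: "D", items: ["cap"] },
  { _id: 3, status: "A", items: ["pad", "clip", "tape"] },
];

const isActive = (doc) => doc.status === "A";
const afterMatch = runPipeline(input, [matchStage(isActive)]);
const afterUnwind = runPipeline(input, [matchStage(isActive), unwindStage("items")]);

console.log(input.length, afterMatch.length, afterUnwind.length); // 3 2 5
```

The filtering stage shrinks three documents to two, and the expanding stage then grows those two into five.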
MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate command to run the aggregation pipeline.
For example usage of the aggregation pipeline, consider Aggregation with User Preference Data and Aggregation with the Zip Code Data Set.
Starting in MongoDB 4.2, you can use the aggregation pipeline for updates in:
| Command | mongo Shell Methods |
|---|---|
| findAndModify | db.collection.findAndModify(), db.collection.findOneAndUpdate() |
| update | db.collection.updateOne(), db.collection.updateMany(), db.collection.update() |
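As a minimal sketch of a pipeline-style update, the update argument is an array of stages rather than a single update document. The field names (amount, tax, total) and the applyPipelineUpdate helper are hypothetical; the helper only assumes that its argument exposes updateMany(filter, update) the way db.collection.updateMany() does.

```javascript
// Filter selecting the documents to update.
const filter = { status: "A" };

// The update expressed as an aggregation pipeline (an array of stages).
// $set can read other fields of the same document through "$field" paths.
const updatePipeline = [
  { $set: { total: { $add: ["$amount", "$tax"] } } },
];

// `collection` is assumed to behave like a mongo shell collection object.
function applyPipelineUpdate(collection) {
  return collection.updateMany(filter, updatePipeline);
}
```

In the mongo shell this would be invoked as applyPipelineUpdate(db.orders), assuming an orders collection exists.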
Pipeline Expressions
Some pipeline stages take a pipeline expression as the operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain other expressions.
Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.
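The following sketch shows the idea of a nested expression evaluated against a single document. The evaluate function is a hypothetical toy that handles only top-level field paths, literals, $add, and $multiply; it is not MongoDB's evaluator.

```javascript
// Evaluate a tiny subset of expressions against ONE document.
function evaluate(expr, doc) {
  // "$field" strings are field paths resolved on the current document only.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  // Operator expressions are documents that can nest other expressions.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    if ("$add" in expr) {
      return expr.$add.reduce((sum, e) => sum + evaluate(e, doc), 0);
    }
    if ("$multiply" in expr) {
      return expr.$multiply.reduce((product, e) => product * evaluate(e, doc), 1);
    }
    throw new Error("operator not covered by this sketch");
  }
  // Anything else is treated as a literal value.
  return expr;
}

// Hypothetical document and expression.
const doc = { price: 10, qty: 3, shipping: 5 };
const expr = { $add: [{ $multiply: ["$price", "$qty"] }, "$shipping"] };

console.log(evaluate(expr, doc)); // 35
```

The evaluator receives exactly one document, so it has no way to consult any other document, which mirrors the restriction described above.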
Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception: accumulator expressions.
The accumulators, used in the $group stage, maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.
Changed in version 3.2: Some accumulators are available in the $project stage; however, when used in the $project stage, the accumulators do not maintain their state across documents.
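The difference between the two uses can be sketched in plain JavaScript. The first loop imitates something like { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, where a running total survives from one document to the next. The map call imitates something like { $project: { scoreTotal: { $sum: "$scores" } } }, where the sum covers only the array inside the current document. The sample documents and the scores field are hypothetical.

```javascript
// Hypothetical input documents.
const docs = [
  { cust_id: "A123", amount: 500, scores: [2, 3] },
  { cust_id: "A123", amount: 250, scores: [4] },
  { cust_id: "B212", amount: 200, scores: [1, 1, 1] },
];

// $group-style: one running total per group key, updated by every document.
const groupState = {};
for (const doc of docs) {
  groupState[doc.cust_id] = (groupState[doc.cust_id] || 0) + doc.amount;
}

// $project-style: the sum starts from zero again for each document.
const projected = docs.map((doc) => ({
  cust_id: doc.cust_id,
  scoreTotal: doc.scores.reduce((sum, n) => sum + n, 0),
}));

console.log(groupState, projected);
```

Here groupState ends up with 750 for A123 and 200 for B212, while the per-document scoreTotal values are 5, 4, and 3.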
For more information on expressions, see Expressions.
Aggregation Pipeline Behavior
In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning the entire collection.
Pipeline Operators and Indexes
The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.
The $geoNear pipeline operator takes advantage of a geospatial index. When used, $geoNear must appear as the first stage in an aggregation pipeline.
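A pipeline that respects the placement rule might look like the sketch below. The coordinates, the distance output field, and the geoNearIsFirst helper are hypothetical; the helper is only a client-side sanity check, and the collection is assumed to already have a suitable geospatial index.

```javascript
// $geoNear is the first stage; later stages work on its output.
const geoPipeline = [
  {
    $geoNear: {
      near: { type: "Point", coordinates: [-73.99, 40.73] },
      distanceField: "distance", // hypothetical name for the computed distance
      spherical: true,
    },
  },
  { $match: { status: "A" } },
];

// Returns true when $geoNear, if present at all, appears only at index 0.
function geoNearIsFirst(pipeline) {
  return pipeline.every((stage, i) => !("$geoNear" in stage) || i === 0);
}

console.log(geoNearIsFirst(geoPipeline)); // true
```

Swapping the two stages makes the check return false, which corresponds to a pipeline the server would reject.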
Changed in version 3.2: Starting in MongoDB 3.2, indexes can cover an aggregation pipeline. In MongoDB 2.6 and 3.0, indexes could not cover an aggregation pipeline because, even when the pipeline used an index, aggregation still required access to the actual documents.
Early Filtering
If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection.
Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
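As a sketch of this layout, the pipeline below starts with $match and $sort so that an index on the same fields can serve both. The index specification, the field names, and the runFiltered helper are hypothetical; the helper only assumes that its argument exposes createIndex() and aggregate() like a mongo shell collection.

```javascript
// Hypothetical compound index: equality field first, then the sort field.
const indexSpec = { status: 1, amount: -1 };

// $match then $sort at the very start, before any other stage.
const pipeline = [
  { $match: { status: "A" } },
  { $sort: { amount: -1 } },
  { $limit: 5 },
];

// `collection` is assumed to behave like a mongo shell collection object.
function runFiltered(collection) {
  collection.createIndex(indexSpec);
  return collection.aggregate(pipeline);
}
```

The first two stages correspond to a query such as db.orders.find({ status: "A" }).sort({ amount: -1 }), assuming an orders collection. Whether the index is actually used depends on the server's query planner.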
Considerations
Sharded Collections
The aggregation pipeline supports operations on sharded collections. See Aggregation Pipeline and Sharded Collections.
Aggregation vs Map-Reduce
The aggregation pipeline provides an alternative to map-reduce and may be the preferred solution for aggregation tasks where the complexity of map-reduce may be unwarranted.
Limitations
The aggregation pipeline has some limitations on value types and result size. See Aggregation Pipeline Limits for details on limits and restrictions on the aggregation pipeline.
Pipeline Optimization
The aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences of operators. For details, see Aggregation Pipeline Optimization.