Aggregation Pipeline

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results. For example:

db.orders.aggregate([
   { $match: { status: "A" } },
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])

First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents that have status equal to "A".

Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id.
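To make the two stages concrete, the sketch below reproduces their effect on a small in-memory array in plain JavaScript. The sample documents are hypothetical, and the filter and loop only mimic what $match and $group compute; this is not how the server implements them.

```javascript
// Hypothetical sample documents standing in for the orders collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// First stage ($match): keep only documents whose status is "A".
const matched = orders.filter(doc => doc.status === "A");

// Second stage ($group): one output document per distinct cust_id,
// with total holding the sum of amount for that customer.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.amount);
}
const results = Array.from(totals, ([_id, total]) => ({ _id, total }));

console.log(results);
// [ { _id: 'A123', total: 750 }, { _id: 'B212', total: 200 } ]
```

The order shown comes from the Map in this sketch; $group itself does not promise any particular output order.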

Pipeline

The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents.
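As a rough illustration of stages that change the document count, the plain JavaScript sketch below imitates an $unwind-like stage (more documents out than in) followed by a $match-like stage (fewer documents out than in). The sample data and the tags field are assumptions made for the example.

```javascript
// Hypothetical input documents with an array field.
const docs = [
  { _id: 1, tags: ["a", "b", "c"] },
  { _id: 2, tags: ["a"] },
  { _id: 3, tags: [] }
];

// Like { $unwind: "$tags" }: one output document per array element.
// A document with an empty array produces no output, which matches
// $unwind's default behavior.
const unwound = docs.flatMap(doc =>
  doc.tags.map(tag => ({ _id: doc._id, tags: tag }))
);

// Like { $match: { tags: "a" } }: drop every document that does not match.
const filtered = unwound.filter(doc => doc.tags === "a");

console.log(docs.length, unwound.length, filtered.length); // 3 4 2
```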

Pipeline stages can appear multiple times in the pipeline, with the exception of the $out, $merge, and $geoNear stages. For a list of all available stages, see Aggregation Pipeline Stages.

MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate command to run the aggregation pipeline.
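The two entry points can be compared side by side. The sketch below reuses the orders pipeline from the example at the top of this page and assumes a server version on which the aggregate command expects a cursor document. The guard on db is only there so the snippet also loads outside the mongo shell.

```javascript
// The same stages as the orders example, held in a variable so that
// both invocations run an identical pipeline.
const pipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
];

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  // Shell helper method.
  db.orders.aggregate(pipeline);

  // Database command form; takes the collection name as a string.
  // Assumption: the server requires the cursor option here.
  db.runCommand({ aggregate: "orders", pipeline: pipeline, cursor: {} });
}
```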

For example usage of the aggregation pipeline, consider Aggregation with User Preference Data and Aggregation with the Zip Code Data Set.

Starting in MongoDB 4.2, you can use the aggregation pipeline for updates in:

Command	`mongo` Shell Methods
`findAndModify`	`db.collection.findAndModify()`, `db.collection.findOneAndUpdate()`
`update`	`db.collection.updateOne()`, `db.collection.updateMany()`, `db.collection.update()`
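A minimal sketch of a pipeline-style update follows, assuming MongoDB 4.2 or later. The price, tax, and total fields are hypothetical; the point is that the update argument is an array of stages rather than an update document, so the new value can be computed from other fields of the same document.

```javascript
// Update pipeline: $set is evaluated per document, so total can be
// derived from that document's own (hypothetical) price and tax fields.
const updatePipeline = [
  { $set: { total: { $add: ["$price", "$tax"] } } }
];

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  // Passing an array as the second argument selects the pipeline form.
  db.orders.updateMany({ status: "A" }, updatePipeline);
}
```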

Pipeline Expressions

Some pipeline stages take a pipeline expression as the operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain other expressions.

Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.
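For instance, a $project stage might combine several expressions, each reading only fields of the current document. The price and quantity fields below are hypothetical, and the snippet is a sketch of the expression syntax rather than a recommended schema.

```javascript
// Each expression is a document; operands may be field paths ("$price")
// or further expressions, which is how nesting works.
const projectStage = {
  $project: {
    cust_id: 1,
    // $multiply reads two fields of the current document.
    lineTotal: { $multiply: ["$price", "$quantity"] },
    // $cond contains a nested $gte expression as its condition.
    size: { $cond: [{ $gte: ["$quantity", 100] }, "bulk", "regular"] }
  }
};

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.orders.aggregate([projectStage]);
}
```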

Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception: accumulator expressions.

The accumulators, used in the $group stage, maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.

Changed in version 3.2: Some accumulators are available in the $project stage; however, when used in the $project stage, the accumulators do not maintain their state across documents.
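The contrast can be sketched with $sum in both positions. The quizzes array field is a hypothetical example; in $group the running total spans every document in the group, while in $project the sum is computed afresh for each document.

```javascript
// In $group, $sum keeps state: it adds up amount across all documents
// that share the same cust_id.
const groupStage = {
  $group: { _id: "$cust_id", total: { $sum: "$amount" } }
};

// In $project (MongoDB 3.2+), $sum is evaluated per document: here it
// totals the elements of that document's own (hypothetical) quizzes array.
const perDocumentStage = {
  $project: { quizTotal: { $sum: "$quizzes" } }
};

// `db` exists only inside the mongo shell; the students collection
// name is an assumption for this sketch.
if (typeof db !== "undefined") {
  db.orders.aggregate([groupStage]);
  db.students.aggregate([perDocumentStage]);
}
```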

For more information on expressions, see Expressions.

Aggregation Pipeline Behavior

In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning the entire collection.

Pipeline Operators and Indexes

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
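A sketch of a pipeline that respects this ordering is shown below. The places collection, its location and category fields, and the coordinates are all hypothetical; the 2dsphere index is created first so that $geoNear has a geospatial index to use.

```javascript
const geoPipeline = [
  // $geoNear must be the first stage.
  {
    $geoNear: {
      near: { type: "Point", coordinates: [-73.99279, 40.719296] },
      distanceField: "dist.calculated", // output field for the computed distance
      maxDistance: 2000,                // meters, since near is a GeoJSON point
      spherical: true
    }
  },
  // Later stages see documents already ordered by distance.
  { $match: { category: "Parks" } }
];

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.places.createIndex({ location: "2dsphere" });
  db.places.aggregate(geoPipeline);
}
```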

Changed in version 3.2: Indexes can cover an aggregation pipeline. In MongoDB 2.6 and 3.0, indexes could not cover an aggregation pipeline because, even when the pipeline used an index, aggregation still required access to the actual documents.

Early Filtering

If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match operations use suitable indexes to scan only the matching documents in a collection.

Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
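As an illustration, the sketch below puts $match and $sort first and creates a compound index that could serve both. The orderDate field and the index definition are assumptions; whether the index is actually chosen depends on the data and the query planner, which the explain option lets you inspect.

```javascript
const earlyFilterPipeline = [
  // Filtering first limits how many documents later stages must process.
  { $match: { status: "A" } },
  // Together with the $match above, this behaves like a query with a sort.
  { $sort: { orderDate: -1 } },
  { $limit: 10 }
];

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  // Hypothetical compound index covering the filter field and the sort key.
  db.orders.createIndex({ status: 1, orderDate: -1 });

  db.orders.aggregate(earlyFilterPipeline);

  // Returns the query plan instead of results, to check for index use.
  db.orders.aggregate(earlyFilterPipeline, { explain: true });
}
```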

Considerations

Sharded Collections

The aggregation pipeline supports operations on sharded collections. See Aggregation Pipeline and Sharded Collections.

Aggregation vs Map-Reduce

The aggregation pipeline provides an alternative to map-reduce and may be the preferred solution for aggregation tasks where the complexity of map-reduce may be unwarranted.

Limitations

The aggregation pipeline has some limitations on value types and result size. See Aggregation Pipeline Limits for details on limits and restrictions on the aggregation pipeline.

Pipeline Optimization

The aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences of operators. For details, see Aggregation Pipeline Optimization.