Aggregation with User Preference Data
Data Model
Consider a hypothetical sports club with a database that contains ausers
collection that tracks the user’s join dates, sport preferences,and stores these data in documents that resemble the following:
- {
- _id : "jane",
- joined : ISODate("2011-03-02"),
- likes : ["golf", "racquetball"]
- }
- {
- _id : "joe",
- joined : ISODate("2012-07-02"),
- likes : ["tennis", "golf", "swimming"]
- }
Normalize and Sort Documents
The following operation returns user names in upper case and inalphabetical order. The aggregation includes user names for all documents inthe users
collection. You might do this to normalize user names forprocessing.
- db.users.aggregate(
- [
- { $project : { name:{$toUpper:"$_id"} , _id:0 } },
- { $sort : { name : 1 } }
- ]
- )
All documents from the users
collection pass through thepipeline, which consists of the following operations:
The results of the aggregation would resemble the following:
- {
- "name" : "JANE"
- },
- {
- "name" : "JILL"
- },
- {
- "name" : "JOE"
- }
Return Usernames Ordered by Join Month
The following aggregation operation returns user names sorted by themonth they joined. This kind of aggregation could help generatemembership renewal notices.
- db.users.aggregate(
- [
- { $project :
- {
- month_joined : { $month : "$joined" },
- name : "$_id",
- _id : 0
- }
- },
- { $sort : { month_joined : 1 } }
- ]
- )
The pipeline passes all documents in the users
collection throughthe following operations:
- The
$project
operator:- Creates two new fields:
month_joined
andname
. - Suppresses the
id
from the results. Theaggregate()
method includes the_id
, unlessexplicitly suppressed.
- Creates two new fields:
- The
$month
operator converts the values of thejoined
field to integer representations of the month. Then the$project
operator assigns those values to themonth_joined
field. - The
$sort
operator sorts the results by themonth_joined
field.
The operation returns results that resemble the following:
- {
- "month_joined" : 1,
- "name" : "ruth"
- },
- {
- "month_joined" : 1,
- "name" : "harold"
- },
- {
- "month_joined" : 1,
- "name" : "kate"
- }
- {
- "month_joined" : 2,
- "name" : "jill"
- }
Return Total Number of Joins per Month
The following operation shows how many people joined each month of theyear. You might use this aggregated data for recruiting and marketingstrategies.
- db.users.aggregate(
- [
- { $project : { month_joined : { $month : "$joined" } } } ,
- { $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
- { $sort : { "_id.month_joined" : 1 } }
- ]
- )
The pipeline passes all documents in the users
collection throughthe following operations:
- The
$project
operator creates a new field calledmonth_joined
. - The
$month
operator converts the values of thejoined
field to integer representations of the month. Then the$project
operator assigns the values to themonth_joined
field. - The
$group
operator collects all documents with agivenmonth_joined
value and counts how many documents there arefor that value. Specifically, for each unique value,$group
creates a new “per-month” document with twofields:_id
, which contains a nested document with themonth_joined
field and its value.number
, which is a generated field. The$sum
operator increments this field by 1 for every document containingthe givenmonth_joined
value.
- The
$sort
operator sorts the documents created by$group
according to the contents of themonth_joined
field.
The result of this aggregation operation would resemble the following:
- {
- "_id" : {
- "month_joined" : 1
- },
- "number" : 3
- },
- {
- "_id" : {
- "month_joined" : 2
- },
- "number" : 9
- },
- {
- "_id" : {
- "month_joined" : 3
- },
- "number" : 5
- }
Return the Five Most Common “Likes”
The following aggregation collects top five most “liked” activities inthe data set. This type of analysis could help inform planning andfuture development.
- db.users.aggregate(
- [
- { $unwind : "$likes" },
- { $group : { _id : "$likes" , number : { $sum : 1 } } },
- { $sort : { number : -1 } },
- { $limit : 5 }
- ]
- )
The pipeline begins with all documents in the users
collection,and passes these documents through the following operations:
- The
$unwind
operator separates each value in thelikes
array, and creates a new version of the source documentfor every element in the array.
Example
Given the following document from the users
collection:
- {
- _id : "jane",
- joined : ISODate("2011-03-02"),
- likes : ["golf", "racquetball"]
- }
The $unwind
operator would create the followingdocuments:
- {
- _id : "jane",
- joined : ISODate("2011-03-02"),
- likes : "golf"
- }
- {
- _id : "jane",
- joined : ISODate("2011-03-02"),
- likes : "racquetball"
- }
The
$group
operator collects all documents with the samevalue for thelikes
field and counts each grouping. With thisinformation,$group
creates a new document with twofields:_id
, which contains thelikes
value.number
, which is a generated field. The$sum
operator increments this field by 1 for every document containingthe givenlikes
value.
The
$sort
operator sorts these documents by thenumber
field in reverse order.The
$limit
operator only includes the first 5 resultdocuments.
The results of aggregation would resemble the following:
- {
- "_id" : "golf",
- "number" : 33
- },
- {
- "_id" : "racquetball",
- "number" : 31
- },
- {
- "_id" : "swimming",
- "number" : 24
- },
- {
- "_id" : "handball",
- "number" : 19
- },
- {
- "_id" : "tennis",
- "number" : 18
- }