Population

There are no joins in MongoDB, but sometimes we still want references to documents in other collections. This is where population comes in.

Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, a plain object, multiple plain objects, or all objects returned from a query. Let’s look at some examples.

    var mongoose = require('mongoose');
    var Schema = mongoose.Schema;

    var personSchema = Schema({
      _id     : Number,
      name    : String,
      age     : Number,
      stories : [{ type: Schema.Types.ObjectId, ref: 'Story' }]
    });

    var storySchema = Schema({
      _creator : { type: Number, ref: 'Person' },
      title    : String,
      fans     : [{ type: Number, ref: 'Person' }]
    });

    var Story  = mongoose.model('Story', storySchema);
    var Person = mongoose.model('Person', personSchema);

So far we’ve created two models. Our Person model has its stories field set to an array of ObjectIds. The ref option is what tells Mongoose which model to use during population, in our case the Story model. All _ids we store here must be document _ids from the Story model. We also declared the Story _creator property as a Number, the same type as the _id used in the personSchema. It is important to match the type of _id to the type of ref.

Note: ObjectId, Number, String, and Buffer are valid for use as refs.

Saving refs

Saving refs to other documents works the same way as saving any other property. Just assign the _id value:

    var aaron = new Person({ _id: 0, name: 'Aaron', age: 100 });

    aaron.save(function (err) {
      if (err) return handleError(err);

      var story1 = new Story({
        title: "Once upon a timex.",
        _creator: aaron._id // assign the _id from the person
      });

      story1.save(function (err) {
        if (err) return handleError(err);
        // that's it!
      });
    });

Population

So far we haven’t done anything much different. We’ve merely created a Person and a Story. Now let’s take a look at populating our story’s _creator using the query builder:

    Story
    .findOne({ title: 'Once upon a timex.' })
    .populate('_creator')
    .exec(function (err, story) {
      if (err) return handleError(err);
      console.log('The creator is %s', story._creator.name);
      // prints "The creator is Aaron"
    });

Populated paths are no longer set to their original _id. Instead, their value is replaced with the mongoose document returned from the database, which Mongoose fetches with a separate query before returning the results.

Arrays of refs work the same way. Just call the populate method on the query and an array of documents will be returned in place of the original _ids.
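
To make the "separate query" concrete, here is a minimal in-memory sketch of the idea. It is not Mongoose's implementation: `collections` and `populatePath` are hypothetical names, plain arrays stand in for MongoDB collections, and the _id given to Guillermo is made up.

```javascript
// Plain arrays stand in for MongoDB collections (an assumption for illustration).
var collections = {
  Person: [
    { _id: 0, name: 'Aaron', age: 100 },
    { _id: 1, name: 'Guillermo' } // the _id value 1 is made up for this sketch
  ]
};

// Hypothetical helper: returns a copy of `doc` in which `path` holds the
// referenced document(s) from the collection named `ref` instead of raw _ids.
function populatePath(doc, path, ref) {
  var value = doc[path];
  var ids = Array.isArray(value) ? value : [value];

  // The "separate query": fetch every referenced document in one pass.
  var byId = {};
  collections[ref].forEach(function (candidate) {
    if (ids.indexOf(candidate._id) !== -1) byId[candidate._id] = candidate;
  });

  // Swap each _id for the document that was found. In this sketch an
  // unmatched ref simply becomes null.
  function resolve(id) {
    return Object.prototype.hasOwnProperty.call(byId, id) ? byId[id] : null;
  }

  var copy = Object.assign({}, doc);
  copy[path] = Array.isArray(value) ? value.map(resolve) : resolve(value);
  return copy;
}

var story = { title: 'Once upon a timex.', _creator: 0, fans: [1, 0] };

console.log(populatePath(story, '_creator', 'Person')._creator.name); // Aaron
console.log(populatePath(story, 'fans', 'Person').fans.length);       // 2
```

The same helper handles a single ref and an array of refs, which mirrors how the populate call looks identical for both kinds of path.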

Note: mongoose >= 3.6 exposes the original _ids used during population through the document#populated() method.
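
As a small sketch of how populated() might be used, the helper below takes a Story document whose _creator has already been populated and returns both views of the path. `describeCreator` is an illustrative name, not part of Mongoose.

```javascript
// `story` is assumed to be a Story document returned by a query that called
// .populate('_creator'), on mongoose >= 3.6.
function describeCreator(story) {
  return {
    // The populated Person document now sits on the path itself.
    name: story._creator.name,
    // The original _id that the path held before population.
    originalId: story.populated('_creator')
  };
}
```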

Field selection

What if we only want a few specific fields returned for the populated documents? This can be accomplished by passing the usual field name syntax as the second argument to the populate method:

    Story
    .findOne({ title: /timex/i })
    .populate('_creator', 'name') // only return the Person's name
    .exec(function (err, story) {
      if (err) return handleError(err);

      console.log('The creator is %s', story._creator.name);
      // prints "The creator is Aaron"

      console.log('The creators age is %s', story._creator.age);
      // prints "The creators age is null"
    });

Populating multiple paths

What if we wanted to populate multiple paths at the same time?

    Story
    .find(...)
    .populate('fans _creator') // space delimited path names
    .exec()

In mongoose >= 3.6, we can pass a space delimited string of path names to populate. Before 3.6, you must call the populate() method once per path.

    Story
    .find(...)
    .populate('fans')
    .populate('_creator')
    .exec()

Query conditions and other options

What if we wanted to populate our fans array based on their age, select just their names, and return at most 5 of them?

    Story
    .find(...)
    .populate({
      path: 'fans',
      match: { age: { $gte: 21 }},
      select: 'name -_id',
      options: { limit: 5 }
    })
    .exec()
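
The effect of these options can be illustrated without a database. The sketch below is an in-memory approximation, not Mongoose's implementation: `applyPopulateOptions` is a hypothetical helper that understands only a `$gte` condition, a space-delimited select string, and `limit`, and the sample people other than Aaron are made up.

```javascript
// Sample Person documents; names and ages beyond Aaron's are made up.
var people = [
  { _id: 0, name: 'Aaron', age: 100 },
  { _id: 1, name: 'Guillermo', age: 19 },
  { _id: 2, name: 'Val', age: 21 }
];

// Hypothetical helper that mimics match, select and options.limit in memory.
function applyPopulateOptions(docs, opts) {
  // match: keep documents that satisfy every { field: { $gte: n } } condition.
  var matched = docs.filter(function (doc) {
    return Object.keys(opts.match).every(function (field) {
      return doc[field] >= opts.match[field].$gte;
    });
  });

  // options.limit: keep at most `limit` documents.
  var limited = matched.slice(0, opts.options.limit);

  // select: space-delimited names; "-_id" drops the otherwise implicit _id.
  var tokens = opts.select.split(/\s+/);
  var included = tokens.filter(function (t) { return t.charAt(0) !== '-'; });
  var dropId = tokens.indexOf('-_id') !== -1;

  return limited.map(function (doc) {
    var out = dropId ? {} : { _id: doc._id };
    included.forEach(function (field) { out[field] = doc[field]; });
    return out;
  });
}

var fans = applyPopulateOptions(people, {
  match: { age: { $gte: 21 } },
  select: 'name -_id',
  options: { limit: 5 }
});

console.log(fans); // [ { name: 'Aaron' }, { name: 'Val' } ]
```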

Refs to children

However, if we use the aaron object, we may find that we are unable to get a list of his stories. This is because no story objects were ever ‘pushed’ onto aaron.stories.

There are two perspectives here. First, it’s nice to have aaron know which stories are his.

    aaron.stories.push(story1);
    aaron.save(callback);

This allows us to perform a find and populate combo:

    Person
    .findOne({ name: 'Aaron' })
    .populate('stories') // only works if we pushed refs to children
    .exec(function (err, person) {
      if (err) return handleError(err);
      console.log(person);
    });

It is debatable whether we really want two sets of pointers, as they may get out of sync. Instead, we could skip populating and directly find() the stories we are interested in.

    Story
    .find({ _creator: aaron._id })
    .exec(function (err, stories) {
      if (err) return handleError(err);
      console.log('The stories are an array: ', stories);
    });
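
The synchronization risk is easy to reproduce with plain objects. In this in-memory sketch (hypothetical data, no Mongoose involved), a second story is saved without being pushed onto the person's stories array, so the array goes stale while the lookup by _creator still finds both stories. The string _ids and the title 'Sequel' are made up for illustration.

```javascript
var person = { _id: 0, name: 'Aaron', stories: [] };
var allStories = [];

// First story: recorded on both sides.
allStories.push({ _id: 'a', title: 'Once upon a timex.', _creator: person._id });
person.stories.push('a');

// Second story: the child ref on the person is forgotten.
allStories.push({ _id: 'b', title: 'Sequel', _creator: person._id });

// Following the child refs misses the second story.
var viaChildRefs = allStories.filter(function (s) {
  return person.stories.indexOf(s._id) !== -1;
});

// Looking up by the parent ref finds both.
var viaCreator = allStories.filter(function (s) {
  return s._creator === person._id;
});

console.log(viaChildRefs.length); // 1
console.log(viaCreator.length);   // 2
```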

Updating refs

Now that we have a story, suppose we realize that its _creator was incorrect. We can update refs the same as any other property through Mongoose’s internal casting:

    // personSchema declares _id as a Number, so we set one explicitly
    var guille = new Person({ _id: 1, name: 'Guillermo' });

    guille.save(function (err) {
      if (err) return handleError(err);

      story._creator = guille;
      console.log(story._creator.name);
      // prints "Guillermo" in mongoose >= 3.6
      // see https://github.com/LearnBoost/mongoose/wiki/3.6-release-notes

      story.save(function (err) {
        if (err) return handleError(err);

        Story
        .findOne({ title: /timex/i })
        .populate({ path: '_creator', select: 'name' })
        .exec(function (err, story) {
          if (err) return handleError(err);
          console.log('The creator is %s', story._creator.name);
          // prints "The creator is Guillermo"
        });
      });
    });

The documents returned from query population become fully functional, removable, saveable documents unless the lean option is specified. Do not confuse them with subdocuments. Take care when calling a populated document’s remove method, because you’ll be removing it from the database, not just from the array.

Populating an existing document

If we have an existing mongoose document and want to populate some of its paths, mongoose >= 3.6 supports the document#populate() method.
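
A minimal sketch of how this might look, using the callback style found elsewhere on this page. `loadCreatorName` and `done` are hypothetical names, and `story` is assumed to be a Story document that was loaded without population. Later Mongoose releases may expose a different (promise-based) calling convention.

```javascript
// `story` is assumed to be an existing Story document whose _creator path
// still holds the raw _id. `done` is a Node-style callback.
function loadCreatorName(story, done) {
  story.populate('_creator', function (err, populatedStory) {
    if (err) return done(err);
    // The callback receives the document with _creator replaced by a Person.
    done(null, populatedStory._creator.name);
  });
}
```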

Populating multiple existing documents

If we have one or many mongoose documents or even plain objects (like mapReduce output), we may populate them using the Model.populate() method available in mongoose >= 3.6. This is what document#populate() and query#populate() use to populate documents.
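
As a rough sketch, again in the callback style used on this page, the helper below populates _creator on a batch of stories in one call. `populateCreators` and `done` are hypothetical names; `Story` is assumed to be the model defined earlier, and `stories` may be Story documents or plain objects that carry a _creator id.

```javascript
// Populate the _creator path on every item in `stories` with a single call.
function populateCreators(Story, stories, done) {
  var opts = { path: '_creator', select: 'name' };

  Story.populate(stories, opts, function (err, populated) {
    if (err) return done(err);
    done(null, populated);
  });
}
```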

Field selection difference from v2 to v3

Field selection in v3 is slightly different from v2. Arrays of fields are no longer accepted. See the migration guide and the example below for more detail.

    // this works in v3
    Story.findOne(..).populate('_creator', 'name age').exec(..);

    // this doesn't
    Story.findOne(..).populate('_creator', ['name', 'age']).exec(..);

Next Up

Now that we’ve covered query population, let’s take a look at connections.