Avoiding the N+1 problem

The N+1 problem refers to a situation where an application performs a query, then for each row of the result set, the application performs at least one other query (another way to conceptualize this is as a nested loop). In many cases, these n queries can be avoided through the use of a SQL join or subquery. The database itself may do a nested loop, but it will usually be more performant than doing n queries in your application code, which involves latency communicating with the database and may not take advantage of indices or other optimizations employed by the database when joining or executing a subquery.

Peewee provides several APIs for mitigating N+1 query behavior. Recollecting the models used throughout this document, User and Tweet, this section will try to outline some common N+1 scenarios, and how peewee can help you avoid them.

Attention

In some cases, N+1 queries will not result in a significant or measurable performance hit. It all depends on the data you are querying, the database you are using, and the latency involved in executing queries and retrieving results. As always when making optimizations, profile before and after to ensure the changes do what you expect them to.

List recent tweets

The twitter timeline displays a list of tweets from multiple users. In addition to the tweet’s content, the username of the tweet’s author is also displayed. The N+1 scenario here would be:

  1. Fetch the 10 most recent tweets.
  2. For each tweet, select the author (10 queries).

By selecting both tables and using a join, peewee makes it possible to accomplish this in a single query:

  1. query = (Tweet
  2. .select(Tweet, User) # Note that we are selecting both models.
  3. .join(User) # Use an INNER join because every tweet has an author.
  4. .order_by(Tweet.id.desc()) # Get the most recent tweets.
  5. .limit(10))
  6. for tweet in query:
  7. print(tweet.user.username, '-', tweet.message)

Without the join, accessing tweet.user.username would trigger a query to resolve the foreign key tweet.user and retrieve the associated user. But since we have selected and joined on User, peewee will automatically resolve the foreign-key for us.

Note

This technique is discussed in more detail in Selecting from multiple sources.

List users and all their tweets

Let’s say you want to build a page that shows several users and all of their tweets. The N+1 scenario would be:

  1. Fetch some users.
  2. For each user, fetch their tweets.

This situation is similar to the previous example, but there is one important difference: when we selected tweets, they only have a single associated user, so we could directly assign the foreign key. The reverse is not true, however, as one user may have any number of tweets (or none at all).

Peewee provides an approach to avoiding O(n) queries in this situation. Fetch users first, then fetch all the tweets associated with those users. Once peewee has the big list of tweets, it will assign them out, matching them with the appropriate user. This method is usually faster but will involve a query for each table being selected.

Using prefetch

peewee supports pre-fetching related data using sub-queries. This method requires the use of a special API, prefetch(). Prefetch, as its name implies, will eagerly load the appropriate tweets for the given users using subqueries. This means instead of O(n) queries for n rows, we will do O(k) queries for k tables.

Here is an example of how we might fetch several users and any tweets they created within the past week.

  1. week_ago = datetime.date.today() - datetime.timedelta(days=7)
  2. users = User.select()
  3. tweets = (Tweet
  4. .select()
  5. .where(Tweet.timestamp >= week_ago))
  6. # This will perform two queries.
  7. users_with_tweets = prefetch(users, tweets)
  8. for user in users_with_tweets:
  9. print(user.username)
  10. for tweet in user.tweets:
  11. print(' ', tweet.message)

Note

Note that neither the User query, nor the Tweet query contained a JOIN clause. When using prefetch() you do not need to specify the join.

prefetch() can be used to query an arbitrary number of tables. Check the API documentation for more examples.

Some things to consider when using prefetch():

  • Foreign keys must exist between the models being prefetched.
  • LIMIT works as you’d expect on the outer-most query, but may be difficult to implement correctly if trying to limit the size of the sub-selects.