Database access optimization

Django's database layer provides various ways to help developers get the most out of their databases. This document gathers together links to the relevant documentation, and adds various tips, organized under a number of headings that outline the steps to take when attempting to optimize your database usage.

Profile first

As general programming practice, this goes without saying. Find out what queries you are doing and what they are costing you. Use QuerySet.explain() to understand how specific QuerySets are executed by your database. You may also want to use an external project like django-debug-toolbar, or a tool that monitors your database directly.

Remember that you may be optimizing for speed or memory or both, depending on your requirements. Sometimes optimizing for one will be detrimental to the other, but sometimes they will help each other. Also, work that is done by the database process might not have the same cost (to you) as the same amount of work done in your Python process. It is up to you to decide what your priorities are and where the balance must lie, and to profile all of these as required, since this will depend on your application and server.

With everything that follows, remember to profile after every change to ensure that the change is a benefit, and a big enough benefit given the decrease in readability of your code. All of the suggestions below come with the caveat that in your circumstances the general principle might not apply, or might even be reversed.

Use standard DB optimization techniques

…including:

Indexes. This is the number one priority, after you have determined from profiling what indexes should be added. Use Field.db_index or Meta.index_together to add these from Django. Consider adding indexes to fields that you frequently query using filter(), exclude(), order_by(), etc., as indexes may help to speed up lookups. Note that determining the best indexes is a complex database-dependent topic that will depend on your particular application. The overhead of maintaining an index may outweigh any gains in query speed.
Appropriate use of field types.

We will assume you have done the obvious things above. The rest of this document focuses on how to use Django in such a way that you are not doing unnecessary work. This document also does not address other optimization techniques that apply to all expensive operations, such as general purpose caching.

Understand QuerySets

Understanding QuerySets is vital to getting good performance with simple code. In particular:

Understand QuerySet evaluation

To avoid performance problems, it is important to understand that QuerySets are lazy, when they are evaluated, and how their results are held in memory.

Understand cached attributes

As well as caching of the whole QuerySet, there is caching of the result of attributes on ORM objects. In general, attributes that are not callable will be cached. For example, assuming the example Weblog models:

>>> entry = Entry.objects.get(id=1)
>>> entry.blog   # Blog object is retrieved at this point
>>> entry.blog   # cached version, no DB access

But in general, callable attributes cause DB lookups every time:

>>> entry = Entry.objects.get(id=1)
>>> entry.authors.all()   # query performed
>>> entry.authors.all()   # query performed again

Be careful when reading template code - the template system does not allow use of parentheses, but will call callables automatically, hiding the above distinction.

Be careful with your own custom properties - it is up to you to implement caching when required, for example using the cached_property decorator.

Use the with template tag

To make use of the caching behavior of QuerySet, you may need to use the with template tag.

Use iterator()

When you have a lot of objects, the caching behavior of the QuerySet can cause a large amount of memory to be used. In this case, iterator() may help.

Use explain()

QuerySet.explain() gives you detailed information about how the database executes a query, including indexes and joins that are used. These details may help you find queries that could be rewritten more efficiently, or identify indexes that could be added to improve performance.

Do database work in the database rather than in Python

For instance:

At the most basic level, use filter and exclude to do filtering in the database.
Use F expressions to filter based on other fields within the same model.
Use annotate to do aggregation in the database.
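
The three points above can be sketched as follows. The standalone settings, the demo app label and the Entry fields (borrowed loosely from the Weblog example) are assumptions for illustration only.

```python
import django
from django.conf import settings

# Standalone configuration (assumption: no existing Django project).
settings.configure(
    DATABASES={"default": {"ENGINE": "django.db.backends.sqlite3", "NAME": ":memory:"}},
    INSTALLED_APPS=[],
    DEFAULT_AUTO_FIELD="django.db.models.AutoField",
)
django.setup()

from django.db import connection, models
from django.db.models import Count, F


class Entry(models.Model):
    # Hypothetical fields for the sketch.
    headline = models.CharField(max_length=255)
    number_of_comments = models.IntegerField(default=0)
    number_of_pingbacks = models.IntegerField(default=0)
    rating = models.IntegerField(default=0)

    class Meta:
        app_label = "demo"


with connection.schema_editor() as editor:
    editor.create_model(Entry)

Entry.objects.create(headline="A", number_of_comments=5, number_of_pingbacks=2, rating=5)
Entry.objects.create(headline="B", number_of_comments=1, number_of_pingbacks=3, rating=5)
Entry.objects.create(headline="C", number_of_comments=4, number_of_pingbacks=4, rating=3)

# filter() plus F(): the column comparison happens in the WHERE clause,
# not in a Python loop over every row.
busy = Entry.objects.filter(number_of_comments__gt=F("number_of_pingbacks"))
busy_headlines = list(busy.order_by("headline").values_list("headline", flat=True))

# values() plus annotate(): the database groups and counts (GROUP BY rating).
per_rating = list(
    Entry.objects.values("rating").annotate(n=Count("id")).order_by("rating")
)
print(busy_headlines, per_rating)
```
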
If these aren't enough to generate the SQL you need:

Use RawSQL

A less portable but more powerful method is the RawSQL expression, which allows some SQL to be explicitly added to the query. If that still isn't powerful enough:

Use raw SQL

Write your own custom SQL to retrieve data or populate models. Use django.db.connection.queries to find out what Django is writing for you and start from there.

Retrieve individual objects using a unique, indexed column

There are two reasons to use a column with unique or db_index when using get() to retrieve individual objects. First, the query will be quicker because of the underlying database index. Second, the query could run much slower if multiple objects match the lookup; having a unique constraint on the column guarantees this will never happen.

So using the example Weblog models:

>>> entry = Entry.objects.get(id=10)

will be quicker than:

>>> entry = Entry.objects.get(headline="News Item Title")

because id is indexed by the database and is guaranteed to be unique.

Doing the following is potentially quite slow:

>>> entry = Entry.objects.get(headline__startswith="News")

First of all, headline is not indexed, which will make the underlying database fetch slower.

Second, the lookup doesn't guarantee that only one object will be returned. If the query matches more than one object, it will retrieve and transfer all of them from the database. This penalty could be substantial if hundreds or thousands of records are returned. The penalty will be compounded if the database lives on a separate server, where network overhead and latency also play a factor.

Retrieve everything at once if you know you will need it

Hitting the database multiple times for different parts of a single 'set' of data, all of which you will need, is in general less efficient than retrieving it all in one query. This is particularly important if you have a query that is executed in a loop, and could therefore end up doing many database queries when only one was needed. So:

Use QuerySet.select_related() and prefetch_related()

Understand select_related() and prefetch_related() thoroughly, and use them:

in managers and default managers where appropriate. Be aware when your manager is and is not used; sometimes this is tricky so don't make assumptions.
in view code or other layers, possibly making use of prefetch_related_objects() where needed.

Don't retrieve things you don't need

Use QuerySet.values() and values_list()

When you only want a dict or list of values, and don't need ORM model objects, make appropriate use of values() and values_list(). These can be useful for replacing model objects in template code: as long as the dicts you supply have the same attributes as those used in the template, you are fine.

Use QuerySet.defer() and only()

Use defer() and only() if there are database columns you know that you won't need (or won't need in most cases) to avoid loading them. Note that if you do end up using those columns, the ORM will have to go and get them in a separate query, making this a pessimization if you use it inappropriately.

Also, be aware that there is some (small extra) overhead incurred inside Django when constructing a model with deferred fields. Don't be too aggressive in deferring fields without profiling, as the database has to read most of the non-text, non-VARCHAR data from the disk for a single row in the results, even if it ends up only using a few columns. The defer() and only() methods are most useful when you can avoid loading a lot of text data or for fields that might take a lot of processing to convert back to Python. As always, profile first, then optimize.

Use QuerySet.count()

…if you only want the count, rather than doing len(queryset).

Use QuerySet.exists()

…if you only want to find out if at least one result exists, rather than if queryset.

But:

Don't overuse count() and exists()

If you are going to need other data from the QuerySet, just evaluate it.

For example, assuming an Email model that has a body attribute and a many-to-many relation to User, the following template code is optimal:

{% if display_inbox %}
  {% with emails=user.emails.all %}
    {% if emails %}
      <p>You have {{ emails|length }} email(s)</p>
      {% for email in emails %}
        <p>{{ email.body }}</p>
      {% endfor %}
    {% else %}
      <p>No messages today.</p>
    {% endif %}
  {% endwith %}
{% endif %}

It is optimal because:

Since QuerySets are lazy, this does no database queries if 'display_inbox' is False.
Use of with means that we store user.emails.all in a variable for later use, allowing its cache to be re-used.
The line {% if emails %} causes QuerySet.__bool__() to be called, which causes the user.emails.all() query to be run on the database, and at the least the first row to be turned into an ORM object. If there aren't any results, it will return False, otherwise True.
The use of {{ emails|length }} calls QuerySet.__len__(), filling out the rest of the cache without doing another query.
The for loop iterates over the already filled cache.

In total, this code does either one or zero database queries. The only deliberate optimization performed is the use of the with tag. Using QuerySet.exists() or QuerySet.count() at any point would cause additional queries.

Use QuerySet.update() and delete()

Rather than retrieve a load of objects, set some values, and save them individually, use a bulk SQL UPDATE statement, via QuerySet.update(). Similarly, do bulk deletes where possible.

Note, however, that these bulk update methods cannot call the save() or delete() methods of individual instances, which means that any custom behavior you have added for these methods will not be executed, including anything driven from the normal database object signals.

Use foreign key values directly

If you only need a foreign key value, use the foreign key value that is already on the object you've got, rather than getting the whole related object and taking its primary key. That is, do:

entry.blog_id

instead of:

entry.blog.id

Don't order results if you don't care

Ordering is not free; each field to order by is an operation the database must perform. If a model has a default ordering (Meta.ordering) and you don't need it, remove it on a QuerySet by calling order_by() with no parameters.

Adding an index to your database may help to improve ordering performance.

Insert in bulk

When creating objects, where possible, use the bulk_create() method to reduce the number of SQL queries. For example:

Entry.objects.bulk_create([
    Entry(headline='This is a test'),
    Entry(headline='This is only a test'),
])

…is preferable to:

Entry.objects.create(headline='This is a test')
Entry.objects.create(headline='This is only a test')

Note that there are a number of caveats to this method, so make sure it's appropriate for your use case.

This also applies to ManyToManyFields, so doing:

my_band.members.add(me, my_friend)

…is preferable to:

my_band.members.add(me)
my_band.members.add(my_friend)

…where Bands and Artists have a many-to-many relationship.