Full text search
The database functions in the django.contrib.postgres.search
module ease the use of PostgreSQL’s full text search engine.
For the examples in this document, we’ll use the models defined in Making queries.
See also
For a high-level overview of searching, see the topic documentation.
The search
lookup
A common way to use full text search is to search a single term against a single column in the database. For example:
>>> Entry.objects.filter(body_text__search='Cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
This creates a to_tsvector
in the database from the body_text
field and a plainto_tsquery
from the search term 'Cheese'
, both using the default database search configuration. The results are obtained by matching the query and the vector.
To use the search
lookup, 'django.contrib.postgres'
must be in your INSTALLED_APPS
.
SearchVector
class SearchVector
(\expressions, config=None, weight=None*)
Searching against a single field is great but rather limiting. The Entry
instances we’re searching belong to a Blog
, which has a tagline
field. To query against both fields, use a SearchVector
:
>>> from django.contrib.postgres.search import SearchVector
>>> Entry.objects.annotate(
... search=SearchVector('body_text', 'blog__tagline'),
... ).filter(search='Cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
The arguments to SearchVector
can be any Expression
or the name of a field. Multiple arguments will be concatenated together using a space so that the search document includes them all.
SearchVector
objects can be combined together, allowing you to reuse them. For example:
>>> Entry.objects.annotate(
... search=SearchVector('body_text') + SearchVector('blog__tagline'),
... ).filter(search='Cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
See Changing the search configuration and Weighting queries for an explanation of the config
and weight
parameters.
SearchQuery
class SearchQuery
(value, config=None, search_type=’plain’)
SearchQuery
translates the terms the user provides into a search query object that the database compares to a search vector. By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.
If search_type
is 'plain'
, which is the default, the terms are treated as separate keywords. If search_type
is 'phrase'
, the terms are treated as a single phrase. If search_type
is 'raw'
, then you can provide a formatted search query with terms and operators. If search_type
is 'websearch'
, then you can provide a formatted search query, similar to the one used by web search engines. 'websearch'
requires PostgreSQL ≥ 11. Read PostgreSQL’s Full Text Search docs to learn about differences and syntax. Examples:
>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery('red tomato') # two keywords
>>> SearchQuery('tomato red') # same results as above
>>> SearchQuery('red tomato', search_type='phrase') # a phrase
>>> SearchQuery('tomato red', search_type='phrase') # a different phrase
>>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw') # boolean operators
>>> SearchQuery("'tomato' ('red' OR 'green')", search_type='websearch') # websearch operators
SearchQuery
terms can be combined logically to provide more flexibility:
>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery('meat') & SearchQuery('cheese') # AND
>>> SearchQuery('meat') | SearchQuery('cheese') # OR
>>> ~SearchQuery('meat') # NOT
See Changing the search configuration for an explanation of the config
parameter.
SearchRank
class SearchRank
(vector, query, weights=None, normalization=None, cover_density=False)
So far, we’ve returned the results for which any match between the vector and the query are possible. It’s likely you may wish to order the results by some sort of relevancy. PostgreSQL provides a ranking function which takes into account how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document is where they occur. The better the match, the higher the value of the rank. To order by relevancy:
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector('body_text')
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
See Weighting queries for an explanation of the weights
parameter.
Set the cover_density
parameter to True
to enable the cover density ranking, which means that the proximity of matching query terms is taken into account.
Provide an integer to the normalization
parameter to control rank normalization. This integer is a bit mask, so you can combine multiple behaviors:
>>> from django.db.models import Value
>>> Entry.objects.annotate(
... rank=SearchRank(
... vector,
... query,
... normalization=Value(2).bitor(Value(4)),
... )
... )
The PostgreSQL documentation has more details about different rank normalization options.
SearchHeadline
class SearchHeadline
(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)
Accepts a single text field or an expression, a query, a config, and a set of options. Returns highlighted search results.
Set the start_sel
and stop_sel
parameters to the string values to be used to wrap highlighted query terms in the document. PostgreSQL’s defaults are <b>
and </b>
.
Provide integer values to the max_words
and min_words
parameters to determine the longest and shortest headlines. PostgreSQL’s defaults are 35 and 15.
Provide an integer value to the short_word
parameter to discard words of this length or less in each headline. PostgreSQL’s default is 3.
Set the highlight_all
parameter to True
to use the whole document in place of a fragment and ignore max_words
, min_words
, and short_word
parameters. That’s disabled by default in PostgreSQL.
Provide a non-zero integer value to the max_fragments
to set the maximum number of fragments to display. That’s disabled by default in PostgreSQL.
Set the fragment_delimiter
string parameter to configure the delimiter between fragments. PostgreSQL’s default is " ... "
.
The PostgreSQL documentation has more details on highlighting search results.
Usage example:
>>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
>>> query = SearchQuery('red tomato')
>>> entry = Entry.objects.annotate(
... headline=SearchHeadline(
... 'body_text',
... query,
... start_sel='<span>',
... stop_sel='</span>',
... ),
... ).get()
>>> print(entry.headline)
Sandwich with <span>tomato</span> and <span>red</span> cheese.
See Changing the search configuration for an explanation of the config
parameter.
Changing the search configuration
You can specify the config
attribute to a SearchVector
and SearchQuery
to use a different search configuration. This allows using different language parsers and dictionaries as defined by the database:
>>> from django.contrib.postgres.search import SearchQuery, SearchVector
>>> Entry.objects.annotate(
... search=SearchVector('body_text', config='french'),
... ).filter(search=SearchQuery('œuf', config='french'))
[<Entry: Pain perdu>]
The value of config
could also be stored in another column:
>>> from django.db.models import F
>>> Entry.objects.annotate(
... search=SearchVector('body_text', config=F('blog__language')),
... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
[<Entry: Pain perdu>]
Weighting queries
Every field may not have the same relevance in a query, so you can set weights of various vectors before you combine them:
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')
The weight should be one of the following letters: D, C, B, A. By default, these weights refer to the numbers 0.1
, 0.2
, 0.4
, and 1.0
, respectively. If you wish to weight them differently, pass a list of four floats to SearchRank
as weights
in the same order above:
>>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
>>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')
Performance
Special database configuration isn’t necessary to use any of these functions, however, if you’re searching more than a few hundred records, you’re likely to run into performance problems. Full text search is a more intensive process than comparing the size of an integer, for example.
In the event that all the fields you’re querying on are contained within one particular model, you can create a functional index which matches the search vector you wish to use. The PostgreSQL documentation has details on creating indexes for full text search.
SearchVectorField
class SearchVectorField
If this approach becomes too slow, you can add a SearchVectorField
to your model. You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation. You can then query the field as if it were an annotated SearchVector
:
>>> Entry.objects.update(search_vector=SearchVector('body_text'))
>>> Entry.objects.filter(search_vector='cheese')
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
Trigram similarity
Another approach to searching is trigram similarity. A trigram is a group of three consecutive characters. In addition to the trigram_similar
and trigram_word_similar
lookups, you can use a couple of other expressions.
To use them, you need to activate the pg_trgm extension on PostgreSQL. You can install it using the TrigramExtension
migration operation.
TrigramSimilarity
class TrigramSimilarity
(expression, string, \*extra*)
Accepts a field name or expression, and a string or expression. Returns the trigram similarity between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramSimilarity
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Katie Stephens'
>>> Author.objects.annotate(
... similarity=TrigramSimilarity('name', test),
... ).filter(similarity__gt=0.3).order_by('-similarity')
[<Author: Katy Stevens>, <Author: Stephen Keats>]
TrigramWordSimilarity
New in Django 4.0.
class TrigramWordSimilarity
(string, expression, \*extra*)
Accepts a string or expression, and a field name or expression. Returns the trigram word similarity between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramWordSimilarity
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Kat'
>>> Author.objects.annotate(
... similarity=TrigramWordSimilarity(test, 'name'),
... ).filter(similarity__gt=0.3).order_by('-similarity')
[<Author: Katy Stevens>]
TrigramDistance
class TrigramDistance
(expression, string, \*extra*)
Accepts a field name or expression, and a string or expression. Returns the trigram distance between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramDistance
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Katie Stephens'
>>> Author.objects.annotate(
... distance=TrigramDistance('name', test),
... ).filter(distance__lte=0.7).order_by('distance')
[<Author: Katy Stevens>, <Author: Stephen Keats>]
TrigramWordDistance
New in Django 4.0.
class TrigramWordDistance
(string, expression, \*extra*)
Accepts a string or expression, and a field name or expression. Returns the trigram word distance between the two arguments.
Usage example:
>>> from django.contrib.postgres.search import TrigramWordDistance
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Kat'
>>> Author.objects.annotate(
... distance=TrigramWordDistance(test, 'name'),
... ).filter(distance__lte=0.7).order_by('distance')
[<Author: Katy Stevens>]