Postgresql Extensions
The postgresql extensions module provides a number of “postgres-only” functions, currently:
- json support, including jsonb for Postgres 9.4.
- hstore support
- server-side cursors
- full-text search
- ArrayField field type, for storing arrays.
- HStoreField field type, for storing key/value pairs.
- IntervalField field type, for storing timedelta objects.
- JSONField field type, for storing JSON data.
- BinaryJSONField field type, for the jsonb JSON data type.
- TSVectorField field type, for storing full-text search data.
- DateTimeTZField field type, a timezone-aware datetime field.
In the future I would like to add support for more of postgresql’s features. If there is a particular feature you would like to see added, please open a Github issue.
Warning
In order to start using the features described below, you will need to use the PostgresqlExtDatabase class (from playhouse.postgres_ext) instead of PostgresqlDatabase.
The code below will assume you are using the following database and base model:
from playhouse.postgres_ext import *

ext_db = PostgresqlExtDatabase('peewee_test', user='postgres')

class BaseExtModel(Model):
    class Meta:
        database = ext_db
JSON Support
peewee has basic support for Postgres' native JSON data type, in the form of JSONField. As of version 2.4.7, peewee also supports the Postgres 9.4 binary json jsonb type, via BinaryJSONField.
Warning
Postgres supports a JSON data type natively as of 9.2 (full support in 9.3). In order to use this functionality you must be using the correct version of Postgres with psycopg2 version 2.5 or greater.
To use BinaryJSONField, which has many performance and querying advantages, you must have Postgres 9.4 or later.
Note
You must be sure your database is an instance of PostgresqlExtDatabase
in order to use the JSONField.
Here is an example of how you might declare a model with a JSON field:
import json
import urllib2
from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_database')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db

    @classmethod
    def request(cls, url):
        fh = urllib2.urlopen(url)
        return cls.create(url=url, response=json.loads(fh.read()))

APIResponse.create_table()

# Store a JSON response.
offense = APIResponse.request('http://crime-api.com/api/offense/')
booking = APIResponse.request('http://crime-api.com/api/booking/')

# Query a JSON data structure using a nested key lookup:
offense_responses = APIResponse.select().where(
    APIResponse.response['meta']['model'] == 'offense')

# Retrieve a sub-key for each APIResponse. By calling .as_json(), the
# data at the sub-key will be returned as Python objects (dicts, lists,
# etc) instead of serialized JSON.
q = (APIResponse
     .select(
         APIResponse.response['booking']['person'].as_json().alias('person'))
     .where(APIResponse.response['meta']['model'] == 'booking'))

for result in q:
    print(result.person['name'], result.person['dob'])
The BinaryJSONField works the same and supports the same operations as the regular JSONField, but provides several additional operations for testing containment. Using the binary json field, you can test whether your JSON data contains other partial JSON structures (contains(), contains_any(), contains_all()), or whether it is a subset of a larger JSON document (contained_by()).
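For instance, a containment query might look like the following sketch (the Doc model and its keys are purely illustrative; complete examples appear in the API documents below):

class Doc(BaseExtModel):
    data = BinaryJSONField()  # jsonb column

# Rows whose JSON document contains the partial structure {'status': 'active'}.
active = Doc.select().where(Doc.data.contains({'status': 'active'}))

# Rows whose document contains either of the top-level keys 'error' or 'warning'.
flagged = Doc.select().where(Doc.data.contains_any('error', 'warning'))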
For more examples, see the JSONField and BinaryJSONField API documents below.
hstore support
Postgresql hstore is an embedded key/value store. With hstore, you can store arbitrary key/value pairs in your database alongside structured relational data.
To use hstore, you need to specify an additional parameter when instantiating your PostgresqlExtDatabase:
# Specify "register_hstore=True":
db = PostgresqlExtDatabase('my_db', register_hstore=True)
Currently the postgres_ext
module supports the following operations:
- Store and retrieve arbitrary dictionaries
- Filter by key(s) or partial dictionary
- Update/add one or more keys to an existing dictionary
- Delete one or more keys from an existing dictionary
- Select keys, values, or zip keys and values
- Retrieve a slice of keys/values
- Test for the existence of a key
- Test that a key has a non-NULL value
Using hstore
To start with, you will need to import the custom database class and the hstore functions from playhouse.postgres_ext (see the code snippet above). Then, it is as simple as adding an HStoreField to your model:
class House(BaseExtModel):
    address = CharField()
    features = HStoreField()
You can now store arbitrary key/value pairs on House instances:
>>> h = House.create(
... address='123 Main St',
... features={'garage': '2 cars', 'bath': '2 bath'})
...
>>> h_from_db = House.get(House.id == h.id)
>>> h_from_db.features
{'bath': '2 bath', 'garage': '2 cars'}
You can filter by individual key, multiple keys or partial dictionary:
>>> query = House.select()
>>> garage = query.where(House.features.contains('garage'))
>>> garage_and_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
Suppose you want to do an atomic update to the house:
>>> new_features = House.features.update({'bath': '2.5 bath', 'sqft': '1100'})
>>> query = House.update(features=new_features)
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'bath': '2.5 bath', 'garage': '2 cars', 'sqft': '1100'}
Or, alternatively an atomic delete:
>>> query = House.update(features=House.features.delete('bath'))
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'garage': '2 cars', 'sqft': '1100'}
Multiple keys can be deleted at the same time:
>>> query = House.update(features=House.features.delete('garage', 'sqft'))
You can select just keys, just values, or zip the two:
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
... print(h.address, h.keys)
123 Main St [u'bath', u'garage']
>>> for h in House.select(House.address, House.features.values().alias('vals')):
... print(h.address, h.vals)
123 Main St [u'2 bath', u'2 cars']
>>> for h in House.select(House.address, House.features.items().alias('mtx')):
... print(h.address, h.mtx)
123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
You can retrieve a slice of data, for example, all the garage data:
>>> query = House.select(House.address, House.features.slice('garage').alias('garage_data'))
>>> for house in query:
... print(house.address, house.garage_data)
123 Main St {'garage': '2 cars'}
You can check for the existence of a key and filter rows accordingly:
>>> has_garage = House.features.exists('garage')
>>> for house in House.select(House.address, has_garage.alias('has_garage')):
... print(house.address, house.has_garage)
123 Main St True
>>> for house in House.select().where(House.features.exists('garage')):
... print(house.address, house.features['garage']) # <-- just houses w/garage data
123 Main St 2 cars
Interval support
Postgres supports durations through the INTERVAL data type (see the Postgres docs).
class IntervalField([null=False[, …]])
Field class capable of storing Python datetime.timedelta instances.
Example:
from datetime import timedelta

from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_db')

class Event(Model):
    location = CharField()
    duration = IntervalField()
    start_time = DateTimeField()

    class Meta:
        database = db

    @classmethod
    def get_long_meetings(cls):
        return cls.select().where(cls.duration > timedelta(hours=1))
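Durations are stored and returned as ordinary timedelta values. A short usage sketch (the event data is illustrative and assumes the standard-library datetime module is imported):

import datetime

# Illustrative data: the duration is just a timedelta.
meeting = Event.create(
    location='conference room',
    duration=timedelta(hours=1, minutes=30),
    start_time=datetime.datetime(2015, 6, 1, 9, 0))

# Values come back as timedelta instances.
print(meeting.duration)  # 1:30:00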
Server-side cursors
When psycopg2 executes a query, normally all results are fetched and returned to the client by the backend. This can cause your application to use a lot of memory when making large queries. Using server-side cursors, results are returned a little at a time (by default 2000 records). For the definitive reference, please see the psycopg2 documentation.
Note
To use server-side (or named) cursors, you must be using PostgresqlExtDatabase
.
To execute a query using a server-side cursor, simply wrap your select query using the ServerSide()
helper:
large_query = PageView.select()  # Build query normally.

# Iterate over large query inside a transaction.
for page_view in ServerSide(large_query):
    # do some interesting analysis here.
    pass

# Server-side resources are released.
If you would like all SELECT queries to automatically use a server-side cursor, you can specify this when creating your PostgresqlExtDatabase:
from playhouse.postgres_ext import PostgresqlExtDatabase

ss_db = PostgresqlExtDatabase('my_db', server_side_cursors=True)
Note
Server-side cursors live only as long as the transaction, so for this reason peewee will not automatically call commit() after executing a SELECT query. If you do not commit after you are done iterating, you will not release the server-side resources until the connection is closed (or the transaction is committed later). Furthermore, since peewee will by default cache rows returned by the cursor, you should always call .iterator() when iterating over a large query.
If you are using the ServerSide() helper, the transaction and call to iterator() will be handled transparently.
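For reference, the manual pattern described in the note above might look like the following sketch (it assumes the ss_db database and a PageView model, and uses peewee's atomic() context manager):

# Sketch: ss_db was created with server_side_cursors=True.
with ss_db.atomic():
    # Keep the transaction open while iterating, and use .iterator()
    # so peewee does not cache every returned row.
    for page_view in PageView.select().iterator():
        pass  # ... analyze the row ...

# Exiting the atomic() block commits, releasing the server-side cursor.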
Full-text search
Postgresql provides sophisticated full-text search using special data-types (tsvector and tsquery). Documents should be stored or converted to the tsvector type, and search queries should be converted to tsquery.
For simple cases, you can simply use the Match() function, which will automatically perform the appropriate conversions and requires no schema changes:
def blog_search(query):
    return Blog.select().where(
        (Blog.status == Blog.STATUS_PUBLISHED) &
        Match(Blog.content, query))
The Match() function will automatically convert the left-hand operand to a tsvector, and the right-hand operand to a tsquery. For better performance, it is recommended you create a GIN index on the column you plan to search:
CREATE INDEX blog_full_text_search ON blog USING gin(to_tsvector(content));
Alternatively, you can use the TSVectorField to maintain a dedicated column for storing tsvector data:
class Blog(Model):
    content = TextField()
    search_content = TSVectorField()
You will need to explicitly convert the incoming text data to tsvector when inserting or updating the search_content field:
content = 'Excellent blog post about peewee ORM.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))
Note
If you are using the TSVectorField, it will automatically be created with a GIN index.
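To query the dedicated column, match the stored tsvector against a tsquery. Recent peewee releases expose a match() helper on TSVectorField for this; a sketch, assuming that helper is available in your version:

# Assumes TSVectorField.match() is available in your peewee version;
# it compares the stored tsvector against to_tsquery('peewee').
query = Blog.select().where(Blog.search_content.match('peewee'))

for entry in query:
    print(entry.content)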
postgres_ext API notes
class PostgresqlExtDatabase(database[, server_side_cursors=False[, register_hstore=False[, …]]])
Parameters:
- database (str) – name of the database to connect to.
- server_side_cursors (bool) – whether SELECT queries should utilize server-side cursors.
- register_hstore (bool) – register the HStore extension with the connection.

Identical to PostgresqlDatabase but required in order to support server-side cursors and the extension field types described in this document (ArrayField, DateTimeTZField, JSONField, BinaryJSONField, HStoreField, TSVectorField).
If you wish to use the HStore extension, you must specify register_hstore=True.
If using server_side_cursors, also be sure to wrap your queries with ServerSide().
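For example, a database configured with both options might look like this (connection parameters are illustrative):

from playhouse.postgres_ext import PostgresqlExtDatabase

ext_db = PostgresqlExtDatabase(
    'my_db',
    register_hstore=True,       # enable HStoreField support
    server_side_cursors=True,   # SELECTs can use named cursors
    user='postgres')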
ServerSide(select_query)
Parameters: select_query – a SelectQuery instance.
Return type: generator

Wrap the given select query in a transaction, and call its iterator() method to avoid caching row instances. In order for the server-side resources to be released, be sure to exhaust the generator (iterate over all the rows).
Usage:
large_query = PageView.select()
for page_view in ServerSide(large_query):
    # Do something interesting.
    pass

# At this point server side resources are released.
class ArrayField([field_class=IntegerField[, field_kwargs=None[, dimensions=1[, convert_values=False]]]])
Parameters:
- field_class – a subclass of Field, e.g. IntegerField, that determines the type of the array's values.
- field_kwargs (dict) – arguments used to initialize field_class.
- dimensions (int) – number of dimensions of the array.
- convert_values (bool) – apply field_class value conversion to the array data.

Field capable of storing arrays of the provided field_class.
Note
By default ArrayField will use a GIN index. To disable this, initialize the field with index=False.
You can store and retrieve lists (or lists-of-lists):
class BlogPost(BaseModel):
    content = TextField()
    tags = ArrayField(CharField)

post = BlogPost(content='awesome', tags=['foo', 'bar', 'baz'])
Additionally, you can use the __getitem__
API to query values or slices in the database:
# Get the first tag on a given blog post.
first_tag = (BlogPost
             .select(BlogPost.tags[0].alias('first_tag'))
             .where(BlogPost.id == 1)
             .dicts()
             .get())

# first_tag = {'first_tag': 'foo'}
Get a slice of values:
# Get the first two tags.
two_tags = (BlogPost
            .select(BlogPost.tags[:2].alias('two'))
            .dicts()
            .get())

# two_tags = {'two': ['foo', 'bar']}
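Multi-dimensional arrays are declared with the dimensions parameter. A minimal sketch (the Matrix model is purely illustrative):

class Matrix(BaseExtModel):
    # A two-dimensional array of integers.
    values = ArrayField(IntegerField, dimensions=2)

Matrix.create(values=[[1, 2], [3, 4]])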
contains(*items)
Parameters: items – One or more items that must be in the given array field.

# Get all blog posts that are tagged with both "python" and "django".
Blog.select().where(Blog.tags.contains('python', 'django'))
contains_any(*items)
Parameters: items – One or more items to search for in the given array field.

Like contains(), except will match rows where the array contains any of the given items.

# Get all blog posts that are tagged with "flask" and/or "django".
Blog.select().where(Blog.tags.contains_any('flask', 'django'))
class DateTimeTZField(*args, **kwargs)
A timezone-aware subclass of DateTimeField.
class HStoreField(*args, **kwargs)
A field for storing and retrieving arbitrary key/value pairs. For details on usage, see hstore support.
Attention
To use the HStoreField you will need to be sure the hstore extension is registered with the connection. To accomplish this, instantiate the PostgresqlExtDatabase with register_hstore=True.
Note
By default HStoreField will use a GiST index. To disable this, initialize the field with index=False.
keys()
Returns the keys for a given row.
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
... print(h.address, h.keys)
123 Main St [u'bath', u'garage']
values()
Return the values for a given row.
>>> for h in House.select(House.address, House.features.values().alias('vals')):
... print(h.address, h.vals)
123 Main St [u'2 bath', u'2 cars']
items()
Like python's dict, return the keys and values in a list-of-lists:

>>> for h in House.select(House.address, House.features.items().alias('mtx')):
... print(h.address, h.mtx)
123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
slice(*args)
Return a slice of data given a list of keys.
>>> for h in House.select(House.address, House.features.slice('garage').alias('garage_data')):
... print(h.address, h.garage_data)
123 Main St {'garage': '2 cars'}
exists(key)
Query for whether the given key exists.
>>> for h in House.select(House.address, House.features.exists('garage').alias('has_garage')):
... print(h.address, h.has_garage)
123 Main St True
>>> for h in House.select().where(House.features.exists('garage')):
... print(h.address, h.features['garage']) # <-- just houses w/garage data
123 Main St 2 cars
defined(key)
Query for whether the given key has a value associated with it.
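For example, to select only houses where the 'garage' key has a non-NULL value (reusing the House model and illustrative data from the hstore section above):

>>> query = House.select().where(House.features.defined('garage'))
>>> for house in query:
...     print(house.address, house.features['garage'])
123 Main St 2 cars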
update(**data)
Perform an atomic update to the keys/values for a given row or rows.
>>> query = House.update(features=House.features.update(
... sqft=2000,
... year_built=2012))
>>> query.where(House.id == 1).execute()
delete(*keys)
Delete the provided keys for a given row or rows.

Note
We will use an UPDATE query.

>>> query = House.update(features=House.features.delete(
... 'sqft', 'year_built'))
>>> query.where(House.id == 1).execute()
contains(value)
Parameters: value – Either a dict, a list of keys, or a single key.

Query rows for the existence of either:
- a partial dictionary.
- a list of keys.
- a single key.
>>> query = House.select()
>>> has_garage = query.where(House.features.contains('garage'))
>>> garage_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
contains_any(*keys)
Parameters: keys – One or more keys to search for.

Query rows for the existence of any key.
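A brief sketch using the House model from above ('pool' is just an illustrative key):

>>> query = House.select()
>>> matches = query.where(House.features.contains_any('garage', 'pool'))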
class JSONField(dumps=None, *args, **kwargs)
Parameters: dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.
Field class suitable for storing and querying arbitrary JSON. When using this on a model, set the field's value to a Python object (either a dict or a list). When you retrieve your value from the database, it will be returned as a Python data structure.
Note
You must be using Postgres 9.2 / psycopg2 2.5 or greater.
Note
If you are using Postgres 9.4, strongly consider using the BinaryJSONField instead as it offers better performance and more powerful querying options.
Example model declaration:
db = PostgresqlExtDatabase('my_db')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db
Example of storing JSON data:
url = 'http://foo.com/api/resource/'
resp = json.loads(urllib2.urlopen(url).read())
APIResponse.create(url=url, response=resp)
APIResponse.create(url='http://foo.com/baz/', response={'key': 'value'})
To query, use Python’s []
operators to specify nested key or array lookups:
APIResponse.select().where(
    APIResponse.response['key1']['nested-key'] == 'some-value')
To illustrate the use of the []
operators, imagine we have the following data stored in an APIResponse
:
{
  "foo": {
    "bar": ["i1", "i2", "i3"],
    "baz": {
      "huey": "mickey",
      "peewee": "nugget"
    }
  }
}
Here are the results of a few queries:
def get_data(expression):
    # Helper function to just retrieve the results of a
    # particular expression.
    query = (APIResponse
             .select(expression.alias('my_data'))
             .dicts()
             .get())
    return query['my_data']

# Accessing the foo -> bar subkey will return a JSON
# representation of the list.
get_data(APIResponse.response['foo']['bar'])
# '["i1", "i2", "i3"]'

# In order to retrieve this list as a Python list,
# we will call .as_json() on the expression.
get_data(APIResponse.response['foo']['bar'].as_json())
# ['i1', 'i2', 'i3']

# Similarly, accessing the foo -> baz subkey will
# return a JSON representation of the dictionary.
get_data(APIResponse.response['foo']['baz'])
# '{"huey": "mickey", "peewee": "nugget"}'

# Again, calling .as_json() will return an actual
# python dictionary.
get_data(APIResponse.response['foo']['baz'].as_json())
# {'huey': 'mickey', 'peewee': 'nugget'}

# When dealing with simple values, either way works as
# you expect.
get_data(APIResponse.response['foo']['bar'][0])
# 'i1'

# Calling .as_json() when the result is a simple value
# will return the same thing as the previous example.
get_data(APIResponse.response['foo']['bar'][0].as_json())
# 'i1'
class BinaryJSONField(dumps=None, *args, **kwargs)
Parameters: dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.
Store and query arbitrary JSON documents. Data should be stored using normal Python dict and list objects, and when data is returned from the database, it will be returned using dict and list as well.
For examples of basic query operations, see the above code samples for JSONField. The example queries below assume an APIResponse model like the one described above, but whose JSON column is declared as a BinaryJSONField (referenced as APIResponse.data).
Note
By default BinaryJSONField will use a GiST index. To disable this, initialize the field with index=False.
Note
You must be using Postgres 9.4 / psycopg2 2.5 or newer. If you are using Postgres 9.2 or 9.3, you can use the regular JSONField instead.
contains(other)
Test whether the given JSON data contains the given JSON fragment or key.
Example:
search_fragment = {
    'foo': {'bar': ['i2']}
}

query = (APIResponse
         .select()
         .where(APIResponse.data.contains(search_fragment)))

# If we're searching for a list, the list items do not need to
# be ordered in a particular way:
query = (APIResponse
         .select()
         .where(APIResponse.data.contains({
             'foo': {'bar': ['i2', 'i1']}})))
We can pass in simple keys as well. To find APIResponses that contain the key foo at the top-level:

APIResponse.select().where(APIResponse.data.contains('foo'))
We can also search sub-keys using square-brackets:
APIResponse.select().where(
    APIResponse.data['foo']['bar'].contains(['i2', 'i1']))
contains_any(*items)
Search for the presence of one or more of the given items.

APIResponse.select().where(
    APIResponse.data.contains_any('foo', 'baz', 'nugget'))

Like contains(), we can also search sub-keys:

APIResponse.select().where(
    APIResponse.data['foo']['bar'].contains_any('i2', 'ix'))
contains_all(*items)
Search for the presence of all of the given items.

APIResponse.select().where(
    APIResponse.data.contains_all('foo'))

Like contains_any(), we can also search sub-keys:

APIResponse.select().where(
    APIResponse.data['foo']['bar'].contains_all('i1', 'i2', 'i3'))
contained_by(other)
Test whether the field's JSON data is contained by (is a subset of) the given JSON document. This method is the inverse of contains().

big_doc = {
    'foo': {
        'bar': ['i1', 'i2', 'i3'],
        'baz': {
            'huey': 'mickey',
            'peewee': 'nugget',
        }
    },
    'other_key': ['nugget', 'bear', 'kitten'],
}

APIResponse.select().where(
    APIResponse.data.contained_by(big_doc))
Match(field, query)
Generate a full-text search expression, automatically converting the left-hand operand to a tsvector, and the right-hand operand to a tsquery.
Example:
def blog_search(query):
    return Blog.select().where(
        (Blog.status == Blog.STATUS_PUBLISHED) &
        Match(Blog.content, query))
class TSVectorField
Field type suitable for storing tsvector data. This field will automatically be created with a GIN index for improved search performance.
Note
Data stored in this field will still need to be manually converted to the tsvector type.
Note
By default TSVectorField will use a GIN index. To disable this, initialize the field with index=False.
Example usage:
class Blog(Model):
    content = TextField()
    search_content = TSVectorField()

content = 'this is a sample blog entry.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))  # Note `to_tsvector()`.