Postgresql Extensions
The postgresql extensions module provides a number of “postgres-only” functions, currently:
- json support, including jsonb for Postgres 9.4.
- hstore support
- server-side cursors
- full-text search
- ArrayField field type, for storing arrays.
- HStoreField field type, for storing key/value pairs.
- IntervalField field type, for storing timedelta objects.
- JSONField field type, for storing JSON data.
- BinaryJSONField field type, for the jsonb JSON data type.
- TSVectorField field type, for storing full-text search data.
- DateTimeTZField field type, a timezone-aware datetime field.
In the future I would like to add support for more of postgresql’s features. If there is a particular feature you would like to see added, please open a Github issue.
Warning
In order to start using the features described below, you will need to use the PostgresqlExtDatabase class (from playhouse.postgres_ext) instead of PostgresqlDatabase.
The code below will assume you are using the following database and base model:
from playhouse.postgres_ext import *

ext_db = PostgresqlExtDatabase('peewee_test', user='postgres')

class BaseExtModel(Model):
    class Meta:
        database = ext_db
JSON Support
peewee has basic support for Postgres' native JSON data type, in the form of JSONField. As of version 2.4.7, peewee also supports the Postgres 9.4 binary json jsonb type, via BinaryJSONField.
Warning
Postgres supports a JSON data type natively as of 9.2 (full support in 9.3). In order to use this functionality you must be using the correct version of Postgres with psycopg2 version 2.5 or greater.
To use BinaryJSONField, which has many performance and querying advantages, you must have Postgres 9.4 or later.
Note
You must be sure your database is an instance of PostgresqlExtDatabase
in order to use the JSONField.
Here is an example of how you might declare a model with a JSON field:
import json
import urllib2
from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_database')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db

    @classmethod
    def request(cls, url):
        fh = urllib2.urlopen(url)
        return cls.create(url=url, response=json.loads(fh.read()))

APIResponse.create_table()

# Store a JSON response.
offense = APIResponse.request('http://crime-api.com/api/offense/')
booking = APIResponse.request('http://crime-api.com/api/booking/')

# Query a JSON data structure using a nested key lookup:
offense_responses = APIResponse.select().where(
    APIResponse.response['meta']['model'] == 'offense')

# Retrieve a sub-key for each APIResponse. By calling .as_json(), the
# data at the sub-key will be returned as Python objects (dicts, lists,
# etc) instead of serialized JSON.
q = (APIResponse
     .select(
         APIResponse.response['booking']['person'].as_json().alias('person'))
     .where(APIResponse.response['meta']['model'] == 'booking'))

for result in q:
    print(result.person['name'], result.person['dob'])
The BinaryJSONField works the same and supports the same operations as the regular JSONField, but provides several additional operations for testing containment. Using the binary json field, you can test whether your JSON data contains other partial JSON structures (contains(), contains_any(), contains_all()), or whether it is a subset of a larger JSON document (contained_by()).
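For instance, a containment query might look like the following sketch (the Doc model and its keys are purely illustrative; complete examples appear in the API documents below):

class Doc(BaseExtModel):
    data = BinaryJSONField()  # jsonb column

# Rows whose JSON document contains the partial structure {'status': 'active'}.
active = Doc.select().where(Doc.data.contains({'status': 'active'}))

# Rows whose document contains either of the top-level keys 'error' or 'warning'.
flagged = Doc.select().where(Doc.data.contains_any('error', 'warning'))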
For more examples, see the JSONField and BinaryJSONField API documents below.
hstore support
Postgresql hstore is an embedded key/value store. With hstore, you can store arbitrary key/value pairs in your database alongside structured relational data.
To use hstore, you need to specify an additional parameter when instantiating your PostgresqlExtDatabase:
# Specify "register_hstore=True":
db = PostgresqlExtDatabase('my_db', register_hstore=True)
Currently the postgres_ext
module supports the following operations:
- Store and retrieve arbitrary dictionaries
- Filter by key(s) or partial dictionary
- Update/add one or more keys to an existing dictionary
- Delete one or more keys from an existing dictionary
- Select keys, values, or zip keys and values
- Retrieve a slice of keys/values
- Test for the existence of a key
- Test that a key has a non-NULL value
Using hstore
To start with, you will need to import the custom database class and the hstore functions from playhouse.postgres_ext (see the code snippet above). Then, it is as simple as adding an HStoreField to your model:
class House(BaseExtModel):
    address = CharField()
    features = HStoreField()
You can now store arbitrary key/value pairs on House instances:
>>> h = House.create(
... address='123 Main St',
... features={'garage': '2 cars', 'bath': '2 bath'})
...
>>> h_from_db = House.get(House.id == h.id)
>>> h_from_db.features
{'bath': '2 bath', 'garage': '2 cars'}
You can filter by individual key, multiple keys or partial dictionary:
>>> query = House.select()
>>> garage = query.where(House.features.contains('garage'))
>>> garage_and_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
Suppose you want to do an atomic update to the house:
>>> new_features = House.features.update({'bath': '2.5 bath', 'sqft': '1100'})
>>> query = House.update(features=new_features)
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'bath': '2.5 bath', 'garage': '2 cars', 'sqft': '1100'}
Or, alternatively an atomic delete:
>>> query = House.update(features=House.features.delete('bath'))
>>> query.where(House.id == h.id).execute()
1
>>> h = House.get(House.id == h.id)
>>> h.features
{'garage': '2 cars', 'sqft': '1100'}
Multiple keys can be deleted at the same time:
>>> query = House.update(features=House.features.delete('garage', 'sqft'))
You can select just keys, just values, or zip the two:
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
... print(h.address, h.keys)
123 Main St [u'bath', u'garage']
>>> for h in House.select(House.address, House.features.values().alias('vals')):
... print(h.address, h.vals)
123 Main St [u'2 bath', u'2 cars']
>>> for h in House.select(House.address, House.features.items().alias('mtx')):
... print(h.address, h.mtx)
123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
You can retrieve a slice of data, for example, all the garage data:
>>> query = House.select(House.address, House.features.slice('garage').alias('garage_data'))
>>> for house in query:
... print(house.address, house.garage_data)
123 Main St {'garage': '2 cars'}
You can check for the existence of a key and filter rows accordingly:
>>> has_garage = House.features.exists('garage')
>>> for house in House.select(House.address, has_garage.alias('has_garage')):
... print(house.address, house.has_garage)
123 Main St True
>>> for house in House.select().where(House.features.exists('garage')):
... print(house.address, house.features['garage']) # <-- just houses w/garage data
123 Main St 2 cars
Interval support
Postgres supports durations through the INTERVAL data type (see the Postgres docs).
class IntervalField([null=False[, …]])
Field class capable of storing Python datetime.timedelta instances.
Example:
from datetime import timedelta

from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('my_db')

class Event(Model):
    location = CharField()
    duration = IntervalField()
    start_time = DateTimeField()

    class Meta:
        database = db

    @classmethod
    def get_long_meetings(cls):
        return cls.select().where(cls.duration > timedelta(hours=1))
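Durations are stored and returned as ordinary timedelta values. A short usage sketch (the event data is illustrative and assumes the standard-library datetime module is imported):

import datetime

# Illustrative data: the duration is just a timedelta.
meeting = Event.create(
    location='conference room',
    duration=timedelta(hours=1, minutes=30),
    start_time=datetime.datetime(2015, 6, 1, 9, 0))

# Values come back as timedelta instances.
print(meeting.duration)  # 1:30:00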
Server-side cursors
When psycopg2 executes a query, normally all results are fetched and returned to the client by the backend. This can cause your application to use a lot of memory when making large queries. Using server-side cursors, results are returned a little at a time (by default 2000 records). For the definitive reference, please see the psycopg2 documentation.
Note
To use server-side (or named) cursors, you must be using PostgresqlExtDatabase
.
To execute a query using a server-side cursor, simply wrap your select query using the ServerSide()
helper:
large_query = PageView.select()  # Build query normally.

# Iterate over large query inside a transaction.
for page_view in ServerSide(large_query):
    # do some interesting analysis here.
    pass

# Server-side resources are released.
If you would like all SELECT queries to automatically use a server-side cursor, you can specify this when creating your PostgresqlExtDatabase:
from playhouse.postgres_ext import PostgresqlExtDatabase

ss_db = PostgresqlExtDatabase('my_db', server_side_cursors=True)
Note
Server-side cursors live only as long as the transaction, so for this reason peewee will not automatically call commit() after executing a SELECT query. If you do not commit after you are done iterating, you will not release the server-side resources until the connection is closed (or the transaction is committed later). Furthermore, since peewee will by default cache rows returned by the cursor, you should always call .iterator() when iterating over a large query.
If you are using the ServerSide() helper, the transaction and call to iterator() will be handled transparently.
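For reference, the manual pattern described in the note above might look like the following sketch (it assumes the ss_db database and a PageView model, and uses peewee's atomic() context manager):

# Sketch: ss_db was created with server_side_cursors=True.
with ss_db.atomic():
    # Keep the transaction open while iterating, and use .iterator()
    # so peewee does not cache every returned row.
    for page_view in PageView.select().iterator():
        pass  # ... analyze the row ...

# Exiting the atomic() block commits, releasing the server-side cursor.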
Full-text search
Postgresql provides sophisticated full-text search using special data-types (tsvector and tsquery). Documents should be stored or converted to the tsvector type, and search queries should be converted to tsquery.
For simple cases, you can simply use the Match() function, which will automatically perform the appropriate conversions and requires no schema changes:
def blog_search(query):
    return Blog.select().where(
        (Blog.status == Blog.STATUS_PUBLISHED) &
        Match(Blog.content, query))
The Match() function will automatically convert the left-hand operand to a tsvector, and the right-hand operand to a tsquery. For better performance, it is recommended you create a GIN index on the column you plan to search:
CREATE INDEX blog_full_text_search ON blog USING gin(to_tsvector(content));
Alternatively, you can use the TSVectorField to maintain a dedicated column for storing tsvector data:
class Blog(Model):
    content = TextField()
    search_content = TSVectorField()
You will need to explicitly convert the incoming text data to tsvector when inserting or updating the search_content field:
content = 'Excellent blog post about peewee ORM.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))
Note
If you are using the TSVectorField, it will automatically be created with a GIN index.
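To query the dedicated column, match the stored tsvector against a tsquery. Recent peewee releases expose a match() helper on TSVectorField for this; a sketch, assuming that helper is available in your version:

# Assumes TSVectorField.match() is available in your peewee version;
# it compares the stored tsvector against to_tsquery('peewee').
query = Blog.select().where(Blog.search_content.match('peewee'))

for entry in query:
    print(entry.content)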
postgres_ext API notes
class PostgresqlExtDatabase(database[, server_side_cursors=False[, register_hstore=False[, …]]])
Parameters:
- database (str) – name of the database to connect to.
- server_side_cursors (bool) – whether SELECT queries should utilize server-side cursors.
- register_hstore (bool) – register the HStore extension with the connection.

Identical to PostgresqlDatabase but required in order to support server-side cursors and the extension field types described in this document (ArrayField, DateTimeTZField, JSONField, BinaryJSONField, HStoreField, TSVectorField).
If you wish to use the HStore extension, you must specify register_hstore=True.
If using server_side_cursors, also be sure to wrap your queries with ServerSide().
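For example, a database configured with both options might look like this (connection parameters are illustrative):

from playhouse.postgres_ext import PostgresqlExtDatabase

ext_db = PostgresqlExtDatabase(
    'my_db',
    register_hstore=True,       # enable HStoreField support
    server_side_cursors=True,   # SELECTs can use named cursors
    user='postgres')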
ServerSide(select_query)
Parameters: select_query – a SelectQuery instance.
Return type: generator

Wrap the given select query in a transaction, and call its iterator() method to avoid caching row instances. In order for the server-side resources to be released, be sure to exhaust the generator (iterate over all the rows).
Usage:
large_query = PageView.select()
for page_view in ServerSide(large_query):
    # Do something interesting.
    pass

# At this point server side resources are released.
class ArrayField([field_class=IntegerField[, field_kwargs=None[, dimensions=1[, convert_values=False]]]])
Parameters:
- field_class – a subclass of Field, e.g. IntegerField, that determines the type of the array's values.
- field_kwargs (dict) – arguments used to initialize field_class.
- dimensions (int) – number of dimensions of the array.
- convert_values (bool) – apply field_class value conversion to the array data.

Field capable of storing arrays of the provided field_class.
Note
By default ArrayField will use a GIN index. To disable this, initialize the field with index=False.
You can store and retrieve lists (or lists-of-lists):
class BlogPost(BaseModel):
    content = TextField()
    tags = ArrayField(CharField)

post = BlogPost(content='awesome', tags=['foo', 'bar', 'baz'])
Additionally, you can use the __getitem__
API to query values or slices in the database:
# Get the first tag on a given blog post.
first_tag = (BlogPost
             .select(BlogPost.tags[0].alias('first_tag'))
             .where(BlogPost.id == 1)
             .dicts()
             .get())

# first_tag = {'first_tag': 'foo'}
Get a slice of values:
# Get the first two tags.
two_tags = (BlogPost
            .select(BlogPost.tags[:2].alias('two'))
            .dicts()
            .get())

# two_tags = {'two': ['foo', 'bar']}
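Multi-dimensional arrays are declared with the dimensions parameter. A minimal sketch (the Matrix model is purely illustrative):

class Matrix(BaseExtModel):
    # A two-dimensional array of integers.
    values = ArrayField(IntegerField, dimensions=2)

Matrix.create(values=[[1, 2], [3, 4]])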
contains(*items)
Parameters: items – One or more items that must be in the given array field.

# Get all blog posts that are tagged with both "python" and "django".
Blog.select().where(Blog.tags.contains('python', 'django'))
contains_any(*items)
Parameters: items – One or more items to search for in the given array field.

Like contains(), except will match rows where the array contains any of the given items.

# Get all blog posts that are tagged with "flask" and/or "django".
Blog.select().where(Blog.tags.contains_any('flask', 'django'))
class DateTimeTZField(*args, **kwargs)
A timezone-aware subclass of DateTimeField.
class HStoreField(*args, **kwargs)
A field for storing and retrieving arbitrary key/value pairs. For details on usage, see hstore support.
Attention
To use the HStoreField you will need to be sure the hstore extension is registered with the connection. To accomplish this, instantiate the PostgresqlExtDatabase with register_hstore=True.
Note
By default HStoreField will use a GiST index. To disable this, initialize the field with index=False.
keys()
Returns the keys for a given row.
>>> for h in House.select(House.address, House.features.keys().alias('keys')):
... print(h.address, h.keys)
123 Main St [u'bath', u'garage']
values()
Return the values for a given row.
>>> for h in House.select(House.address, House.features.values().alias('vals')):
... print(h.address, h.vals)
123 Main St [u'2 bath', u'2 cars']
items()
Like python's dict, return the keys and values in a list-of-lists:

>>> for h in House.select(House.address, House.features.items().alias('mtx')):
... print(h.address, h.mtx)
123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
slice(*args)
Return a slice of data given a list of keys.
>>> for h in House.select(House.address, House.features.slice('garage').alias('garage_data')):
... print(h.address, h.garage_data)
123 Main St {'garage': '2 cars'}
exists(key)
Query for whether the given key exists.
>>> for h in House.select(House.address, House.features.exists('garage').alias('has_garage')):
... print(h.address, h.has_garage)
123 Main St True
>>> for h in House.select().where(House.features.exists('garage')):
... print(h.address, h.features['garage']) # <-- just houses w/garage data
123 Main St 2 cars
defined(key)
Query for whether the given key has a value associated with it.
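For example, to select only houses where the 'garage' key has a non-NULL value (reusing the House model and illustrative data from the hstore section above):

>>> query = House.select().where(House.features.defined('garage'))
>>> for house in query:
...     print(house.address, house.features['garage'])
123 Main St 2 cars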
update(**data)
Perform an atomic update to the keys/values for a given row or rows.
>>> query = House.update(features=House.features.update(
... sqft=2000,
... year_built=2012))
>>> query.where(House.id == 1).execute()
delete(*keys)
Delete the provided keys for a given row or rows.

Note
We will use an UPDATE query.

>>> query = House.update(features=House.features.delete(
... 'sqft', 'year_built'))
>>> query.where(House.id == 1).execute()
contains(value)
Parameters: value – Either a dict, a list of keys, or a single key.

Query rows for the existence of either:
- a partial dictionary.
- a list of keys.
- a single key.
>>> query = House.select()
>>> has_garage = query.where(House.features.contains('garage'))
>>> garage_bath = query.where(House.features.contains(['garage', 'bath']))
>>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
contains_any(*keys)
Parameters: keys – One or more keys to search for.

Query rows for the existence of any key.
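A brief sketch using the House model from above ('pool' is just an illustrative key):

>>> query = House.select()
>>> matches = query.where(House.features.contains_any('garage', 'pool'))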
class JSONField(dumps=None, *args, **kwargs)
Parameters: dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.
Field class suitable for storing and querying arbitrary JSON. When using this on a model, set the field's value to a Python object (either a dict or a list). When you retrieve your value from the database, it will be returned as a Python data structure.
Note
You must be using Postgres 9.2 / psycopg2 2.5 or greater.
Note
If you are using Postgres 9.4, strongly consider using the BinaryJSONField instead as it offers better performance and more powerful querying options.
Example model declaration:
db = PostgresqlExtDatabase('my_db')

class APIResponse(Model):
    url = CharField()
    response = JSONField()

    class Meta:
        database = db
Example of storing JSON data:
url = 'http://foo.com/api/resource/'
resp = json.loads(urllib2.urlopen(url).read())
APIResponse.create(url=url, response=resp)
APIResponse.create(url='http://foo.com/baz/', response={'key': 'value'})
To query, use Python’s []
operators to specify nested key or array lookups:
APIResponse.select().where(
    APIResponse.response['key1']['nested-key'] == 'some-value')
To illustrate the use of the []
operators, imagine we have the following data stored in an APIResponse
:
{
  "foo": {
    "bar": ["i1", "i2", "i3"],
    "baz": {
      "huey": "mickey",
      "peewee": "nugget"
    }
  }
}
Here are the results of a few queries:
def get_data(expression):
    # Helper function to just retrieve the results of a
    # particular expression.
    query = (APIResponse
             .select(expression.alias('my_data'))
             .dicts()
             .get())
    return query['my_data']

# Accessing the foo -> bar subkey will return a JSON
# representation of the list.
get_data(APIResponse.response['foo']['bar'])
# '["i1", "i2", "i3"]'

# In order to retrieve this list as a Python list,
# we will call .as_json() on the expression.
get_data(APIResponse.response['foo']['bar'].as_json())
# ['i1', 'i2', 'i3']

# Similarly, accessing the foo -> baz subkey will
# return a JSON representation of the dictionary.
get_data(APIResponse.response['foo']['baz'])
# '{"huey": "mickey", "peewee": "nugget"}'

# Again, calling .as_json() will return an actual
# python dictionary.
get_data(APIResponse.response['foo']['baz'].as_json())
# {'huey': 'mickey', 'peewee': 'nugget'}

# When dealing with simple values, either way works as
# you expect.
get_data(APIResponse.response['foo']['bar'][0])
# 'i1'

# Calling .as_json() when the result is a simple value
# will return the same thing as the previous example.
get_data(APIResponse.response['foo']['bar'][0].as_json())
# 'i1'
class BinaryJSONField(dumps=None, *args, **kwargs)
Parameters: dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.
Store and query arbitrary JSON documents. Data should be stored using normal Python dict and list objects, and when data is returned from the database, it will be returned using dict and list as well.
For examples of basic query operations, see the above code samples for JSONField. The example queries below assume an APIResponse model like the one described above, but whose JSON column is declared as a BinaryJSONField (referenced as APIResponse.data).
Note
By default BinaryJSONField will use a GiST index. To disable this, initialize the field with index=False.
Note
You must be using Postgres 9.4 / psycopg2 2.5 or newer. If you are using Postgres 9.2 or 9.3, you can use the regular JSONField instead.
contains(other)
Test whether the given JSON data contains the given JSON fragment or key.
Example:
search_fragment = {
    'foo': {'bar': ['i2']}
}

query = (APIResponse
         .select()
         .where(APIResponse.data.contains(search_fragment)))

# If we're searching for a list, the list items do not need to
# be ordered in a particular way:
query = (APIResponse
         .select()
         .where(APIResponse.data.contains({
             'foo': {'bar': ['i2', 'i1']}})))
We can pass in simple keys as well. To find APIResponses that contain the key foo at the top-level:

APIResponse.select().where(APIResponse.data.contains('foo'))
We can also search sub-keys using square-brackets:
APIResponse.select().where(
    APIResponse.data['foo']['bar'].contains(['i2', 'i1']))
contains_any(*items)
Search for the presence of one or more of the given items.

APIResponse.select().where(
    APIResponse.data.contains_any('foo', 'baz', 'nugget'))

Like contains(), we can also search sub-keys:

APIResponse.select().where(
    APIResponse.data['foo']['bar'].contains_any('i2', 'ix'))
contains_all(*items)
Search for the presence of all of the given items.

APIResponse.select().where(
    APIResponse.data.contains_all('foo'))

Like contains_any(), we can also search sub-keys:

APIResponse.select().where(
    APIResponse.data['foo']['bar'].contains_all('i1', 'i2', 'i3'))
contained_by(other)
Test whether the field's JSON data is contained by (is a subset of) the given JSON document. This method is the inverse of contains().

big_doc = {
    'foo': {
        'bar': ['i1', 'i2', 'i3'],
        'baz': {
            'huey': 'mickey',
            'peewee': 'nugget',
        }
    },
    'other_key': ['nugget', 'bear', 'kitten'],
}

APIResponse.select().where(
    APIResponse.data.contained_by(big_doc))
Match(field, query)
Generate a full-text search expression, automatically converting the left-hand operand to a tsvector, and the right-hand operand to a tsquery.
Example:
def blog_search(query):
    return Blog.select().where(
        (Blog.status == Blog.STATUS_PUBLISHED) &
        Match(Blog.content, query))
class TSVectorField
Field type suitable for storing tsvector data. This field will automatically be created with a GIN index for improved search performance.
Note
Data stored in this field will still need to be manually converted to the tsvector type.
Note
By default TSVectorField will use a GIN index. To disable this, initialize the field with index=False.
Example usage:
class Blog(Model):
    content = TextField()
    search_content = TSVectorField()

content = 'this is a sample blog entry.'
blog_entry = Blog.create(
    content=content,
    search_content=fn.to_tsvector(content))  # Note `to_tsvector()`.