Postgresql Extensions
The postgresql extensions module provides a number of "postgres-only" functions, currently:
- json support, including jsonb for Postgres 9.4.
- hstore support
- server-side cursors
- full-text search
- ArrayField field type, for storing arrays.
- HStoreField field type, for storing key/value pairs.
- IntervalField field type, for storing timedelta objects.
- JSONField field type, for storing JSON data.
- BinaryJSONField field type, for the jsonb JSON data type.
- TSVectorField field type, for storing full-text search data.
- DateTimeTZField field type, a timezone-aware datetime field.
In the future I would like to add support for more of postgresql's features. If there is a particular feature you would like to see added, please open a Github issue.
Warning
In order to start using the features described below, you will need to use the extension PostgresqlExtDatabase class instead of PostgresqlDatabase.
The code below will assume you are using the following database and base model:
    from playhouse.postgres_ext import *

    ext_db = PostgresqlExtDatabase('peewee_test', user='postgres')

    class BaseExtModel(Model):
        class Meta:
            database = ext_db
JSON Support
peewee has basic support for Postgres' native JSON data type, in the form of JSONField. As of version 2.4.7, peewee also supports the Postgres 9.4 binary json jsonb type, via BinaryJSONField.
Warning
Postgres supports a JSON data type natively as of 9.2 (full support in 9.3). In order to use this functionality you must be using the correct version of Postgres with psycopg2 version 2.5 or greater.
To use BinaryJSONField, which has many performance and querying advantages, you must have Postgres 9.4 or later.
Note
You must be sure your database is an instance of PostgresqlExtDatabase in order to use the JSONField.
Here is an example of how you might declare a model with a JSON field:
    import json
    import urllib2

    from playhouse.postgres_ext import *

    db = PostgresqlExtDatabase('my_database')

    class APIResponse(Model):
        url = CharField()
        response = JSONField()

        class Meta:
            database = db

        @classmethod
        def request(cls, url):
            fh = urllib2.urlopen(url)
            return cls.create(url=url, response=json.loads(fh.read()))

    APIResponse.create_table()

    # Store a JSON response.
    offense = APIResponse.request('http://crime-api.com/api/offense/')
    booking = APIResponse.request('http://crime-api.com/api/booking/')

    # Query a JSON data structure using a nested key lookup:
    offense_responses = APIResponse.select().where(
        APIResponse.response['meta']['model'] == 'offense')

    # Retrieve a sub-key for each APIResponse. By calling .as_json(), the
    # data at the sub-key will be returned as Python objects (dicts, lists,
    # etc) instead of serialized JSON.
    q = (APIResponse
         .select(
             APIResponse.response['booking']['person'].as_json().alias('person'))
         .where(APIResponse.response['meta']['model'] == 'booking'))

    for result in q:
        print(result.person['name'], result.person['dob'])
The BinaryJSONField works the same and supports the same operations as the regular JSONField, but provides several additional operations for testing containment. Using the binary json field, you can test whether your JSON data contains other partial JSON structures (contains(), contains_any(), contains_all()), or whether it is a subset of a larger JSON document (contained_by()).

For more examples, see the JSONField and BinaryJSONField API documents below.
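As a quick illustration of a containment query, here is a minimal sketch (the Document model and its metadata values are hypothetical, used only for this example):

    class Document(BaseExtModel):
        # Binary (jsonb) column supporting containment operators.
        metadata = BinaryJSONField()

    # Find documents whose metadata contains this partial structure.
    subset = {'tags': ['postgres']}
    query = Document.select().where(Document.metadata.contains(subset))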
hstore support
Postgresql hstore is an embedded key/value store. With hstore, you can store arbitrary key/value pairs in your database alongside structured relational data.
To use hstore, you need to specify an additional parameter when instantiating your PostgresqlExtDatabase:

    # Specify "register_hstore=True":
    db = PostgresqlExtDatabase('my_db', register_hstore=True)
Currently the postgres_ext module supports the following operations:
- Store and retrieve arbitrary dictionaries
- Filter by key(s) or partial dictionary
- Update/add one or more keys to an existing dictionary
- Delete one or more keys from an existing dictionary
- Select keys, values, or zip keys and values
- Retrieve a slice of keys/values
- Test for the existence of a key
- Test that a key has a non-NULL value
Using hstore
To start with, you will need to import the custom database class and the hstore functions from playhouse.postgres_ext (see above code snippet). Then, it is as simple as adding an HStoreField to your model:

    class House(BaseExtModel):
        address = CharField()
        features = HStoreField()
You can now store arbitrary key/value pairs on House instances:

    >>> h = House.create(
    ...     address='123 Main St',
    ...     features={'garage': '2 cars', 'bath': '2 bath'})

    >>> h_from_db = House.get(House.id == h.id)
    >>> h_from_db.features
    {'bath': '2 bath', 'garage': '2 cars'}
You can filter by individual key, multiple keys or partial dictionary:
    >>> query = House.select()
    >>> garage = query.where(House.features.contains('garage'))
    >>> garage_and_bath = query.where(House.features.contains(['garage', 'bath']))
    >>> twocar = query.where(House.features.contains({'garage': '2 cars'}))
Suppose you want to do an atomic update to the house:
    >>> new_features = House.features.update({'bath': '2.5 bath', 'sqft': '1100'})
    >>> query = House.update(features=new_features)
    >>> query.where(House.id == h.id).execute()
    1
    >>> h = House.get(House.id == h.id)
    >>> h.features
    {'bath': '2.5 bath', 'garage': '2 cars', 'sqft': '1100'}
Or, alternatively, an atomic delete:

    >>> query = House.update(features=House.features.delete('bath'))
    >>> query.where(House.id == h.id).execute()
    1
    >>> h = House.get(House.id == h.id)
    >>> h.features
    {'garage': '2 cars', 'sqft': '1100'}
Multiple keys can be deleted at the same time:
    >>> query = House.update(features=House.features.delete('garage', 'sqft'))
You can select just keys, just values, or zip the two:
    >>> for h in House.select(House.address, House.features.keys().alias('keys')):
    ...     print(h.address, h.keys)
    123 Main St [u'bath', u'garage']

    >>> for h in House.select(House.address, House.features.values().alias('vals')):
    ...     print(h.address, h.vals)
    123 Main St [u'2 bath', u'2 cars']

    >>> for h in House.select(House.address, House.features.items().alias('mtx')):
    ...     print(h.address, h.mtx)
    123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]
You can retrieve a slice of data, for example, all the garage data:
    >>> query = House.select(House.address, House.features.slice('garage').alias('garage_data'))
    >>> for house in query:
    ...     print(house.address, house.garage_data)
    123 Main St {'garage': '2 cars'}
You can check for the existence of a key and filter rows accordingly:
    >>> has_garage = House.features.exists('garage')
    >>> for house in House.select(House.address, has_garage.alias('has_garage')):
    ...     print(house.address, house.has_garage)
    123 Main St True

    >>> for house in House.select().where(House.features.exists('garage')):
    ...     print(house.address, house.features['garage'])  # <-- just houses w/garage data
    123 Main St 2 cars
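You can also test whether a key has a non-NULL value using defined(); a brief sketch following the pattern above (the output assumes the sample data from this walkthrough):

    >>> has_garage_value = House.features.defined('garage')
    >>> for house in House.select(House.address, has_garage_value.alias('has_value')):
    ...     print(house.address, house.has_value)
    123 Main St True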
Interval support
Postgres supports durations through the INTERVAL data type (docs).

class IntervalField([null=False[, …]])
Field class capable of storing Python datetime.timedelta instances.
Example:
    from datetime import timedelta

    from playhouse.postgres_ext import *

    db = PostgresqlExtDatabase('my_db')

    class Event(Model):
        location = CharField()
        duration = IntervalField()
        start_time = DateTimeField()

        class Meta:
            database = db

        @classmethod
        def get_long_meetings(cls):
            return cls.select().where(cls.duration > timedelta(hours=1))
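Storing a duration is then just a matter of assigning a timedelta; a minimal sketch using the model above (the location and times are made up):

    from datetime import datetime, timedelta

    # The IntervalField accepts a timedelta and stores it as an INTERVAL.
    Event.create(
        location='Conference Room A',
        start_time=datetime(2015, 6, 1, 9, 0),
        duration=timedelta(minutes=90))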
Server-side cursors
When psycopg2 executes a query, normally all results are fetched and returned to the client by the backend. This can cause your application to use a lot of memory when making large queries. Using server-side cursors, results are returned a little at a time (by default 2000 records). For the definitive reference, please see the psycopg2 documentation.
Note
To use server-side (or named) cursors, you must be using PostgresqlExtDatabase.

To execute a query using a server-side cursor, simply wrap your select query using the ServerSide() helper:

    large_query = PageView.select()  # Build query normally.

    # Iterate over large query inside a transaction.
    for page_view in ServerSide(large_query):
        # do some interesting analysis here.
        pass

    # Server-side resources are released.
If you would like all SELECT queries to automatically use a server-side cursor, you can specify this when creating your PostgresqlExtDatabase:

    from playhouse.postgres_ext import PostgresqlExtDatabase

    ss_db = PostgresqlExtDatabase('my_db', server_side_cursors=True)
Note
Server-side cursors live only as long as the transaction, so for this reason peewee will not automatically call commit() after executing a SELECT query. If you do not commit after you are done iterating, you will not release the server-side resources until the connection is closed (or the transaction is committed later). Furthermore, since peewee will by default cache rows returned by the cursor, you should always call .iterator() when iterating over a large query.
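For example, with server_side_cursors enabled on the database, a large query might be consumed like this (a sketch; ss_db refers to the database instance created above):

    # Iterate without caching row instances, then commit to release the cursor.
    for page_view in PageView.select().iterator():
        pass  # Analyze the row here.

    ss_db.commit()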
If you are using the ServerSide() helper, the transaction and call to iterator() will be handled transparently.
Full-text search
Postgresql provides sophisticated full-text search using special data types (tsvector and tsquery). Documents should be stored or converted to the tsvector type, and search queries should be converted to tsquery.

For simple cases, you can simply use the Match() function, which will automatically perform the appropriate conversions and requires no schema changes:

    def blog_search(search_term):
        return Blog.select().where(
            (Blog.status == Blog.STATUS_PUBLISHED) &
            Match(Blog.content, search_term))
The Match() function will automatically convert the left-hand operand to a tsvector, and the right-hand operand to a tsquery. For better performance, it is recommended you create a GIN index on the column you plan to search:

    CREATE INDEX blog_full_text_search ON blog USING gin(to_tsvector(content));

Alternatively, you can use the TSVectorField to maintain a dedicated column for storing tsvector data:

    class Blog(Model):
        content = TextField()
        search_content = TSVectorField()
Note
TSVectorField will automatically be created with a GIN index.
You will need to explicitly convert the incoming text data to tsvector when inserting or updating the search_content field:

    content = 'Excellent blog post about peewee ORM.'
    blog_entry = Blog.create(
        content=content,
        search_content=fn.to_tsvector(content))

To perform a full-text search, use TSVectorField.match():

    terms = 'python & (sqlite | postgres)'
    results = Blog.select().where(Blog.search_content.match(terms))
For more information, see the Postgres full-text search docs.
postgres_ext API notes
class PostgresqlExtDatabase(database[, server_side_cursors=False[, register_hstore=False[, …]]])
Identical to PostgresqlDatabase but required in order to support:

- Server-side cursors
- ArrayField
- DateTimeTZField
- JSONField
- BinaryJSONField
- HStoreField
- TSVectorField

Parameters:
- database (str) – Name of database to connect to.
- server_side_cursors (bool) – Whether SELECT queries should utilize server-side cursors.
- register_hstore (bool) – Register the HStore extension with the connection.

If you wish to use the HStore extension, you must specify register_hstore=True.

If using server_side_cursors, also be sure to wrap your queries with ServerSide().
ServerSide(select_query)
Wrap the given select query in a transaction, and call its iterator() method to avoid caching row instances. In order for the server-side resources to be released, be sure to exhaust the generator (iterate over all the rows).

Parameters:
- select_query – a SelectQuery instance.

Return type: generator

Usage:

    large_query = PageView.select()
    for page_view in ServerSide(large_query):
        # Do something interesting.
        pass

    # At this point server side resources are released.
class ArrayField([field_class=IntegerField[, field_kwargs=None[, dimensions=1[, convert_values=False]]]])
Field capable of storing arrays of the provided field_class.

Parameters:
- field_class – a subclass of Field, e.g. IntegerField.
- field_kwargs (dict) – arguments to initialize field_class.
- dimensions (int) – dimensions of array.
- convert_values (bool) – apply field_class value conversion to array data.
Note
By default ArrayField will use a GIN index. To disable this, initialize the field with index=False.
You can store and retrieve lists (or lists-of-lists):
    class BlogPost(BaseModel):
        content = TextField()
        tags = ArrayField(CharField)

    post = BlogPost(content='awesome', tags=['foo', 'bar', 'baz'])
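For lists-of-lists, declare the field with dimensions=2 (or higher). A brief sketch (the Matrix model here is hypothetical):

    class Matrix(BaseModel):
        # Two-dimensional array of integers.
        values = ArrayField(IntegerField, dimensions=2)

    Matrix.create(values=[[1, 2], [3, 4]])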
Additionally, you can use the __getitem__ API to query values or slices in the database:

    # Get the first tag on a given blog post.
    first_tag = (BlogPost
                 .select(BlogPost.tags[0].alias('first_tag'))
                 .where(BlogPost.id == 1)
                 .dicts()
                 .get())

    # first_tag = {'first_tag': 'foo'}
Get a slice of values:
    # Get the first two tags.
    two_tags = (BlogPost
                .select(BlogPost.tags[:2].alias('two'))
                .dicts()
                .get())

    # two_tags = {'two': ['foo', 'bar']}
contains(*items)
Parameters:
- items – One or more items that must be in the given array field.

    # Get all blog posts that are tagged with both "python" and "django".
    Blog.select().where(Blog.tags.contains('python', 'django'))

contains_any(*items)
Parameters:
- items – One or more items to search for in the given array field.

Like contains(), except will match rows where the array contains any of the given items.

    # Get all blog posts that are tagged with "flask" and/or "django".
    Blog.select().where(Blog.tags.contains_any('flask', 'django'))
class DateTimeTZField(*args, **kwargs)
A timezone-aware subclass of DateTimeField.
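A minimal sketch of declaring and populating such a field (the Checkin model is hypothetical, and the timezone-aware datetime here uses the Python 3 standard library):

    from datetime import datetime, timezone

    class Checkin(BaseExtModel):
        timestamp = DateTimeTZField()

    # Stored as "timestamp with time zone" in Postgres.
    Checkin.create(timestamp=datetime.now(timezone.utc))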
class HStoreField(*args, **kwargs)
A field for storing and retrieving arbitrary key/value pairs. For details on usage, see hstore support.
Attention
To use the HStoreField you will need to be sure the hstore extension is registered with the connection. To accomplish this, instantiate the PostgresqlExtDatabase with register_hstore=True.
Note
By default HStoreField will use a GiST index. To disable this, initialize the field with index=False.
keys()
Select just the keys for a given row:

    >>> for h in House.select(House.address, House.features.keys().alias('keys')):
    ...     print(h.address, h.keys)
    123 Main St [u'bath', u'garage']

values()
Select just the values for a given row:

    >>> for h in House.select(House.address, House.features.values().alias('vals')):
    ...     print(h.address, h.vals)
    123 Main St [u'2 bath', u'2 cars']

items()
Zip the keys and values together, as a list of lists:

    >>> for h in House.select(House.address, House.features.items().alias('mtx')):
    ...     print(h.address, h.mtx)
    123 Main St [[u'bath', u'2 bath'], [u'garage', u'2 cars']]

slice(*args)
Retrieve a slice of the data for the given keys:

    >>> for h in House.select(House.address, House.features.slice('garage').alias('garage_data')):
    ...     print(h.address, h.garage_data)
    123 Main St {'garage': '2 cars'}

exists(key)
Query for whether the given key exists:

    >>> for h in House.select(House.address, House.features.exists('garage').alias('has_garage')):
    ...     print(h.address, h.has_garage)
    123 Main St True

    >>> for h in House.select().where(House.features.exists('garage')):
    ...     print(h.address, h.features['garage'])  # <-- just houses w/garage data
    123 Main St 2 cars
defined(key)
Query for whether the given key has a value associated with it.
update(**data)
Perform an atomic update to the keys/values for a given row or rows:

    >>> query = House.update(features=House.features.update(
    ...     sqft=2000,
    ...     year_built=2012))
    >>> query.where(House.id == 1).execute()
Note
We will use an UPDATE query.
delete(*keys)
Delete the provided keys for a given row or rows:

    >>> query = House.update(features=House.features.delete(
    ...     'sqft', 'year_built'))
    >>> query.where(House.id == 1).execute()
contains(value)
Parameters:
- value – Either a dict, a list of keys, or a single key.

Query rows for the existence of either:

- a partial dictionary.
- a list of keys.
- a single key.

    >>> query = House.select()
    >>> has_garage = query.where(House.features.contains('garage'))
    >>> garage_bath = query.where(House.features.contains(['garage', 'bath']))
    >>> twocar = query.where(House.features.contains({'garage': '2 cars'}))

contains_any(*keys)
Parameters:
- keys – One or more keys to search for.

Query rows for the existence of any key.
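For example (a short sketch; the 'pool' key is purely illustrative):

    >>> query = House.select()
    >>> any_match = query.where(House.features.contains_any('garage', 'pool'))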
class JSONField(dumps=None, *args, **kwargs)
Field class suitable for storing and querying arbitrary JSON. When using this on a model, set the field's value to a Python object (either a dict or a list). When you retrieve your value from the database it will be returned as a Python data structure.

Parameters:
- dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.
Note
You must be using Postgres 9.2 / psycopg2 2.5 or greater.
Note
If you are using Postgres 9.4, strongly consider using the BinaryJSONField instead, as it offers better performance and more powerful querying options.
Example model declaration:
    db = PostgresqlExtDatabase('my_db')

    class APIResponse(Model):
        url = CharField()
        response = JSONField()

        class Meta:
            database = db
Example of storing JSON data:
    url = 'http://foo.com/api/resource/'
    resp = json.loads(urllib2.urlopen(url).read())
    APIResponse.create(url=url, response=resp)

    APIResponse.create(url='http://foo.com/baz/', response={'key': 'value'})
To query, use Python's [] operators to specify nested key or array lookups:

    APIResponse.select().where(
        APIResponse.response['key1']['nested-key'] == 'some-value')

To illustrate the use of the [] operators, imagine we have the following data stored in an APIResponse:

    {
      "foo": {
        "bar": ["i1", "i2", "i3"],
        "baz": {
          "huey": "mickey",
          "peewee": "nugget"
        }
      }
    }
Here are the results of a few queries:
    def get_data(expression):
        # Helper function to just retrieve the results of a
        # particular expression.
        query = (APIResponse
                 .select(expression.alias('my_data'))
                 .dicts()
                 .get())
        return query['my_data']

    # Accessing the foo -> bar subkey will return a JSON
    # representation of the list.
    get_data(APIResponse.response['foo']['bar'])
    # '["i1", "i2", "i3"]'

    # In order to retrieve this list as a Python list,
    # we will call .as_json() on the expression.
    get_data(APIResponse.response['foo']['bar'].as_json())
    # ['i1', 'i2', 'i3']

    # Similarly, accessing the foo -> baz subkey will
    # return a JSON representation of the dictionary.
    get_data(APIResponse.response['foo']['baz'])
    # '{"huey": "mickey", "peewee": "nugget"}'

    # Again, calling .as_json() will return an actual
    # python dictionary.
    get_data(APIResponse.response['foo']['baz'].as_json())
    # {'huey': 'mickey', 'peewee': 'nugget'}

    # When dealing with simple values, either way works as
    # you expect.
    get_data(APIResponse.response['foo']['bar'][0])
    # 'i1'

    # Calling .as_json() when the result is a simple value
    # will return the same thing as the previous example.
    get_data(APIResponse.response['foo']['bar'][0].as_json())
    # 'i1'
class BinaryJSONField(dumps=None, *args, **kwargs)
Store and query arbitrary JSON documents. Data should be stored using normal Python dict and list objects, and when data is returned from the database, it will be returned using dict and list as well.

Parameters:
- dumps – The default is to call json.dumps() or the dumps function. You can override this method to create a customized JSON wrapper.
For examples of basic query operations, see the above code samples for JSONField. The example queries below use an APIResponse model like the one described above, but with a data field declared as a BinaryJSONField.
Note
By default BinaryJSONField will use a GiST index. To disable this, initialize the field with index=False.
Note
You must be using Postgres 9.4 / psycopg2 2.5 or newer. If you are using Postgres 9.2 or 9.3, you can use the regular JSONField instead.
contains(other)
Test whether the stored JSON data contains the given JSON fragment or key.

Example:
    search_fragment = {
        'foo': {'bar': ['i2']}
    }
    query = (APIResponse
             .select()
             .where(APIResponse.data.contains(search_fragment)))

    # If we're searching for a list, the list items do not need to
    # be ordered in a particular way:
    query = (APIResponse
             .select()
             .where(APIResponse.data.contains({
                 'foo': {'bar': ['i2', 'i1']}})))
We can pass in simple keys as well. To find APIResponses that contain the key foo at the top-level:

    APIResponse.select().where(APIResponse.data.contains('foo'))

We can also search sub-keys using square-brackets:

    APIResponse.select().where(
        APIResponse.data['foo']['bar'].contains(['i2', 'i1']))

contains_any(*items)
Search for the presence of one or more of the given items:

    APIResponse.select().where(
        APIResponse.data.contains_any('foo', 'baz', 'nugget'))

Like contains(), we can also search sub-keys:

    APIResponse.select().where(
        APIResponse.data['foo']['bar'].contains_any('i2', 'ix'))

contains_all(*items)
Search for the presence of all of the given items:

    APIResponse.select().where(
        APIResponse.data.contains_all('foo'))

Like contains_any(), we can also search sub-keys:

    APIResponse.select().where(
        APIResponse.data['foo']['bar'].contains_all('i1', 'i2', 'i3'))
contained_by(other)
Test whether the stored JSON document is contained by (is a subset of) the given JSON document. This method is the inverse of contains().

    big_doc = {
        'foo': {
            'bar': ['i1', 'i2', 'i3'],
            'baz': {
                'huey': 'mickey',
                'peewee': 'nugget',
            }
        },
        'other_key': ['nugget', 'bear', 'kitten'],
    }
    APIResponse.select().where(
        APIResponse.data.contained_by(big_doc))

concat(data)
Concatenate the stored data and the provided data. Note that this operation does not merge or do a "deep concat".
has_key(key)
Test whether the key exists at the top-level of the JSON object.
remove(*keys)
Remove one or more keys from the top-level of the JSON object.
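As a sketch of how these modifiers might be used in UPDATE queries (assuming the APIResponse model with a data field as above; the keys and URL are illustrative, and concat() is assumed to accept a dict as shown):

    # Shallow-merge a new top-level key into the stored document.
    nq = APIResponse.update(
        data=APIResponse.data.concat({'processed': True}))
    nq.where(APIResponse.url == 'http://foo.com/baz/').execute()

    # Remove a top-level key from the stored document.
    rq = APIResponse.update(data=APIResponse.data.remove('processed'))
    rq.where(APIResponse.url == 'http://foo.com/baz/').execute()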
Match(field, query)
Generate a full-text search expression, automatically converting the left-hand operand to a tsvector, and the right-hand operand to a tsquery.

Example:

    def blog_search(search_term):
        return Blog.select().where(
            (Blog.status == Blog.STATUS_PUBLISHED) &
            Match(Blog.content, search_term))
class TSVectorField
Field type suitable for storing tsvector data. This field will automatically be created with a GIN index for improved search performance.
Note
Data stored in this field will still need to be manually converted to the tsvector type.
Note
By default TSVectorField will use a GIN index. To disable this, initialize the field with index=False.
Example usage:
    class Blog(Model):
        content = TextField()
        search_content = TSVectorField()

    content = 'this is a sample blog entry.'
    blog_entry = Blog.create(
        content=content,
        search_content=fn.to_tsvector(content))  # Note `to_tsvector()`.
match(query[, language[, plain]])

Parameters:
- query (str) – the full-text search query.
- language (str) – language name (optional).
- plain (bool) – parse search query using plain (simple) parser.

Returns: an expression representing full-text search/match.

Example:

    # Perform a search using the "match" method.
    terms = 'python & (sqlite | postgres)'
    results = Blog.select().where(Blog.search_content.match(terms))
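The optional parameters can be combined with the plain-text parser; a brief sketch (assuming they are passed as keyword arguments, as suggested by the parameter list above):

    # Use the simple parser and an explicit search configuration.
    results = Blog.select().where(
        Blog.search_content.match('python postgres', language='english', plain=True))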