SQLite Extensions

The default SqliteDatabase already includes many SQLite-specific features:

The playhouse.sqlite_ext includes even more SQLite features, including:

Getting started

To get started with the features described in this document, you will want to use the SqliteExtDatabase class from the playhouse.sqlite_ext module. Furthermore, some features require the playhouse._sqlite_ext C extension – these features will be noted in the documentation.

Instantiating a SqliteExtDatabase:

  from playhouse.sqlite_ext import SqliteExtDatabase

  db = SqliteExtDatabase('my_app.db', pragmas=(
      ('cache_size', -1024 * 64),  # 64MB page-cache.
      ('journal_mode', 'wal'),  # Use WAL-mode (you should always use this!).
      ('foreign_keys', 1)))  # Enforce foreign-key constraints.

APIs

class SqliteExtDatabase(database[, pragmas=None[, timeout=5[, c_extensions=None[, rank_functions=True[, hash_functions=False[, regexp_function=False[, bloomfilter=False]]]]]]])

Parameters:
  • pragmas (list) – A list of 2-tuples containing pragma key and value to set every time a connection is opened.
  • timeout – Set the busy-timeout on the SQLite driver (in seconds).
  • c_extensions (bool) – Declare that C extension speedups must/must-not be used. If set to True and the extension module is not available, will raise an ImproperlyConfigured exception.
  • rank_functions (bool) – Make search result ranking functions available.
  • hash_functions (bool) – Make hashing functions available (md5, sha1, etc).
  • regexp_function (bool) – Make the REGEXP function available.
  • bloomfilter (bool) – Make the sqlite-bloomfilter available.

Extends SqliteDatabase and inherits methods for declaring user-defined functions, pragmas, etc.
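The pragmas parameter is applied each time a connection is opened. As a rough, standard-library-only illustration of what that amounts to (not peewee's actual implementation), the hypothetical helper connect_with_pragmas below opens a sqlite3 connection and runs one PRAGMA statement per pair:

```python
import os
import sqlite3
import tempfile

# Same pragmas as the SqliteExtDatabase example above.
PRAGMAS = [
    ('cache_size', -1024 * 64),  # Negative value = size in KiB (64MB).
    ('journal_mode', 'wal'),     # Write-ahead logging.
    ('foreign_keys', 1),         # Enforce foreign-key constraints.
]


def connect_with_pragmas(path, pragmas=PRAGMAS):
    """Hypothetical helper: open a connection, then apply every pragma."""
    conn = sqlite3.connect(path)
    for key, value in pragmas:
        # PRAGMA does not accept bound parameters, so values are interpolated.
        conn.execute('PRAGMA %s = %s' % (key, value))
    return conn


# WAL mode needs a file-backed database (in-memory databases report "memory").
with tempfile.TemporaryDirectory() as tmp:
    conn = connect_with_pragmas(os.path.join(tmp, 'my_app.db'))
    mode = conn.execute('PRAGMA journal_mode').fetchone()[0]
    cache = conn.execute('PRAGMA cache_size').fetchone()[0]
    fk = conn.execute('PRAGMA foreign_keys').fetchone()[0]
    conn.close()

print(mode, cache, fk)  # wal -65536 1
```

Because the values are interpolated into the SQL, only pass trusted, hard-coded pragmas to a helper like this.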

class CSqliteExtDatabase(database[, pragmas=None[, timeout=5[, c_extensions=None[, rank_functions=True[, hash_functions=False[, regexp_function=False[, bloomfilter=False[, replace_busy_handler=False]]]]]]]])

Parameters:
  • pragmas (list) – A list of 2-tuples containing pragma key and value to set every time a connection is opened.
  • timeout – Set the busy-timeout on the SQLite driver (in seconds).
  • c_extensions (bool) – Declare that C extension speedups must/must-not be used. If set to True and the extension module is not available, will raise an ImproperlyConfigured exception.
  • rank_functions (bool) – Make search result ranking functions available.
  • hash_functions (bool) – Make hashing functions available (md5, sha1, etc).
  • regexp_function (bool) – Make the REGEXP function available.
  • bloomfilter (bool) – Make the sqlite-bloomfilter available.
  • replace_busy_handler (bool) – Use a smarter busy-handler implementation.

Extends SqliteExtDatabase and requires that the playhouse._sqlite_ext extension module be available.

  • on_commit(fn)

    Register a callback to be executed whenever a transaction is committed on the current connection. The callback accepts no parameters and the return value is ignored.

    However, if the callback raises a ValueError, the transaction will be aborted and rolled-back.

    Example:

    import logging
    from playhouse.sqlite_ext import CSqliteExtDatabase
    logger = logging.getLogger(__name__)
    db = CSqliteExtDatabase(':memory:')

    @db.on_commit
    def on_commit():
        logger.info('COMMITing changes')
  • on_rollback(fn)

    Register a callback to be executed whenever a transaction is rolled back on the current connection. The callback accepts no parameters and the return value is ignored.

    Example:

    @db.on_rollback
    def on_rollback():
        logger.info('Rolling back changes')
  • on_update(fn)

    Register a callback to be executed whenever the database is written to (via an UPDATE, INSERT or DELETE query). The callback should accept the following parameters:

    • query - the type of query, either INSERT, UPDATE or DELETE.
    • database name - the default database is named main.
    • table name - name of table being modified.
    • rowid - the rowid of the row being modified.

    The callback’s return value is ignored.

    Example:

    db = CSqliteExtDatabase(':memory:')

    @db.on_update
    def on_update(query_type, db, table, rowid):
        # e.g. INSERT row 3 into table users.
        logger.info('%s row %s into table %s', query_type, rowid, table)
  • changes()

    Return the number of rows modified in the currently-open transaction.

  • autocommit

    Property which returns a boolean indicating if autocommit is enabled. By default, this value will be True except when inside a transaction (or atomic() block).

    Example:

    >>> db = CSqliteExtDatabase(':memory:')
    >>> db.autocommit
    True
    >>> with db.atomic():
    ...     print(db.autocommit)
    ...
    False
    >>> db.autocommit
    True
  • backup(destination[, pages=None, name=None, progress=None])

    Parameters:
    • destination (SqliteDatabase) – Database object to serve as destination for the backup.
    • pages (int) – Number of pages per iteration. Default value of -1 indicates all pages should be backed-up in a single step.
    • name (str) – Name of source database (may differ if you used ATTACH DATABASE to load multiple databases). Defaults to “main”.
    • progress – Progress callback, called with three parameters: the number of pages remaining, the total page count, and whether the backup is complete.

    Example:

    master = CSqliteExtDatabase('master.db')
    replica = CSqliteExtDatabase('replica.db')

    # Backup the contents of master to replica.
    master.backup(replica)
  • backup_to_file(filename[, pages, name, progress])

    Parameters:
    • filename – Filename to store the database backup.
    • pages (int) – Number of pages per iteration. Default value of -1 indicates all pages should be backed-up in a single step.
    • name (str) – Name of source database (may differ if you used ATTACH DATABASE to load multiple databases). Defaults to “main”.
    • progress – Progress callback, called with three parameters: the number of pages remaining, the total page count, and whether the backup is complete.

    Backup the current database to a file. The backed-up data is not a database dump, but an actual SQLite database file.

    Example:

    import datetime

    db = CSqliteExtDatabase('app.db')

    def nightly_backup():
        filename = 'backup-%s.db' % (datetime.date.today())
        db.backup_to_file(filename)
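Both backup methods are built on SQLite's online-backup API, which copies the database page by page while it remains usable. The standard library exposes the same API as sqlite3.Connection.backup() (Python 3.7+), so the idea can be sketched without peewee. The function name snapshot_to_file is hypothetical, and note that the stdlib progress callback receives (status, remaining, total) rather than the three values described above:

```python
import os
import sqlite3
import tempfile


def snapshot_to_file(source, filename, pages=-1):
    """Copy a live database into `filename` using SQLite's online-backup API.

    Returns the (remaining, total) page counts reported after each step.
    """
    steps = []

    def progress(status, remaining, total):
        # The stdlib calls this with (status, remaining, total) after every step.
        steps.append((remaining, total))

    target = sqlite3.connect(filename)
    try:
        source.backup(target, pages=pages, progress=progress)
    finally:
        target.close()
    return steps


# Usage: back up a small database to a temporary file, then read it back.
source = sqlite3.connect(':memory:')
source.execute('CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)')
source.executemany('INSERT INTO note (body) VALUES (?)', [('a',), ('b',)])
source.commit()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'backup-test.db')
    steps = snapshot_to_file(source, path, pages=1)  # One page per step.
    copy = sqlite3.connect(path)
    copied_rows = copy.execute('SELECT body FROM note ORDER BY id').fetchall()
    copy.close()

print(copied_rows)  # [('a',), ('b',)]
```

As with peewee's methods, the result is a real SQLite database file, not a SQL dump.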
  • blob_open(table, column, rowid[, read_only=False])

    Parameters:
    • table (str) – Name of table containing data.
    • column (str) – Name of column containing data.
    • rowid (int) – ID of row to retrieve.
    • read_only (bool) – Open the blob for reading only.
    Returns:

    Blob instance which provides efficient access to the underlying binary data.

    Return type:

    Blob

    See Blob and ZeroBlob for more information.

    Example:

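    from peewee import BlobField, Model, TextField
    from playhouse.sqlite_ext import CSqliteExtDatabase, ZeroBlob

    db = CSqliteExtDatabase('app.db')

    class Image(Model):
        filename = TextField()
        data = BlobField()
        class Meta:
            database = db

    db.create_tables([Image])
    with open('thefile.jpg', 'rb') as fh:  # Hypothetical input file.
        image_data = fh.read()

    buf_size = 1024 * 1024 * 8  # Allocate 8MB for storing file.
    rowid = Image.insert({Image.filename: 'thefile.jpg',
                          Image.data: ZeroBlob(buf_size)}).execute()
    # Open the blob, returning a file-like object.
    blob = db.blob_open('image', 'data', rowid)
    # Write some data to the blob (it must fit in the 8MB buffer).
    blob.write(image_data)
    img_size = blob.tell()
    # Read the data back out of the blob.
    blob.seek(0)
    image_data = blob.read(img_size)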

class RowIDField

Primary-key field that corresponds to the SQLite rowid field. For more information, see the SQLite documentation on rowid tables.

Example:

  class Note(Model):
      rowid = RowIDField()  # Will be primary key.
      content = TextField()
      timestamp = TimestampField()

class DocIDField

Subclass of RowIDField for use on virtual tables that specifically use the convention of docid for the primary key. As far as I know this only pertains to tables using the FTS3 and FTS4 full-text search extensions.

Attention

In FTS3 and FTS4, “docid” is simply an alias for “rowid”. To reduce confusion, it’s probably best to just always use RowIDField and never use DocIDField.

  class NoteIndex(FTSModel):
      docid = DocIDField()  # "docid" is used as an alias for "rowid".
      content = SearchField()

      class Meta:
          database = db
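Because docid is only an alias, setting one sets the other. A quick check with the standard-library sqlite3 module (assuming your SQLite build includes FTS4, as most do):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE note_index USING fts4(body)')

# Set the docid explicitly when inserting...
conn.execute('INSERT INTO note_index (docid, body) VALUES (?, ?)',
             (42, 'hello world'))

# ...and the very same value is visible through rowid.
row = conn.execute('SELECT rowid, docid FROM note_index').fetchone()
print(row)  # (42, 42)
```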

class AutoIncrementField

SQLite, by default, may reuse primary key values after rows are deleted. To ensure that the primary key is always monotonically increasing, regardless of deletions, you should use AutoIncrementField. There is a small performance cost for this feature. For more information, see the SQLite docs on autoincrement.
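The difference is easy to observe with plain SQL, since it is exactly what SQLite's AUTOINCREMENT keyword controls. This standard-library sketch (the helper name is made up for illustration) deletes the newest row and then inserts again into two otherwise identical tables:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE plain (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('CREATE TABLE monotonic '
             '(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)')


def id_after_deleting_newest(table):
    """Insert two rows, delete the newest one, insert again; return new id."""
    for name in ('a', 'b'):
        conn.execute('INSERT INTO %s (name) VALUES (?)' % table, (name,))
    conn.execute('DELETE FROM %s WHERE id = 2' % table)
    cursor = conn.execute('INSERT INTO %s (name) VALUES (?)' % table, ('c',))
    return cursor.lastrowid


plain_id = id_after_deleting_newest('plain')          # Freed id is reused.
monotonic_id = id_after_deleting_newest('monotonic')  # Ids never go backwards.
print(plain_id, monotonic_id)  # 2 3
```

Without AUTOINCREMENT, SQLite normally picks one more than the current largest rowid, so deleting the newest row frees its id; with AUTOINCREMENT, the highest id ever used is tracked in the sqlite_sequence table.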

class JSONField

Field class suitable for storing JSON data, with special methods designed to work with the json1 extension.

SQLite 3.9.0 added JSON support in the form of an extension library. The SQLite json1 extension provides a number of helper functions for working with JSON data. These APIs are exposed as methods of a special field-type, JSONField.

Most functions that operate on JSON fields take a path argument. The JSON extension documents specify that the path should begin with $ followed by zero or more instances of .objectlabel or [arrayindex]. Peewee simplifies this by allowing you to omit the $ character and just specify the path you need or None for an empty path:

  • path='' –> '$'
  • path='tags' –> '$.tags'
  • path='[0][1].bar' –> '$[0][1].bar'
  • path='metadata[0]' –> '$.metadata[0]'
  • path='user.data.email' –> '$.user.data.email'
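The shorthand rules above amount to a small string transformation. The following hypothetical helper (not a peewee function) reproduces every mapping in the list:

```python
def to_json_path(path=None):
    """Hypothetical helper: expand peewee-style shorthand into a JSON path."""
    if not path:               # None or '' refer to the root element.
        return '$'
    if path.startswith('['):   # A leading array index needs no dot.
        return '$' + path
    return '$.' + path         # Otherwise the path starts with an object key.


for shorthand in ('', 'tags', '[0][1].bar', 'metadata[0]', 'user.data.email'):
    print(repr(shorthand), '->', to_json_path(shorthand))
```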

Rather than specifying the paths as a string, you can also use the JSONPath helper (exposed as the J object):

  • J –> '$'
  • J.tags –> '$.tags'
  • J[0][1].bar –> '$[0][1].bar'
  • J.metadata[0] –> '$.metadata[0]'
  • J.user.data.email –> '$.user.data.email'
  • J['1337'] –> '$.1337' (key “1337” rather than an array index)

  • length(*paths)

    Parameters: paths (JSONPath) – Zero or more JSON paths.

    Returns the length of the JSON object stored, either in the column, or at one or more paths within the column data.

    Example:

    # Get APIResponses annotated with the count of tags where the
    # category key has a value of "posts".
    query = (APIResponse
             .select(
                 APIResponse,
                 APIResponse.json_data.length(J.metadata.tags).alias('tag_count'))
             .where(APIResponse.json_data['category'] == 'posts'))
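At the SQL level, a count like this comes from SQLite's JSON functions. Assuming length() is built on json_array_length() (which returns the number of elements in a JSON array and 0 for any other JSON value), you can try the same lookup with the standard library, provided your SQLite has JSON support (built in since 3.38.0, otherwise via the json1 extension):

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
data = json.dumps({'category': 'posts',
                   'metadata': {'tags': ['foo', 'bar']}})

# Number of elements in the array stored at $.metadata.tags.
tag_count = conn.execute(
    "SELECT json_array_length(?, '$.metadata.tags')", (data,)).fetchone()[0]
print(tag_count)  # 2

# A non-array value (here, an object) has an array length of 0.
obj_len = conn.execute(
    "SELECT json_array_length(?, '$.metadata')", (data,)).fetchone()[0]
print(obj_len)  # 0
```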
  • extract(*paths)

    Parameters: paths (JSONPath) – One or more JSON paths.

    Extracts the JSON objects at the given path(s) from the column data. For example, if you have a complex JSON object and only need to work with the value of a specific key, you can use the extract method, specifying the path to the key, to return only the data you need.

    Instead of using extract(), you can also use square brackets to express the same thing.

    Example:

    # Query for the "title" and "metadata.tags" values stored in the
    # json_data column for APIResponses whose category is "posts".
    query = (APIResponse
             .select(APIResponse.json_data[J.title].alias('title'),
                     APIResponse.json_data[J.metadata.tags].alias('tags'))
             .where(APIResponse.json_data[J.category] == 'posts'))

    for response in query:
        print(response.title, response.tags)

    # Example output (note that JSON lists are returned as Python lists):
    # Post 1 ['foo', 'bar']
    # Post 2 ['baz', 'nug']
    # Post 3 []
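The SQL function behind this kind of lookup is json_extract(). Called directly through the standard-library sqlite3 module, scalars come back as plain SQL values while arrays and objects come back as JSON text, so a decoding step is needed to get Python lists like the ones shown above:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
data = json.dumps({'title': 'Post 1', 'category': 'posts',
                   'metadata': {'tags': ['foo', 'bar']}})

title, tags = conn.execute(
    "SELECT json_extract(?, '$.title'), json_extract(?, '$.metadata.tags')",
    (data, data)).fetchone()
print(title)             # Post 1  (a scalar comes back as a SQL value)
print(json.loads(tags))  # ['foo', 'bar']  (an array comes back as JSON text)
```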
  • insert(*pairs, **data)

    Parameters:
    • pairs – A flat list consisting of key, value pairs. E.g., k1, v1, k2, v2, k3, v3. The key may be a simple string or a JSONPath instance.
    • data – keyword arguments mapping paths to values to insert.

    Insert the values at the given keys (or paths) in the column data. If the key/path specified already has a value, it will not be overwritten.

    Example of adding a new key/value to a sub-key:

    # Existing data in column is preserved and "new_key": "new value"
    # is stored in the "metadata" dictionary. If "new_key" already
    # existed, however, the existing data would not be overwritten.
    nrows = (APIResponse
             .update(json_data=APIResponse.json_data.insert(
                 'metadata.new_key', 'new value'))
             .where(APIResponse.json_data[J.category] == 'posts')
             .execute())
  • replace(*pairs, **data)

    Parameters:
    • pairs – A flat list consisting of key, value pairs. E.g., k1, v1, k2, v2, k3, v3. The key may be a simple string or a JSONPath instance.
    • data – keyword arguments mapping paths to values to replace.

    Replace the values at the given keys (or paths) in the column data. If the key/path specified does not exist, a new key will not be created. Data must exist first in order to be replaced.

    Example of replacing the value of an existing key:

    # Rename the "posts" category to "notes".
    nrows = (APIResponse
             .update(json_data=APIResponse.json_data.replace(
                 'category', 'notes'))
             .where(APIResponse.json_data[J.category] == 'posts')
             .execute())
  • set(*pairs, **data)

    Parameters:
    • pairs – A flat list consisting of key, value pairs. E.g., k1, v1, k2, v2, k3, v3. The key may be a simple string or a JSONPath instance.
    • data – keyword arguments mapping paths to values to set.

    Set the values at the given keys (or paths) in the column data. The values will be created/updated regardless of whether the key exists already.

    Example of setting two new key/value pairs:

    nrows = (APIResponse
             .update(json_data=APIResponse.json_data.set(
                 'metadata.key1', 'value1',
                 'metadata.key2', [1, 2, 3]))
             .execute())

    # Retrieve an arbitrary row from the db to inspect its metadata.
    obj = APIResponse.get()
    print(obj.json_data['metadata'])  # key1 and key2 are present.
    # {'key2': [1, 2, 3], 'key1': 'value1', 'tags': ['foo', 'bar']}
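insert(), replace() and set() differ only in how they treat keys that already exist versus keys that are missing, the same distinction drawn by SQLite's json_insert(), json_replace() and json_set(). Assuming peewee delegates to those functions, this standard-library sketch applies the same two path/value pairs with each one so the results can be compared side by side:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
doc = json.dumps({'category': 'posts', 'metadata': {'tags': ['foo']}})


def apply(func):
    """Run one of the JSON modification functions with the same two pairs."""
    sql = ("SELECT %s(?, '$.category', 'notes', "
           "'$.metadata.new_key', 'new value')" % func)
    return json.loads(conn.execute(sql, (doc,)).fetchone()[0])


inserted = apply('json_insert')   # Existing key kept, missing key created.
replaced = apply('json_replace')  # Existing key overwritten, missing key skipped.
set_both = apply('json_set')      # Existing key overwritten and missing key created.

for name, result in (('json_insert', inserted), ('json_replace', replaced),
                     ('json_set', set_both)):
    print(name, result)
```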
  • remove(*paths)

    Parameters: paths (JSONPath) – One or more JSON paths.

    Remove the data at the given paths from the column data.

    Example of removing two paths:

    # Update the data, removing "key1" and "key2" from the "metadata"
    # object.
    (APIResponse
     .update(json_data=APIResponse.json_data.remove(
         'metadata.key1',
         'metadata.key2'))
     .execute())

    # Equivalent, using J:
    (APIResponse
     .update(json_data=APIResponse.json_data.remove(
         J.metadata.key1,
         J.metadata.key2))
     .execute())
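The corresponding raw SQL function is json_remove(), which accepts any number of paths. A standard-library sketch of the removal above:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
doc = json.dumps({'metadata': {'key1': 'value1', 'key2': [1, 2, 3],
                               'tags': ['foo', 'bar']}})

# Remove both keys in a single call; other keys are left untouched.
result = conn.execute(
    "SELECT json_remove(?, '$.metadata.key1', '$.metadata.key2')",
    (doc,)).fetchone()[0]
remaining = json.loads(result)
print(remaining)  # {'metadata': {'tags': ['foo', 'bar']}}
```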
  • update(data)

    Parameters: data – A JSON value.

    Updates the column data in-place, merging the new data with the data already present in the column. This is different from set(), as sub-dictionaries will be merged with other sub-dictionaries, recursively.

    >>> data = {'k1': {'foo': 1, 'bar': 2}, 'k2': {'baz': 3}}
    >>> resp = APIResponse.create(json_data=data)
    >>> resp
    <__main__.APIResponse at 0x7f0b28115cc0>
    >>> patch = {'k1': {'foo': 1337, 'nug': 0}, 'k3': [1, 2]}
    >>> (APIResponse
    ...  .update(json_data=APIResponse.json_data.update(patch))
    ...  .where(APIResponse.id == resp.id)
    ...  .execute())
    1
    >>> APIResponse.get(APIResponse.id == resp.id).json_data
    {'k1': {'bar': 2, 'foo': 1337, 'nug': 0}, 'k2': {'baz': 3}, 'k3': [1, 2]}
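This recursive merge is the JSON MergePatch algorithm (RFC 7396), which SQLite implements as json_patch(). Assuming that is what update() uses, the sketch below reproduces the example with the standard library and also shows one MergePatch rule worth knowing: a null value in the patch deletes the key rather than storing null. The helper name merge_patch is illustrative:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')


def merge_patch(target, patch):
    """Apply an RFC 7396 merge patch using SQLite's json_patch()."""
    merged = conn.execute('SELECT json_patch(?, ?)',
                          (json.dumps(target), json.dumps(patch))).fetchone()[0]
    return json.loads(merged)


data = {'k1': {'foo': 1, 'bar': 2}, 'k2': {'baz': 3}}

patched = merge_patch(data, {'k1': {'foo': 1337, 'nug': 0}, 'k3': [1, 2]})
# k1.foo replaced, k1.nug and k3 added, k1.bar and k2 left untouched.
print(patched)

without_k2 = merge_patch(data, {'k2': None})
# A null in the patch removes the key instead of storing null.
print(without_k2)
```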
  • json_type([path=None])

    Parameters: path (JSONPath) – A JSON path (optional).

    Return a string identifying the type of value stored in the column (or at the given path).

    The type returned will be one of:

    • object
    • array
    • integer
    • real
    • true
    • false
    • text
    • null (the string "null" means the stored JSON value is null)
    • NULL (an actual SQL NULL value means the path was not found)
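To see these type names in practice, the sketch below calls SQLite's json_type() directly through the standard library for one value of each kind, plus a path that does not exist:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
doc = json.dumps({'a': [1], 'b': 1, 'c': 1.5, 'd': True, 'e': None, 'f': 'x'})

types = {}
for path in ('$', '$.a', '$.b', '$.c', '$.d', '$.e', '$.f', '$.missing'):
    types[path] = conn.execute('SELECT json_type(?, ?)',
                               (doc, path)).fetchone()[0]

print(types['$.e'])        # null  (the string: the JSON value is null)
print(types['$.missing'])  # None  (SQL NULL: the path was not found)
```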
  • children([path=None])

    The children function corresponds to json_each, a table-valued function that walks the JSON value provided and returns the immediate children of the top-level array or object. If a path is specified, then that path is treated as the top-most element.

    The rows returned by calls to children() have the following attributes:

    • key: the key of the current element relative to its parent.
    • value: the value of the current element.
    • type: one of the data-types (see json_type()).
    • atom: the scalar value for primitive types, NULL for arrays and objects.
    • id: a unique ID referencing the current node in the tree.
    • parent: the ID of the containing node.
    • fullkey: the full path describing the current element.
    • path: the path to the container of the current row.

    For examples, see my blog post on JSON1.

    SQLite documentation on json_each.
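A direct call to json_each() through the standard-library sqlite3 module shows several of the columns listed above. Passing a path makes that element the top-most one, so only its immediate children are returned:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
doc = json.dumps({'metadata': {'tags': ['foo', 'bar'], 'views': 10}})

children = conn.execute(
    "SELECT key, type, atom, fullkey FROM json_each(?, '$.metadata')",
    (doc,)).fetchall()
for key, kind, atom, fullkey in children:
    print(key, kind, atom, fullkey)
# tags array None $.metadata.tags
# views integer 10 $.metadata.views
```

Note that atom is NULL (None) for the array, as described above.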

  • tree([path=None])

    The tree function corresponds to json_tree, a table-valued function that recursively walks the JSON value provided and returns information about the keys at each level. If a path is specified, then that path is treated as the top-most element.

    The rows returned by calls to tree() have the same attributes as rows returned by calls to children():

    • key: the key of the current element relative to its parent.
    • value: the value of the current element.
    • type: one of the data-types (see json_type()).
    • atom: the scalar value for primitive types, NULL for arrays and objects.
    • id: a unique ID referencing the current node in the tree.
    • parent: the ID of the containing node.
    • fullkey: the full path describing the current element.
    • path: the path to the container of the current row.

    For examples, see my blog post on JSON1.

    SQLite documentation on json_tree.
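The difference from children() is depth: json_tree() visits every node, while json_each() stops at the first level. A standard-library comparison on the same document:

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
doc = json.dumps({'a': {'b': [1, 2]}})

tree_rows = conn.execute('SELECT fullkey, type FROM json_tree(?)',
                         (doc,)).fetchall()
each_rows = conn.execute('SELECT fullkey, type FROM json_each(?)',
                         (doc,)).fetchall()

print(len(tree_rows))  # 5 nodes: $, $.a, $.a.b, $.a.b[0], $.a.b[1]
print(each_rows)       # [('$.a', 'object')]
```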

class JSONPath([path=None])

Parameters: path (list) – Components comprising the JSON path.

A convenient, Pythonic way of representing JSON paths for use with JSONField.

The JSONPath object implements __getitem__, accumulating path components, which it can turn into the corresponding json-path expression.

Attention

Rather than instantiating this class directly, use the J instance to create JSON paths:

  from peewee import Model
  from playhouse.sqlite_ext import J, JSONField

  class APIResponse(Model):
      data = JSONField()

  # Select the "title" and "metadata"."tags" paths from the data
  # field, filtering on rows whose "category" is 'post'.
  query = (APIResponse
           .select(APIResponse.data[J.title].alias('title'),
                   APIResponse.data[J.metadata.tags].alias('tags'))
           .where(APIResponse.data[J.category] == 'post'))

For example (using the J mnemonic, as described above):

  • J -> $ - root element lookup.
  • J.category -> $.category
  • J.metadata.tags[0] -> $.metadata.tags[0]
  • J[0] -> $[0] - Lookup the first element in an array.
  • J['0'] -> $.0 - Here we would look up the key “0” rather than the first element in an array.
  • J['foo'] (same as J.foo) -> $.foo
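The accumulation described above can be sketched in a few lines of Python. JSONPathSketch is a hypothetical stand-in, not peewee's JSONPath, and it only builds the path string; it shows how attribute access and __getitem__ combine to produce the mappings listed above:

```python
class JSONPathSketch:
    """Hypothetical stand-in for JSONPath: accumulates path components."""

    def __init__(self, parts=()):
        self._parts = tuple(parts)

    def __getitem__(self, item):
        # Integers become array indexes; anything else becomes an object key.
        if isinstance(item, int):
            part = '[%d]' % item
        else:
            part = '.%s' % item
        return JSONPathSketch(self._parts + (part,))

    def __getattr__(self, name):
        # Only reached for attributes not found normally, e.g. J.tags.
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]

    @property
    def path(self):
        return '$' + ''.join(self._parts)


J = JSONPathSketch()
print(J.metadata.tags[0].path)  # $.metadata.tags[0]
print(J['0'].path)              # $.0 (the key "0", not an array index)
```

Because path is a real attribute in this sketch, a key literally named "path" would have to be written as J['path'].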

class SearchField([unindexed=False[, column_name=None]])

Field-class to be used for columns on models representing full-text search virtual tables. The full-text search extensions prohibit the specification of any typing or constraints on columns. This behavior is enforced by the SearchField, which raises an exception if any configuration is attempted that would be incompatible with the full-text search extensions.

Example model for document search index (timestamp is stored in the table but its data is not searchable):

  class DocumentIndex(FTSModel):
      title = SearchField()
      content = SearchField()
      tags = SearchField()
      timestamp = SearchField(unindexed=True)
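For FTS4, the SQLite feature that keeps a column stored but out of the full-text index is the notindexed= option (FTS5 uses the UNINDEXED column option instead). Assuming unindexed=True maps onto it, this standard-library sketch shows the effect: the timestamp can still be read back, but searching for it finds nothing. The matches() helper is illustrative:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# "timestamp" is stored in the table but excluded from the full-text index.
conn.execute('CREATE VIRTUAL TABLE document_index '
             'USING fts4(title, timestamp, notindexed=timestamp)')
conn.execute('INSERT INTO document_index (title, timestamp) VALUES (?, ?)',
             ('hello world', '2024'))


def matches(term):
    return conn.execute('SELECT title, timestamp FROM document_index '
                        'WHERE document_index MATCH ?', (term,)).fetchall()


print(matches('hello'))  # [('hello world', '2024')]
print(matches('2024'))   # [] because timestamp is not indexed
```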

class VirtualModel

Model class designed to be used to represent virtual tables. The default metadata settings are slightly different, to match those frequently used by virtual tables.

Metadata options:

  • arguments - arguments passed to the virtual table constructor.
  • extension_module - name of extension to use for virtual table.

  • options - a dictionary of settings to apply in the virtual table constructor.

  • primary_key - defaults to False, indicating no primary key.

class FTSModel

Subclass of VirtualModel to be used with the FTS3 and FTS4 full-text search extensions.

FTSModel subclasses should be defined normally; however, there are a couple of caveats:

  • Unique constraints, not null constraints, check constraints and foreign keys are not supported.
  • Indexes on fields and multi-column indexes are ignored completely.
  • SQLite will treat all column types as TEXT (although you can store other data types, SQLite will treat them as text).
  • FTS models contain a rowid field which is automatically created and managed by SQLite (unless you choose to explicitly set it during model creation). Lookups on this column are fast and efficient.

Given these constraints, it is strongly recommended that all fields declared on an FTSModel subclass be instances of SearchField (though an exception is made for explicitly declaring a RowIDField). Using SearchField will help prevent you from accidentally creating invalid column constraints. If you wish to store metadata in the index but would not like it to be included in the full-text index, then specify unindexed=True when instantiating the SearchField.

The only exception to the above is for the rowid primary key, which can be declared using RowIDField. Lookups on the rowid are very efficient. If you are using FTS4 you can also use DocIDField, which is an alias for the rowid (though there is no benefit to doing so).

Because of the lack of secondary indexes, it usually makes sense to use the rowid primary key as a pointer to a row in a regular table. For example:

  class Document(Model):
      # Canonical source of data, stored in a regular table.
      author = ForeignKeyField(User, backref='documents')
      title = TextField(null=False, unique=True)
      content = TextField(null=False)
      timestamp = DateTimeField()

      class Meta:
          database = db

  class DocumentIndex(FTSModel):
      # Full-text search index.
      rowid = RowIDField()
      title = SearchField()
      content = SearchField()

      class Meta:
          database = db
          # Use the porter stemming algorithm to tokenize content.
          options = {'tokenize': 'porter'}

To store a document in the document index, we will INSERT a row into the DocumentIndex table, manually setting the rowid so that it matches the primary-key of the corresponding Document:

  def store_document(document):
      DocumentIndex.insert({
          DocumentIndex.rowid: document.id,
          DocumentIndex.title: document.title,
          DocumentIndex.content: document.content}).execute()

To perform a search and return ranked results, we can query the Document table and join on the DocumentIndex. This join will be efficient because lookups on an FTSModel’s rowid field are fast:

  def search(phrase):
      # Query the search index and join the corresponding Document
      # object on each search result.
      return (Document
              .select()
              .join(
                  DocumentIndex,
                  on=(Document.id == DocumentIndex.rowid))
              .where(DocumentIndex.match(phrase))
              .order_by(DocumentIndex.bm25()))
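The same rowid-pointer pattern can be written in plain SQL. This standard-library sketch (table names mirror the models above; the helper functions are illustrative) stores each document in a regular table, indexes it under the same rowid in an FTS4 table, and joins back on rowid when searching. Ranking is left out because FTS4 has no built-in ranking function:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL UNIQUE,
        content TEXT NOT NULL);
    CREATE VIRTUAL TABLE document_index
        USING fts4(title, content, tokenize=porter);
''')


def store_document(title, content):
    """Insert into the regular table, then index under the same rowid."""
    doc_id = conn.execute('INSERT INTO document (title, content) VALUES (?, ?)',
                          (title, content)).lastrowid
    conn.execute('INSERT INTO document_index (rowid, title, content) '
                 'VALUES (?, ?, ?)', (doc_id, title, content))
    return doc_id


def search(phrase):
    """Full-text search, joining back to the canonical rows by rowid."""
    return conn.execute(
        'SELECT document.id, document.title FROM document '
        'JOIN document_index ON document.id = document_index.rowid '
        'WHERE document_index MATCH ? ORDER BY document.id',
        (phrase,)).fetchall()


store_document('Cooking', 'Baking bread at home')
store_document('Gardening', 'Growing tomatoes in pots')
print(search('tomatoes'))  # [(2, 'Gardening')]
```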

Warning

All SQL queries on FTSModel classes will be slow except full-text searches and rowid lookups.

If the primary source of the content you are indexing exists in a separate table, you can save some disk space by instructing SQLite to not store an additional copy of the search index content. SQLite will still create the metadata and data-structures needed to perform searches on the content, but the content itself will not be stored in the search index.

To accomplish this, you can specify a table or column using the content option. The FTS4 documentation has more information.

Here is a short example illustrating how to implement this with peewee:

  class Blog(Model):
      title = TextField()
      pub_date = DateTimeField(default=datetime.datetime.now)
      content = TextField()  # We want to search this.

      class Meta:
          database = db

  class BlogIndex(FTSModel):
      content = SearchField()

      class Meta:
          database = db
          options = {'content': Blog.content}  # <-- specify data source.

  db.create_tables([Blog, BlogIndex])

  # Now, we can manage content in the BlogIndex. To populate the
  # search index:
  BlogIndex.rebuild()

  # Optimize the index.
  BlogIndex.optimize()

The content option accepts either a single Field or a Model and can reduce the amount of storage used by the database file. However, content will need to be manually moved to/from the associated FTSModel.
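In raw SQL, an external-content FTS4 table is declared with content="table", and its index is maintained with FTS4's special 'rebuild' and 'optimize' commands, which are presumably what rebuild() and optimize() issue. A standard-library sketch (the blog and blog_index names are illustrative; the FTS column names must match columns in the content table):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    -- External-content FTS4 table: text is read from blog, matched by rowid.
    CREATE VIRTUAL TABLE blog_index USING fts4(content="blog", body);
''')
conn.executemany('INSERT INTO blog (title, body) VALUES (?, ?)',
                 [('First', 'peewee and sqlite'), ('Second', 'full-text search')])

# The index is not updated automatically; rebuild it from the content table.
conn.execute("INSERT INTO blog_index(blog_index) VALUES ('rebuild')")
# Merge the index b-trees for faster queries.
conn.execute("INSERT INTO blog_index(blog_index) VALUES ('optimize')")

hits = conn.execute('SELECT rowid, body FROM blog_index '
                    'WHERE blog_index MATCH ?', ('sqlite',)).fetchall()
print(hits)  # [(1, 'peewee and sqlite')]
```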

  • classmethod match(term)

    Parameters: term – Search term or expression.

    Generate a SQL expression representing a search for the given term or expression in the table. SQLite uses the MATCH operator to indicate a full-text search.

    Example:

    # Search index for "search phrase" and return results ranked
    # by relevancy using the BM25 algorithm.
    query = (DocumentIndex
             .select()
             .where(DocumentIndex.match('search phrase'))
             .order_by(DocumentIndex.bm25()))

    for result in query:
        print('Result: %s' % result.title)
  • classmethod search(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

    Parameters:
    • term (str) – Search term to use.
    • weights – A list of weights for the columns, ordered with respect to the column’s position in the table. Or, a dictionary keyed by the field or field name and mapped to a value.
    • with_score – Whether the score should be returned as part of the SELECT statement.
    • score_alias (str) – Alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
    • explicit_ordering (bool) – Order using full SQL function to calculate rank, as opposed to simply referencing the score alias in the ORDER BY clause.

    Shorthand way of searching for a term and sorting results by the quality of the match.

    Note

    This method uses a simplified algorithm for determining the relevance rank of results. For more sophisticated result ranking, use the search_bm25() method.

    # Simple search.
    docs = DocumentIndex.search('search term')
    for result in docs:
        print(result.title)

    # More complete example.
    docs = DocumentIndex.search(
        'search term',
        weights={'title': 2.0, 'content': 1.0},
        with_score=True,
        score_alias='search_score')
    for result in docs:
        print(result.title, result.search_score)
  • classmethod search_bm25(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

    Parameters:
    • term (str) – Search term to use.
    • weights – A list of weights for the columns, ordered with respect to the column’s position in the table. Or, a dictionary keyed by the field or field name and mapped to a value.
    • with_score – Whether the score should be returned as part of the SELECT statement.
    • score_alias (str) – Alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
    • explicit_ordering (bool) – Order using full SQL function to calculate rank, as opposed to simply referencing the score alias in the ORDER BY clause.

    Shorthand way of searching for a term and sorting results by the quality of the match using the BM25 algorithm.

    Attention

    The BM25 ranking algorithm is only available for FTS4. If you are using FTS3, use the search() method instead.

  • classmethod search_bm25f(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

    Same as FTSModel.search_bm25(), but using the BM25f variant of the BM25 ranking algorithm.

  • classmethod search_lucene(term[, weights=None[, with_score=False[, score_alias='score'[, explicit_ordering=False]]]])

    Same as FTSModel.search_bm25(), but using the result ranking algorithm from the Lucene search engine.

  • classmethod rank([col1_weight, col2_weight…coln_weight])

    Parameters: col_weight (float) – (Optional) weight to give to the ith column of the model. By default all columns have a weight of 1.0.

    Generate an expression that will calculate and return the quality of the search match. This rank can be used to sort the search results; better matches produce lower values, which is why the example below sorts in ascending order.

    The rank function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.

    Note

    The algorithm used by rank() is simple and relatively quick. For more sophisticated result ranking, use bm25(), bm25f() or lucene().

    query = (DocumentIndex
             .select(
                 DocumentIndex,
                 DocumentIndex.rank().alias('score'))
             .where(DocumentIndex.match('search phrase'))
             .order_by(DocumentIndex.rank()))

    for search_result in query:
        print(search_result.title, search_result.score)
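FTS3 and FTS4 do not ship a ranking function, so one has to be supplied as a user-defined function that reads matchinfo(). The sketch below is a Python version of the "simple rank" example from the SQLite FTS3 documentation, registered with the standard library's create_function(). The name simple_rank is hypothetical, and peewee's rank() may differ in details such as the sign of the result; this sketch returns a positive score, so it sorts in descending order:

```python
import sqlite3
import struct

conn = sqlite3.connect(':memory:')


def simple_rank(matchinfo, *weights):
    """Sketch of the "simple rank" function from the SQLite FTS3 docs.

    matchinfo() with its default 'pcx' format is a blob of 32-bit unsigned
    ints: phrase count, column count, then for every (phrase, column) pair
    the hits in this row, the hits in all rows and the rows with hits.
    """
    values = struct.unpack('=%dI' % (len(matchinfo) // 4), matchinfo)
    n_phrases, n_cols = values[0], values[1]
    score = 0.0
    for phrase in range(n_phrases):
        for col in range(n_cols):
            weight = weights[col] if col < len(weights) else 1.0
            base = 2 + 3 * (phrase * n_cols + col)
            row_hits, all_hits = values[base], values[base + 1]
            if row_hits:  # all_hits >= row_hits > 0, so no division by zero.
                score += weight * row_hits / all_hits
    return score  # Larger means a better match in this sketch.


conn.create_function('simple_rank', -1, simple_rank)
conn.execute('CREATE VIRTUAL TABLE docs USING fts4(title, content)')
conn.executemany('INSERT INTO docs (title, content) VALUES (?, ?)', [
    ('apple pie', 'apple apple banana'),
    ('banana split', 'one apple'),
    ('cherry tart', 'no match here'),
])

# Title hits count double, as with rank(2.0, 1.0).
results = conn.execute(
    'SELECT title, simple_rank(matchinfo(docs), 2.0, 1.0) AS score '
    'FROM docs WHERE docs MATCH ? ORDER BY score DESC', ('apple',)).fetchall()
for title, score in results:
    print(title, round(score, 3))
```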
  • classmethod bm25([col1_weight, col2_weight…coln_weight])

    Parameters: col_weight (float) – (Optional) weight to give to the ith column of the model. By default all columns have a weight of 1.0.

    Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results; better matches produce lower values, which is why the example below sorts in ascending order.

    Like rank(), the bm25() function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.

    Attention

    The BM25 result ranking algorithm requires FTS4. If you are using FTS3, use rank() instead.

    query = (DocumentIndex
             .select(
                 DocumentIndex,
                 DocumentIndex.bm25().alias('score'))
             .where(DocumentIndex.match('search phrase'))
             .order_by(DocumentIndex.bm25()))

    for search_result in query:
        print(search_result.title, search_result.score)

    Note

    The above code example is equivalent to calling the search_bm25() method:

    query = DocumentIndex.search_bm25('search phrase', with_score=True)
    for search_result in query:
        print(search_result.title, search_result.score)
  • classmethod bm25f([col1_weight, col2_weight…coln_weight])

    Identical to bm25(), except that it uses the BM25f variant of the BM25 ranking algorithm.

  • classmethod lucene([col1_weight, col2_weight…coln_weight])

    Identical to bm25(), except that it uses the Lucene search result ranking algorithm.

  • classmethod rebuild()

    Rebuild the search index – this only works when the content option was specified during table creation.

  • classmethod optimize()

    Optimize the search index.

class FTS5Model

Subclass of VirtualModel to be used with the FTS5 full-text search extensions.

FTS5Model subclasses should be defined normally; however, there are a couple of caveats:

  • FTS5 explicitly disallows specification of any constraints, data-type or indexes on columns. For that reason, all columns must be instances of SearchField.
  • FTS5 models contain a rowid field which is automatically created and managed by SQLite (unless you choose to explicitly set it during model creation). Lookups on this column are fast and efficient.
  • Indexes on fields and multi-column indexes are not supported.

The FTS5 extension comes with a built-in implementation of the BM25 ranking function. Therefore, the search and search_bm25 methods have been overridden to use the built-in ranking functions rather than user-defined functions.

  • classmethod fts5_installed()

    Return a boolean indicating whether the FTS5 extension is installed. If it is not installed, an attempt will be made to load the extension.
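To check the SQLite build itself outside of peewee, a simple probe is to try creating an FTS5 table in a throwaway in-memory database. Unlike fts5_installed(), this hypothetical helper does not attempt to load the extension:

```python
import sqlite3


def fts5_available():
    """Hypothetical helper: True if this SQLite build can create FTS5 tables."""
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('CREATE VIRTUAL TABLE probe USING fts5(x)')
        return True
    except sqlite3.OperationalError:  # e.g. "no such module: fts5"
        return False
    finally:
        conn.close()


print(fts5_available())
```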

  • classmethod search(term[, weights=None[, with_score=False[, score_alias='score']]])

    Parameters:
    • term (str) – Search term to use.
    • weights – A list of weights for the columns, ordered with respect to the column’s position in the table. Or, a dictionary keyed by the field or field name and mapped to a value.
    • with_score – Whether the score should be returned as part of the SELECT statement.
    • score_alias (str) – Alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
    • explicit_ordering (bool) – Order using full SQL function to calculate rank, as opposed to simply referencing the score alias in the ORDER BY clause.

    Shorthand way of searching for a term and sorting results by the quality of the match. The FTS5 extension provides a built-in implementation of the BM25 algorithm, which is used to rank the results by relevance.

    Higher scores correspond to better matches.

    # Simple search.
    docs = DocumentIndex.search('search term')
    for result in docs:
        print(result.title)

    # More complete example.
    docs = DocumentIndex.search(
        'search term',
        weights={'title': 2.0, 'content': 1.0},
        with_score=True,
        score_alias='search_score')
    for result in docs:
        print(result.title, result.search_score)
  • classmethod search_bm25(term[, weights=None[, with_score=False[, score_alias='score']]])

    With FTS5, search_bm25() is identical to the search() method.

  • classmethod rank([col1_weight, col2_weight…coln_weight])

    Parameters: col_weight (float) – (Optional) weight to give to the i-th column of the model. By default all columns have a weight of 1.0.

    Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results, with higher scores corresponding to better matches.

    The rank() function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.

    query = (DocumentIndex
             .select(
                 DocumentIndex,
                 DocumentIndex.rank().alias('score'))
             .where(DocumentIndex.match('search phrase'))
             .order_by(DocumentIndex.rank()))

    for search_result in query:
        print(search_result.title, search_result.score)

    Note

    The above code example is equivalent to calling the search() method:

    query = DocumentIndex.search('search phrase', with_score=True)
    for search_result in query:
        print(search_result.title, search_result.score)
  • classmethod bm25([col1_weight, col2_weight…coln_weight])

    Because FTS5 provides built-in support for BM25, the bm25() method is identical to the rank() method.

  • classmethod VocabModel([table_type='row'|'col'|'instance'[, table_name=None]])

    Parameters:
    • table_type (str) – Either ‘row’, ‘col’ or ‘instance’.
    • table_name – Name for the vocab table. If not specified, will be “fts5tablename_v”.

    Generate a model class suitable for accessing the vocab table corresponding to the FTS5 search index.
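The weights argument to search() can be either a positional list or a dictionary. The sketch below shows one way a dictionary could be mapped onto the positional form taken by FTS5's built-in bm25(table, w1, w2, ...) SQL function. normalize_weights() and bm25_expression() are hypothetical helpers, not peewee's code, and they accept only column names (not field objects).

```python
def normalize_weights(columns, weights=None, default=1.0):
    """Return one weight per column, in table-column order.

    `weights` may be None (all columns equal), a list already in column
    order, or a dict mapping column *names* to weights. Columns missing
    from the dict fall back to `default`.
    """
    if weights is None:
        return [default] * len(columns)
    if isinstance(weights, dict):
        unknown = set(weights) - set(columns)
        if unknown:
            raise ValueError('unknown columns: %s' % ', '.join(sorted(unknown)))
        return [float(weights.get(col, default)) for col in columns]
    weights = [float(w) for w in weights]
    if len(weights) != len(columns):
        raise ValueError('expected %d weights, got %d' % (len(columns), len(weights)))
    return weights


def bm25_expression(table, columns, weights=None):
    """Build a call to FTS5's built-in bm25(table, w1, w2, ...) as SQL text."""
    args = ', '.join(repr(w) for w in normalize_weights(columns, weights))
    return 'bm25(%s, %s)' % (table, args)


print(bm25_expression('documentindex', ['title', 'content'], {'title': 2.0}))
```

Any column left out of the dictionary keeps the default weight of 1.0, matching the "all columns equal" behaviour described for rank().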

class TableFunction

Implement a user-defined table-valued function. Unlike a simple scalar or aggregate function, which returns a single scalar value, a table-valued function can return any number of rows of tabular data.

Simple example:

  from playhouse.sqlite_ext import SqliteExtDatabase, TableFunction

  db = SqliteExtDatabase(':memory:')

  class Series(TableFunction):
      # Names of the columns in each row of generated data.
      columns = ['value']

      # Names of the parameters the function may be called with.
      params = ['start', 'stop', 'step']

      def initialize(self, start=0, stop=None, step=1):
          """
          Table-functions declare an initialize() method, which is
          called with whatever arguments the user has called the
          function with.
          """
          self.start = self.current = start
          # Treat only a missing stop as unbounded (stop=0 is a valid bound).
          self.stop = float('inf') if stop is None else stop
          self.step = step

      def iterate(self, idx):
          """
          Iterate is called repeatedly by the SQLite database engine
          until the required number of rows has been read **or** the
          function raises a `StopIteration` signalling no more rows
          are available.
          """
          if self.current > self.stop:
              raise StopIteration
          ret, self.current = self.current, self.current + self.step
          return (ret,)

  # Register the table-function with our database, which ensures it
  # is declared whenever a connection is opened.
  db.table_function('series')(Series)

  # Usage:
  cursor = db.execute_sql('SELECT * FROM series(?, ?, ?)', (0, 5, 2))
  for value, in cursor:
      print(value)

Note

A TableFunction must be registered with a database connection before it can be used. To ensure the table function is always available, you can use the SqliteDatabase.table_function() decorator to register the function with the database.

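TableFunction implementations must provide two attributes (columns and params) and implement two methods (initialize() and iterate()). The optional name attribute and the register() classmethod are also described below.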

  • columns

    A list containing the names of the columns for the data returned by the function. For example, a function that is used to split a string on a delimiter might specify 3 columns: [substring, start_idx, end_idx].

  • params

    The names of the parameters the function may be called with. All parameters, including optional parameters, should be listed. For example, a function that is used to split a string on a delimiter might specify 2 params: [string, delimiter].

  • name

    Optional - specify the name for the table function. If not provided, name will be taken from the class name.

  • initialize(**parameter_values)

    Parameters: parameter_values – Parameters the function was called with.
    Returns: No return value.

    The initialize method is called to initialize the table function with the parameters the user specified when calling the function.

  • iterate(idx)

    Parameters: idx (int) – current iteration step
    Returns: A tuple of row data corresponding to the columns named in the columns attribute.
    Raises: StopIteration – To signal that no more rows are available.

    This function is called repeatedly and returns successive rows of data. SQLite may stop calling it before all rows have been produced (especially if the user specified a LIMIT on the results). Alternatively, the function can signal that no more data is available by raising a StopIteration exception.

  • classmethod register(conn)

    Parameters: conn – A sqlite3.Connection object.

    Register the table function with a DB-API 2.0 sqlite3.Connection object. Table-valued functions must be registered before they can be used in a query.

    Example:

    class MyTableFunction(TableFunction):
        name = 'my_func'
        # ... other attributes and methods ...

    db = SqliteDatabase(':memory:')
    db.connect()
    MyTableFunction.register(db.connection())

    To ensure the TableFunction is registered every time a connection is opened, use the table_function() decorator.
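The calling protocol above (initialize() once with the user's arguments, then iterate() with an increasing index until StopIteration or until enough rows have been read) can be exercised without SQLite. drive() below is a hypothetical pure-Python stand-in for the database engine, not peewee's implementation. It assumes positional SQL arguments are matched to the names in params and passed as keyword arguments. Series is a plain class with the same shape as the earlier example, so peewee is not required.

```python
class Series:
    """Same shape as a TableFunction: columns, params, initialize(), iterate()."""
    columns = ['value']
    params = ['start', 'stop', 'step']

    def initialize(self, start=0, stop=None, step=1):
        self.current = start
        self.stop = float('inf') if stop is None else stop
        self.step = step

    def iterate(self, idx):
        if self.current > self.stop:
            raise StopIteration
        ret, self.current = self.current, self.current + self.step
        return (ret,)


def drive(table_function_cls, *args, limit=None):
    """Hypothetical engine loop: initialize once, then pull rows until
    iterate() raises StopIteration or `limit` rows have been read."""
    instance = table_function_cls()
    instance.initialize(**dict(zip(table_function_cls.params, args)))
    rows = []
    idx = 0
    while limit is None or idx < limit:
        try:
            row = instance.iterate(idx)
        except StopIteration:
            break
        rows.append(row)
        idx += 1
    return rows


print(drive(Series, 0, 5, 2))       # like SELECT * FROM series(0, 5, 2)
print(drive(Series, 10, limit=3))   # unbounded series, stopped by a LIMIT
```

The second call shows why iterate() need not terminate on its own: with no stop value, only the LIMIT ends the loop.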

ClosureTable(model_class[, foreign_key=None[, referencing_class=None[, referencing_key=None]]])

Parameters:
  • model_class – The model class containing the nodes in the tree.
  • foreign_key – The self-referential parent-node field on the model class. If not provided, peewee will introspect the model to find a suitable key.
  • referencing_class – Intermediate table for a many-to-many relationship.
  • referencing_key – For a many-to-many relationship, the originating side of the relation.
Returns: A VirtualModel for working with a closure table.

Factory function for creating a model class suitable for working with a transitive closure table. Closure tables are VirtualModel subclasses that work with the transitive closure SQLite extension. These special tables are designed to make it easy to efficiently query hierarchical data. The SQLite extension manages an AVL tree behind the scenes, transparently updating the tree when your table changes and making it easy to perform common queries on hierarchical data.

To use the closure table extension in your project, you need:

  1. A copy of the SQLite extension. The source code can be found in the SQLite code repository or by cloning this gist:

    $ git clone https://gist.github.com/coleifer/7f3593c5c2a645913b92 closure
    $ cd closure/
  2. Compile the extension as a shared library, e.g.

    $ gcc -g -fPIC -shared closure.c -o closure.so
  3. Create a model for your hierarchical data. The only requirement here is that the model has an integer primary key and a self-referential foreign key. Any additional fields are fine.

    class Category(Model):
        name = CharField()
        metadata = TextField()
        parent = ForeignKeyField('self', index=True, null=True)  # Required.

    # Generate a model for the closure virtual table.
    CategoryClosure = ClosureTable(Category)

    The self-referentiality can also be achieved via an intermediate table (for a many-to-many relation).

    class User(Model):
        name = CharField()

    class UserRelations(Model):
        user = ForeignKeyField(User)
        knows = ForeignKeyField(User, backref='_known_by')

        class Meta:
            primary_key = CompositeKey('user', 'knows')  # Alternatively, a unique index on both columns.

    # Generate a model for the closure virtual table, specifying the UserRelations as the referencing table
    UserClosure = ClosureTable(
        User,
        referencing_class=UserRelations,
        foreign_key=UserRelations.knows,
        referencing_key=UserRelations.user)
  4. In your application code, make sure you load the extension when you instantiate your Database object. This is done by passing the path to the shared library to the load_extension() method.

    db = SqliteExtDatabase('my_database.db')
    db.load_extension('/path/to/closure')

Warning

There are two caveats you should be aware of when using the transitive_closure extension. First, it requires that your source model have an integer primary key. Second, it is strongly recommended that you create an index on the self-referential foreign key.

Example:

  class Category(Model):
      name = CharField()
      metadata = TextField()
      parent = ForeignKeyField('self', index=True, null=True)  # Required.

  # Generate a model for the closure virtual table.
  CategoryClosure = ClosureTable(Category)

  # Create the tables if they do not exist.
  db.create_tables([Category, CategoryClosure], safe=True)

It is now possible to perform interesting queries using the data from the closure table:

  # Get all ancestors for a particular node.
  laptops = Category.get(Category.name == 'Laptops')
  for parent in CategoryClosure.ancestors(laptops):
      print(parent.name)

  # Computer Hardware
  # Computers
  # Electronics
  # All products

  # Get all descendants for a particular node.
  hardware = Category.get(Category.name == 'Computer Hardware')
  for node in CategoryClosure.descendants(hardware):
      print(node.name)

  # Laptops
  # Desktops
  # Hard-drives
  # Monitors
  # LCD Monitors
  # LED Monitors

API of the VirtualModel returned by ClosureTable().

  • class BaseClosureTable

    • id

      A field for the primary key of the given node.

    • depth

      A field representing the relative depth of the given node.

    • root

      A field representing the relative root node.

    • descendants(node[, depth=None[, include_node=False]])

      Retrieve all descendants of the given node. If a depth is specified, only nodes at that depth (relative to the given node) will be returned.

      node = Category.get(Category.name == 'Electronics')

      # Direct child categories.
      children = CategoryClosure.descendants(node, depth=1)

      # Grand-child categories.
      grandchildren = CategoryClosure.descendants(node, depth=2)

      # Descendants at all depths.
      all_descendants = CategoryClosure.descendants(node)
    • ancestors(node[, depth=None[, include_node=False]])

      Retrieve all ancestors of the given node. If a depth is specified, only nodes at that depth (relative to the given node) will be returned.

      node = Category.get(Category.name == 'Laptops')

      # All ancestors.
      all_ancestors = CategoryClosure.ancestors(node)

      # Grand-parent category.
      grandparent = CategoryClosure.ancestors(node, depth=2)
    • siblings(node[, include_node=False])

      Retrieve all nodes that are children of the specified node’s parent.

Note

For an in-depth discussion of the SQLite transitive closure extension, check out this blog post, Querying Tree Structures in SQLite using Python and the Transitive Closure Extension.
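If the closure extension cannot be compiled, similar ancestors/descendants queries can be written with SQLite's standard recursive common table expressions. This is a different technique: the tree is walked at query time instead of being read from an index maintained by the extension. The sketch uses only the standard-library sqlite3 module. The table name, column names and helper functions are hypothetical and merely mirror the Category example. As with the extension, an index on the parent column matters.

```python
import sqlite3

# Hypothetical schema: one self-referential table, as in the Category model.
SCHEMA = """
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES category (id));
CREATE INDEX category_parent_id ON category (parent_id);
"""

# Walk downwards from :node; depth 0 is the node itself and is excluded.
DESCENDANTS_SQL = """
WITH RECURSIVE tree (id, depth) AS (
    SELECT id, 0 FROM category WHERE id = :node
    UNION ALL
    SELECT c.id, tree.depth + 1
    FROM category AS c JOIN tree ON c.parent_id = tree.id)
SELECT category.name, tree.depth
FROM tree JOIN category ON category.id = tree.id
WHERE tree.depth > 0 AND (:depth IS NULL OR tree.depth = :depth)
ORDER BY tree.depth, category.name
"""

# Walk upwards from :node, following parent_id until the root.
ANCESTORS_SQL = """
WITH RECURSIVE chain (id, depth) AS (
    SELECT parent_id, 1 FROM category WHERE id = :node
    UNION ALL
    SELECT c.parent_id, chain.depth + 1
    FROM category AS c JOIN chain ON c.id = chain.id
    WHERE c.parent_id IS NOT NULL)
SELECT category.name, chain.depth
FROM chain JOIN category ON category.id = chain.id
WHERE :depth IS NULL OR chain.depth = :depth
ORDER BY chain.depth
"""


def descendants(conn, node_id, depth=None):
    """(name, relative depth) for nodes below node_id, optionally at one depth."""
    return conn.execute(DESCENDANTS_SQL, {'node': node_id, 'depth': depth}).fetchall()


def ancestors(conn, node_id, depth=None):
    """(name, relative depth) for nodes above node_id, nearest first."""
    return conn.execute(ANCESTORS_SQL, {'node': node_id, 'depth': depth}).fetchall()


conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA)
conn.executemany('INSERT INTO category VALUES (?, ?, ?)', [
    (1, 'All products', None),
    (2, 'Electronics', 1),
    (3, 'Computers', 2),
    (4, 'Computer Hardware', 3),
    (5, 'Laptops', 4),
    (6, 'Desktops', 4),
    (7, 'Monitors', 4),
    (8, 'LCD Monitors', 7),
])

print(ancestors(conn, 5))
# [('Computer Hardware', 1), ('Computers', 2), ('Electronics', 3), ('All products', 4)]
```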

class LSMTable

VirtualModel subclass suitable for working with the lsm1 extension. The lsm1 extension is a virtual table that provides a SQL interface to the lsm key/value storage engine from SQLite4.

Note

The LSM1 extension has not been released yet (SQLite version 3.22 at time of writing), so consider this feature experimental with potential to change in subsequent releases.

LSM tables define one primary key column and an arbitrary number of additional value columns (which are serialized and stored in a single value field in the storage engine). All primary key values must be of the same type, and the primary key must use one of the following field types: IntegerField, TextField or BlobField.

Since the LSM storage engine is a key/value store, primary keys (including integers) must be specified by the application.

Attention

Secondary indexes are not supported by the LSM engine, so the only efficient queries will be lookups (or range queries) on the primary key. Other fields can be queried and filtered on, but may result in a full table-scan.

Example model declaration:

  db = SqliteExtDatabase('my_app.db')
  db.load_extension('lsm.so')  # Load shared library.

  class EventLog(LSMTable):
      timestamp = IntegerField(primary_key=True)
      action = TextField()
      sender = TextField()
      target = TextField()

      class Meta:
          database = db
          filename = 'eventlog.ldb'  # LSM data is stored in separate db.

  # Declare virtual table.
  EventLog.create_table()

Example queries:

  import time

  # Use dictionary operators to get, set and delete rows from the LSM
  # table. Slices may be passed to represent a range of key values.
  def get_timestamp():
      # Return time as integer expressing time in microseconds.
      return int(time.time() * 1000000)

  # Create a new row, at current timestamp.
  ts = get_timestamp()
  EventLog[ts] = ('pageview', 'search', '/blog/some-post/')

  # Retrieve row from event log.
  log = EventLog[ts]
  print(log.action, log.sender, log.target)
  # Prints: pageview search /blog/some-post/

  # Delete the row.
  del EventLog[ts]

  # We can also use the "create()" method.
  EventLog.create(
      timestamp=get_timestamp(),
      action='signup',
      sender='newsletter',
      target='sqlite-news')

Simple key/value model declaration:

  class KV(LSMTable):
      key = TextField(primary_key=True)
      value = TextField()

      class Meta:
          database = db
          filename = 'kv.ldb'

  db.create_tables([KV])

For tables consisting of a single value field, Peewee will return the value directly when getting a single item. You can also request slices of rows, in which case Peewee returns a corresponding Select query, which can be iterated over. Below are some examples:

  >>> KV['k0'] = 'v0'
  >>> KV['k0']
  'v0'
  >>> data = [{'key': 'k%d' % i, 'value': 'v%d' % i} for i in range(20)]
  >>> KV.insert_many(data).execute()
  >>> KV.select().count()
  20
  >>> KV['k8']
  'v8'
  >>> list(KV['k4.1':'k7.x'])
  [Row(key='k5', value='v5'),
   Row(key='k6', value='v6'),
   Row(key='k7', value='v7')]
  >>> list(KV['k6xxx':])
  [Row(key='k7', value='v7'),
   Row(key='k8', value='v8'),
   Row(key='k9', value='v9')]

You can also index the LSMTable using expressions:

  >>> list(KV[KV.key > 'k6'])
  [Row(key='k7', value='v7'),
   Row(key='k8', value='v8'),
   Row(key='k9', value='v9')]
  >>> list(KV[(KV.key > 'k6') & (KV.value != 'v8')])
  [Row(key='k7', value='v7'),
   Row(key='k9', value='v9')]

You can delete single rows using del or multiple rows using slices or expressions:

  >>> del KV['k1']
  >>> del KV['k3x':'k8']
  >>> del KV[KV.key.between('k10', 'k18')]
  >>> list(KV[:])
  [Row(key='k0', value='v0'),
   Row(key='k19', value='v19'),
   Row(key='k2', value='v2'),
   Row(key='k3', value='v3'),
   Row(key='k9', value='v9')]

Attempting to get a single non-existent key will result in a KeyError, but slices will not raise an exception:

  >>> KV['k1']
  ...
  KeyError: 'k1'
  >>> list(KV['k1':'k1'])
  []
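The slice results above follow from keys being compared as text, so 'k19' sorts before 'k2'. Both ends of a slice appear to be inclusive: del KV['k3x':'k8'] also removed k8. The following is a minimal in-memory sketch of those semantics using bisect. SortedKV is hypothetical and is not how LSMTable or the lsm1 engine is implemented. It only reproduces the ordering and range behaviour shown in the examples.

```python
import bisect


class SortedKV:
    """Hypothetical ordered key/value store mimicking the LSMTable examples."""

    def __init__(self):
        self._keys = []    # kept sorted, like keys in an ordered storage engine
        self._values = {}

    def __setitem__(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def _range(self, start, stop):
        # Both bounds inclusive; a missing bound means "open-ended".
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_right(self._keys, stop)
        return self._keys[lo:hi]  # a copy, so callers may mutate safely

    def __getitem__(self, item):
        if isinstance(item, slice):
            return [(k, self._values[k]) for k in self._range(item.start, item.stop)]
        return self._values[item]  # a missing key raises KeyError

    def __delitem__(self, item):
        if isinstance(item, slice):
            doomed = self._range(item.start, item.stop)
        else:
            if item not in self._values:
                raise KeyError(item)
            doomed = [item]
        for key in doomed:
            del self._values[key]
            self._keys.remove(key)


kv = SortedKV()
for i in range(20):
    kv['k%d' % i] = 'v%d' % i
print(kv['k4.1':'k7.x'])  # [('k5', 'v5'), ('k6', 'v6'), ('k7', 'v7')]
```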

class ZeroBlob(length)

Parameters: length (int) – Size of blob in bytes.

ZeroBlob is used solely to reserve space for storing a BLOB that supports incremental I/O. To use the SQLite BLOB-store it is necessary to first insert a ZeroBlob of the desired size into the row you wish to use with incremental I/O.

For example, see Blob.

class Blob(database, table, column, rowid[, read_only=False])

Parameters:
  • database – SqliteExtDatabase instance.
  • table (str) – Name of table being accessed.
  • column (str) – Name of column being accessed.
  • rowid (int) – Primary-key of row being accessed.
  • read_only (bool) – Prevent any modifications to the blob data.

Open a blob, stored in the given table/column/row, for incremental I/O. To allocate storage for new data, you can use the ZeroBlob, which is very efficient.

  class RawData(Model):
      data = BlobField()

  # Allocate 100MB of space for writing a large file incrementally:
  query = RawData.insert({'data': ZeroBlob(1024 * 1024 * 100)})
  rowid = query.execute()

  # Now we can open the row for incremental I/O:
  blob = Blob(db, 'rawdata', 'data', rowid)

  # Read from the file and write to the blob in chunks of 4096 bytes.
  # (file_handle is assumed to be a file already opened in binary mode.)
  while True:
      data = file_handle.read(4096)
      if not data:
          break
      blob.write(data)

  bytes_written = blob.tell()
  blob.close()
  • read([n=None])

    Parameters: n (int) – Only read up to n bytes from current position in file.

    Read up to n bytes from the current position in the blob file. If n is not specified, the entire blob will be read.

  • seek(offset[, whence=0])

    Parameters:
    • offset (int) – Seek to the given offset in the file.
    • whence (int) – Seek relative to the specified frame of reference.

    Values for whence:

    • 0: beginning of file
    • 1: current position
    • 2: end of file
  • tell()

    Return current offset within the file.

  • write(data)

    Parameters: data (bytes) – Data to be written

    Writes the given data, starting at the current position in the file.

  • close()

    Close the file and free associated resources.

  • reopen(rowid)

    Parameters: rowid (int) – Primary key of row to open.

    If a blob has already been opened for a given table/column, you can use the reopen() method to re-use the same Blob object for accessing multiple rows in the table.

Additional Features

The SqliteExtDatabase accepts an initialization option to register support for a simple bloom filter. Once initialized, the bloom filter can be used for efficient membership queries on a large set of data.

Here’s an example:

  from peewee import Table, fn
  from playhouse.sqlite_ext import CSqliteExtDatabase

  db = CSqliteExtDatabase(':memory:', bloomfilter=True)

  # Create and define a table to store some data.
  db.execute_sql('CREATE TABLE "register" ("data" TEXT)')
  Register = Table('register', ('data',)).bind(db)

  # Populate the database with a bunch of text.
  with db.atomic():
      for i in 'abcdefghijklmnopqrstuvwxyz':
          keys = [i * j for j in range(1, 10)]  # a, aa, aaa, ... aaaaaaaaa
          Register.insert([{'data': key} for key in keys]).execute()

  # Collect data into a 16KB bloomfilter.
  query = Register.select(fn.bloomfilter(Register.data, 16 * 1024).alias('buf'))
  row = query.get()
  buf = row['buf']

  # Use bloomfilter buf to test whether other keys are members.
  # (A bloom filter never gives false negatives, but may give false positives.)
  test_keys = (
      ('aaaa', True),
      ('abc', False),
      ('zzzzzzz', True),
      ('zyxwvut', False))
  for key, is_present in test_keys:
      query = Register.select(fn.bloomfilter_contains(key, buf).alias('is_member'))
      answer = query.get()['is_member']
      assert answer == is_present
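The reason bloomfilter_contains() can answer membership questions without storing the keys is that a bloom filter sets k bit positions for each inserted key. A lookup reports "present" only if all of a key's positions are set. There are therefore no false negatives, but occasional false positives. The following is a pure-Python sketch of the idea. The hashing scheme (SHA-256 split into 4-byte chunks) and num_hashes are assumptions, and the buffer is not compatible with the C extension's format.

```python
import hashlib


class BloomFilter:
    """Illustrative bloom filter: a bit array plus k hash positions per key."""

    def __init__(self, size_bytes=16 * 1024, num_hashes=3):
        if not 1 <= num_hashes <= 8:
            raise ValueError('num_hashes must be between 1 and 8')  # 32-byte digest / 4
        self.nbits = size_bytes * 8
        self.bits = bytearray(size_bytes)
        self.num_hashes = num_hashes

    def _positions(self, key):
        digest = hashlib.sha256(key.encode('utf-8')).digest()
        for i in range(self.num_hashes):
            # Each "hash function" is a different 4-byte slice of one digest.
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, 'little') % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter()
for ch in 'abcdefghijklmnopqrstuvwxyz':
    for j in range(1, 10):
        bf.add(ch * j)
print('aaaa' in bf)  # True: every inserted key is always reported present
```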

The SqliteExtDatabase can also register other useful functions:

  • rank_functions (enabled by default): registers functions for ranking search results, such as bm25 and lucene.
  • hash_functions: registers md5, sha1, sha256, adler32, crc32 and murmurhash functions.
  • regexp_function: registers a regexp function.
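SQLite parses the REGEXP operator but ships no implementation of it. X REGEXP Y is evaluated by calling a user-defined function named regexp(Y, X). The sketch below registers such a function on a plain standard-library sqlite3 connection. It assumes Python's re syntax, which may not match exactly what regexp_function=True registers. The post table is hypothetical.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Called by SQLite for `value REGEXP pattern`; True on a match."""
    if pattern is None or value is None:
        return None  # propagate SQL NULL
    return re.search(pattern, value) is not None


conn = sqlite3.connect(':memory:')
conn.create_function('regexp', 2, regexp)
conn.execute('CREATE TABLE post (title TEXT)')
conn.executemany('INSERT INTO post VALUES (?)',
                 [('SQLite tips',), ('Peewee 3.0',), ('sqlite3 module',)])

# (?i) makes this particular pattern case-insensitive.
rows = conn.execute(
    'SELECT title FROM post WHERE title REGEXP ? ORDER BY title',
    ('(?i)^sqlite',)).fetchall()
print(rows)
```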

Examples:

  def create_new_user(username, password):
      # DO NOT DO THIS IN REAL LIFE. PLEASE.
      query = User.insert({'username': username, 'password': fn.sha1(password)})
      new_user_id = query.execute()

You can use the murmurhash function to hash bytes to an integer for compact storage:

  >>> db = SqliteExtDatabase(':memory:', hash_functions=True)
  >>> db.execute_sql('SELECT murmurhash(?)', ('abcdefg',)).fetchone()
  (4188131059,)
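To illustrate what registering hash functions involves, the sketch below adds similar SQL functions to a plain standard-library sqlite3 connection using hashlib and zlib. The return formats are assumptions (hex digests for md5/sha1/sha256, unsigned integers for crc32/adler32) and may differ from those registered by hash_functions=True. murmurhash has no standard-library equivalent and is omitted.

```python
import hashlib
import sqlite3
import zlib


def _hash_function(digest):
    """Wrap a bytes -> value function so it accepts SQL TEXT/BLOB and passes NULL through."""
    def sql_function(value):
        if value is None:
            return None
        data = value if isinstance(value, bytes) else str(value).encode('utf-8')
        return digest(data)
    return sql_function


conn = sqlite3.connect(':memory:')
conn.create_function('md5', 1, _hash_function(lambda b: hashlib.md5(b).hexdigest()))
conn.create_function('sha1', 1, _hash_function(lambda b: hashlib.sha1(b).hexdigest()))
conn.create_function('sha256', 1, _hash_function(lambda b: hashlib.sha256(b).hexdigest()))
conn.create_function('crc32', 1, _hash_function(zlib.crc32))
conn.create_function('adler32', 1, _hash_function(zlib.adler32))

print(conn.execute("SELECT md5('abc'), sha1('abc')").fetchone())
# ('900150983cd24fb0d6963f7d28e17f72', 'a9993e364706816aba3e25717850c26c9cd0d89d')
```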