- Advanced features
- Table inheritance
- filter_in and filter_out
- callbacks on record insert, delete and update
- Record versioning
- Common fields and multi-tenancy
- Common filters
- Custom Field types
- Using DAL without define tables
- PostGIS, SpatiaLite, and MS Geo (experimental)
- Copy data from one db into another
- Note on new DAL and adapters
Advanced features
Table inheritance
It is possible to create a table that contains all the fields from another table. It is sufficient to pass the other table in place of a field to define_table. For example:
>>> db.define_table('person', Field('name'), Field('gender'))
<Table person (id, name, gender)>
>>> db.define_table('doctor', db.person, Field('specialization'))
<Table doctor (id, name, gender, specialization)>
It is also possible to define a dummy table that is not stored in a database in order to reuse it in multiple other places. For example:
signature = db.Table(db, 'signature',
Field('is_active', 'boolean', default=True),
Field('created_on', 'datetime', default=request.now),
Field('created_by', db.auth_user, default=auth.user_id),
Field('modified_on', 'datetime', update=request.now),
Field('modified_by', db.auth_user, update=auth.user_id))
db.define_table('payment', Field('amount', 'double'), signature)
This example assumes that standard web2py authentication is enabled.
Notice that if you use Auth, web2py already creates one such table for you:
auth = Auth(db)
db.define_table('payment', Field('amount', 'double'), auth.signature)
When using table inheritance, if you want the inheriting table to inherit validators, be sure to define the validators of the parent table before defining the inheriting table.
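For example, here is a minimal sketch reusing the person table defined above; the nurse table and its ward field are hypothetical, and IS_NOT_EMPTY is the standard web2py validator:
db.person.name.requires = IS_NOT_EMPTY()            # set validators on the parent first
db.define_table('nurse', db.person, Field('ward'))  # hypothetical inheriting table
# db.nurse.name has inherited the IS_NOT_EMPTY() validator;
# db.doctor.name (defined before the validator was set) has not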
filter_in and filter_out
It is possible to define a filter for each field to be called before a value is inserted into the database for that field and after a value is retrieved from the database.
Imagine for example that you want to store a serializable Python data structure in a field in the json format. Here is how it could be accomplished:
>>> import json
>>> db.define_table('anyobj',
... Field('name'),
... Field('data', 'text'))
<Table anyobj (id, name, data)>
>>> db.anyobj.data.filter_in = lambda obj: json.dumps(obj)
>>> db.anyobj.data.filter_out = lambda txt: json.loads(txt)
>>> myobj = ['hello', 'world', 1, {2: 3}]
>>> aid = db.anyobj.insert(name='myobjname', data=myobj)
>>> row = db.anyobj[aid]
>>> row.data
['hello', 'world', 1, {'2': 3}]
Another way to accomplish the same is by using a Field of type SQLCustomType, as discussed in the Custom Field types section below.
callbacks on record insert, delete and update
Web2py provides a mechanism to register callbacks to be called before and/or after insert, update and delete of records.
Each table stores six lists of callbacks:
db.mytable._before_insert
db.mytable._after_insert
db.mytable._before_update
db.mytable._after_update
db.mytable._before_delete
db.mytable._after_delete
You can register a callback function by appending it to the corresponding list. The caveat is that, depending on the operation, the callbacks have different signatures.
This is best explained via some examples.
>>> db.define_table('person', Field('name'))
<Table person (id, name)>
>>> def pprint(callback, *args):
... print "%s%s" % (callback, args)
...
>>> db.person._before_insert.append(lambda f: pprint('before_insert', f))
>>> db.person._after_insert.append(lambda f, i: pprint('after_insert', f, i))
>>> db.person.insert(name='John')
before_insert(<OpRow {'name': 'John'}>,)
after_insert(<OpRow {'name': 'John'}>, 1L)
1L
>>> db.person._before_update.append(lambda s, f: pprint('before_update', s, f))
>>> db.person._after_update.append(lambda s, f: pprint('after_update', s, f))
>>> db(db.person.id == 1).update(name='Tim')
before_update(<Set ("person"."id" = 1)>, <OpRow {'name': 'Tim'}>)
after_update(<Set ("person"."id" = 1)>, <OpRow {'name': 'Tim'}>)
1
>>> db.person._before_delete.append(lambda s: pprint('before_delete', s))
>>> db.person._after_delete.append(lambda s: pprint('after_delete', s))
>>> db(db.person.id == 1).delete()
before_delete(<Set ("person"."id" = 1)>,)
after_delete(<Set ("person"."id" = 1)>,)
1
As you can see:
- f gets passed the OpRow object with the data for the insert or update.
- i gets passed the id of the newly inserted record.
- s gets passed the Set object used for the update or delete.
OpRow is a helper object specialized in storing (field, value) pairs; you can think of it as a normal dictionary that also supports attribute notation (that is, f.name and f['name'] are equivalent).
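For example, a _before_insert callback can read or rewrite the values in the OpRow before they reach the database. This is only a sketch and assumes the name field is always supplied:
def capitalize_name(f):
    # f.name and f['name'] refer to the same value
    f['name'] = f.name.strip().capitalize()

db.person._before_insert.append(capitalize_name)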
The return values of these callbacks should be None or False. If any of the _before_* callbacks returns a True value it will abort the actual insert/update/delete operation.
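For example, here is a hypothetical guard that refuses to delete any person named 'admin' (the name is just an assumption for illustration):
def protect_admin(dbset):
    # returning a True value aborts the delete
    return any(row.name == 'admin' for row in dbset.select(db.person.name))

db.person._before_delete.append(protect_admin)
db(db.person.name == 'admin').delete()  # returns 0 and deletes nothing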
Sometimes a callback may need to perform an update on the same or a different table, and one wants to avoid firing other callbacks, which could cause an infinite loop. For this purpose, Set objects have an update_naive method that works like update but ignores the before and after callbacks.
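For example, here is a sketch that keeps a counter in a hypothetical change_log table up to date from an _after_update callback, without re-triggering any callbacks:
db.define_table('change_log', Field('tablename'), Field('changes', 'integer', default=0))

def count_change(dbset, fields):
    # update_naive skips the _before_update/_after_update callbacks of change_log
    db(db.change_log.tablename == 'person').update_naive(changes=db.change_log.changes + 1)

db.person._after_update.append(count_change)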
Database cascades
A database schema can define relationships which trigger deletions of related records, known as cascading. The DAL is not informed when a record is deleted due to a cascade, so no *_delete callback will ever be called as a consequence of a cascade deletion.
Record versioning
It is possible to ask web2py to save a copy of every record when it is individually modified. There are different ways to do it and it can be done for all tables at once using the syntax:
auth.enable_record_versioning(db)
this requires Auth and is discussed in Chapter 9. It can also be done for each individual table, as discussed below.
Consider the following table:
db.define_table('stored_item',
Field('name'),
Field('quantity', 'integer'),
Field('is_active', 'boolean',
writable=False, readable=False, default=True))
Notice the hidden boolean field called is_active, which defaults to True.
We can tell web2py to create a new table (in the same or a different database) and store in it all previous versions of each record, as they are modified.
This is done in the following way:
db.stored_item._enable_record_versioning()
or in a more verbose syntax:
db.stored_item._enable_record_versioning(archive_db=db,
archive_name='stored_item_archive',
current_record='current_record',
is_active='is_active')
The archive_db=db argument tells web2py to store the archive table in the same database as the stored_item table. The archive_name argument sets the name of the archive table. The archive table has the same fields as the original stored_item table, except that unique fields are no longer unique (because it needs to store multiple versions) and it has an extra field, whose name is specified by current_record, which is a reference to the current record in the stored_item table.
When records are deleted, they are not really deleted. A deleted record is copied into the stored_item_archive table (like when it is modified) and the is_active field is set to False. By enabling record versioning, web2py sets a common_filter on this table that hides all records in table stored_item where the is_active field is set to False. The is_active parameter in the _enable_record_versioning method allows you to specify the name of the field used by the common_filter to determine whether the record was deleted or not.
common_filters will be discussed in the Common filters section below.
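Assuming the default names used above, archived versions and logically deleted records can still be reached explicitly; a sketch:
# all archived versions of the stored_item with id == 1
versions = db(db.stored_item_archive.current_record == 1).select()
# include the records hidden by the common filter (those with is_active == False)
everything = db(db.stored_item.id > 0, ignore_common_filters=True).select()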
Common fields and multi-tenancy
db._common_fields is a list of fields that should belong to all tables. This list can also contain tables, in which case it is understood as all the fields of the table.
For example, occasionally you may find yourself needing to add a signature to all your tables except the Auth tables. In this case, after calling auth.define_tables() but before defining any other table, insert:
db._common_fields.append(auth.signature)
One field is special: request_tenant (you can set a different name in db._request_tenant). This field does not exist by default, but you can create it and add it to any of your tables (or all of them):
db._common_fields.append(Field('request_tenant',
default=request.env.http_host,
writable=False))
For every table with such a field, all records for all queries are always automatically filtered by:
db.table.request_tenant == db.table.request_tenant.default
and for every record inserted, this field is set to the default value. In the example above we have chosen:
default = request.env.http_host
this means we have chosen to ask our app to filter all tables in all queries with:
db.table.request_tenant == request.env.http_host
This simple trick allows us to turn any application into a multi-tenant application. Even though we run one instance of the application and use one single database, when the application is accessed under two or more domains the visitors will see different data depending on the domain (in the example the domain name is retrieved from request.env.http_host).
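As a minimal sketch (the table name and contents are hypothetical): any table defined after the common field gets the tenant column, inserts fill it with the current host, and selects are filtered by it:
db.define_table('product', Field('name'))   # also gets the common request_tenant field
db.product.insert(name='web2py book')       # request_tenant defaults to the current host
rows = db(db.product.id > 0).select()       # returns only the rows for the current host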
You can turn off multi-tenancy filters using ignore_common_filters=True at Set creation time:
db(query, ignore_common_filters=True)
Common filters
A common filter is a generalization of the above multi-tenancy idea. It provides an easy way to avoid repeating the same query. Consider for example the following table:
db.define_table('blog_post',
Field('subject'),
Field('post_text', 'text'),
Field('is_public', 'boolean'),
common_filter = lambda query: db.blog_post.is_public == True)
Any select, delete or update on this table will include only public blog posts. The attribute can also be modified at runtime:
db.blog_post._common_filter = lambda query: ...
It serves both as a way to avoid repeating the “db.blog_post.is_public==True” phrase in each blog post search, and as a security enhancement that prevents you from forgetting to disallow viewing of non-public posts.
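For example, the filter can be made richer at runtime. Assuming Auth is enabled and the table also had a created_by field (an assumption; it is not part of the table defined above), authors could be allowed to see their own drafts:
db.blog_post._common_filter = lambda query: ((db.blog_post.is_public == True) |
                                             (db.blog_post.created_by == auth.user_id))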
In case you actually do want items left out by the common filter (for example, allowing the admin to see non-public posts), you can either remove the filter:
db.blog_post._common_filter = None
or ignore it:
db(query, ignore_common_filters=True)
Note that common_filters are ignored by the appadmin interface.
Custom Field types
Aside from using filter_in and filter_out, it is possible to define new/custom field types. For example, suppose you want to define a custom type to store an IP address:
>>> def ip2int(sv):
... "Convert an IPV4 to an integer."
... sp = sv.split('.'); assert len(sp) == 4 # IPV4 only
... iip = 0
... for i in map(int, sp): iip = (iip<<8) + i
... return iip
...
>>> def int2ip(iv):
... "Convert an integer to an IPV4."
... assert iv > 0
... iv = (iv,); ov = []
... for i in range(3):
... iv = divmod(iv[0], 256)
... ov.insert(0, iv[1])
... ov.insert(0, iv[0])
... return '.'.join(map(str, ov))
...
>>> from gluon.dal import SQLCustomType
>>> ipv4 = SQLCustomType(type='string', native='integer',
... encoder=lambda x : str(ip2int(x)), decoder=int2ip)
>>> db.define_table('website',
... Field('name'),
... Field('ipaddr', type=ipv4))
<Table website (id, name, ipaddr)>
>>> db.website.insert(name='wikipedia', ipaddr='91.198.174.192')
1
>>> db.website.insert(name='google', ipaddr='172.217.11.174')
2
>>> db.website.insert(name='youtube', ipaddr='74.125.65.91')
3
>>> db.website.insert(name='github', ipaddr='207.97.227.239')
4
>>> rows = db(db.website.ipaddr > '100.0.0.0').select(orderby=~db.website.ipaddr)
>>> for row in rows:
... print row.name, row.ipaddr
...
github 207.97.227.239
google 172.217.11.174
SQLCustomType is a field type factory. Its type argument must be one of the standard web2py types; it tells web2py how to treat the field values at the web2py level. native is the type of the field as far as the database is concerned; allowed names depend on the database engine. encoder is an optional transformation function applied when the data is stored and decoder is the optional reverse transformation function.
This feature is marked as experimental. In practice it has been in web2py for a long time and it works, but it can make the code non-portable, for example when the native type is database-specific.
It does not work on Google App Engine NoSQL.
Using DAL without define tables
The DAL can be used from any Python program simply by doing this:
from gluon import DAL
db = DAL('sqlite://storage.sqlite', folder='path/to/app/databases')
i.e. import the DAL, connect and specify the folder which contains the .table files (the app/databases folder).
To access the data and its attributes we still have to define all the tables we are going to access with db.define_table.
If we just need access to the data but not to the web2py table attributes, we can get away without re-defining the tables by simply asking web2py to read the necessary info from the metadata in the .table files:
from gluon import DAL
db = DAL('sqlite://storage.sqlite', folder='path/to/app/databases', auto_import=True)
This allows us to access any db.table without needing to re-define it.
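For example (assuming a person table is present in the metadata):
print db.tables                       # table names read from the .table files
rows = db(db.person.id > 0).select()  # works without calling define_table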
PostGIS, SpatiaLite, and MS Geo (experimental)
The DAL supports geographical APIs using PostGIS (for PostgreSQL), SpatiaLite (for SQLite), and the MS SQL Spatial Extensions. This feature was sponsored by the Sahana project and implemented by Denes Lengyel.
The DAL provides geometry and geography field types and the following functions:
st_asgeojson (PostGIS only)
st_astext
st_contains
st_distance
st_equals
st_intersects
st_overlaps
st_simplify (PostGIS only)
st_touches
st_within
st_x
st_y
Here are some examples:
>>> from gluon.dal import DAL, Field, geoPoint, geoLine, geoPolygon
>>> db = DAL("mssql://user:pass@host/db")
>>> sp = db.define_table('spatial', Field('loc', 'geometry()'))
Below we insert a point, a line, and a polygon:
>>> sp.insert(loc=geoPoint(1, 2))
1
>>> sp.insert(loc=geoLine((100, 100), (20, 180), (180, 180)))
2
>>> sp.insert(loc=geoPolygon((0, 0), (150, 0), (150, 150), (0, 150), (0, 0)))
3
Notice that
rows = db(sp).select()
always returns the geometry data serialized as text. You can also do the same more explicitly using st_astext():
>>> print db(sp).select(sp.id, sp.loc.st_astext())
spatial.id,spatial.loc.STAsText()
1,"POINT (1 2)"
2,"LINESTRING (100 100, 20 180, 180 180)"
3,"POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))"
You can ask for the native representation by using st_asgeojson() (in PostGIS only):
>>> print db(sp).select(sp.id, sp.loc.st_asgeojson().with_alias('loc'))
spatial.id,loc
1,[1, 2]
2,[[100, 100], [20, 180], [180, 180]]
3,[[[0, 0], [150, 0], [150, 150], [0, 150], [0, 0]]]
(notice that an array is a point, an array of arrays is a line, and an array of arrays of arrays is a polygon).
Here are examples of how to use geographical functions:
>>> query = sp.loc.st_intersects(geoLine((20, 120), (60, 160)))
>>> query = sp.loc.st_overlaps(geoPolygon((1, 1), (11, 1), (11, 11), (1, 11), (1, 1)))
>>> query = sp.loc.st_contains(geoPoint(1, 1))
>>> print db(query).select(sp.id, sp.loc)
spatial.id,spatial.loc
3,"POLYGON ((0 0, 150 0, 150 150, 0 150, 0 0))"
Computed distances can also be retrieved as floating point numbers:
>>> dist = sp.loc.st_distance(geoPoint(-1,2)).with_alias('dist')
>>> print db(sp).select(sp.id, dist)
spatial.id,dist
1,2.0
2,140.714249456
3,1.0
Copy data from one db into another
Consider the situation in which you have been using the following database:
db = DAL('sqlite://storage.sqlite')
and you wish to move to another database using a different connection string:
db = DAL('postgres://username:password@localhost/mydb')
Before you switch, you want to move the data and rebuild all the metadata for the new database. We assume the new database exists but is empty.
Web2py provides a script that does this work for you:
cd web2py
python scripts/cpdb.py \
-f applications/app/databases \
-y 'sqlite://storage.sqlite' \
-Y 'postgres://username:password@localhost/mydb' \
-d ../gluon
After running the script you can simply switch the connection string in the model and everything should work out of the box. The new data should be there.
This script provides various command line options that allow you to move data from one application to another, move all tables or only some tables, or clear the data in the tables. For more info try:
python scripts/cpdb.py -h
Note on new DAL and adapters
The source code of the Database Abstraction Layer was completely rewritten in 2010. While it stays backward compatible, the rewrite made it more modular and easier to extend. Here we explain the main logic.
The file “gluon/dal.py” defines, among others, the following classes:
ConnectionPool
BaseAdapter extends ConnectionPool
Row
DAL
Reference
Table
Expression
Field
Query
Set
Rows
Their use has been explained in the previous sections, except for BaseAdapter. When the methods of a Table or Set object need to communicate with the database, they delegate to adapter methods the task of generating the SQL and/or the function call.
For example:
db.mytable.insert(myfield='myvalue')
calls
Table.insert(myfield='myvalue')
which delegates to the adapter by returning:
db._adapter.insert(db.mytable, db.mytable._listify(dict(myfield='myvalue')))
Here db.mytable._listify converts the dict of arguments into a list of (field, value) pairs and calls the insert method of the adapter. db._adapter.insert does more or less the following:
query = db._adapter._insert(db.mytable, list_of_fields)
db._adapter.execute(query)
where the first line builds the query and the second executes it.
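You can inspect the SQL produced by the first step without executing it, because the same logic is exposed by the underscored methods of tables and sets (assuming mytable is defined):
print db.mytable._insert(myfield='myvalue')   # the INSERT statement that would be executed
print db(db.mytable.id > 0)._select()         # the corresponding SELECT statement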
BaseAdapter defines the interface for all adapters.
At the time of writing this book, “gluon/dal.py” contains the following adapters:
SQLiteAdapter extends BaseAdapter
JDBCSQLiteAdapter extends SQLiteAdapter
MySQLAdapter extends BaseAdapter
PostgreSQLAdapter extends BaseAdapter
JDBCPostgreSQLAdapter extends PostgreSQLAdapter
OracleAdapter extends BaseAdapter
MSSQLAdapter extends BaseAdapter
MSSQL2Adapter extends MSSQLAdapter
MSSQL3Adapter extends MSSQLAdapter
MSSQL4Adapter extends MSSQLAdapter
FireBirdAdapter extends BaseAdapter
FireBirdEmbeddedAdapter extends FireBirdAdapter
InformixAdapter extends BaseAdapter
DB2Adapter extends BaseAdapter
IngresAdapter extends BaseAdapter
IngresUnicodeAdapter extends IngresAdapter
GoogleSQLAdapter extends MySQLAdapter
NoSQLAdapter extends BaseAdapter
GoogleDatastoreAdapter extends NoSQLAdapter
CubridAdapter extends MySQLAdapter (experimental)
TeradataAdapter extends DB2Adapter (experimental)
SAPDBAdapter extends BaseAdapter (experimental)
CouchDBAdapter extends NoSQLAdapter (experimental)
IMAPAdapter extends NoSQLAdapter (experimental)
MongoDBAdapter extends NoSQLAdapter (experimental)
VerticaAdapter extends MSSQLAdapter (experimental)
SybaseAdapter extends MSSQLAdapter (experimental)
which override the behavior of BaseAdapter.
Each adapter has more or less this structure:
class MySQLAdapter(BaseAdapter):

    # specify a driver to use
    driver = globals().get('pymysql', None)

    # map web2py types into database types
    types = {
        'boolean': 'CHAR(1)',
        'string': 'VARCHAR(%(length)s)',
        'text': 'LONGTEXT',
        ...
        }

    # connect to the database using the driver
    def __init__(self, db, uri, pool_size=0, folder=None, db_codec='UTF-8',
                 credential_decoder=lambda x: x, driver_args={},
                 adapter_args={}):
        # parse the uri string and store parameters in driver_args
        ...
        # define a connection function
        def connect(driver_args=driver_args):
            return self.driver.connect(**driver_args)
        # place it in the pool
        self.pool_connection(connect)
        # set optional parameters (after connection)
        self.execute('SET FOREIGN_KEY_CHECKS=1;')
        self.execute("SET sql_mode='NO_BACKSLASH_ESCAPES';")

    # override BaseAdapter methods as needed
    def lastrowid(self, table):
        self.execute('select last_insert_id();')
        return int(self.cursor.fetchone()[0])
Looking at the various adapters as examples, it should be easy to write new ones.
When a db instance is created:
db = DAL('mysql://...')
the prefix in the uri string defines the adapter. The mapping is defined in the following dictionary (also in “gluon/dal.py”):
ADAPTERS = {
'sqlite': SQLiteAdapter,
'spatialite': SpatiaLiteAdapter,
'sqlite:memory': SQLiteAdapter,
'spatialite:memory': SpatiaLiteAdapter,
'mysql': MySQLAdapter,
'postgres': PostgreSQLAdapter,
'postgres:psycopg2': PostgreSQLAdapter,
'postgres2:psycopg2': NewPostgreSQLAdapter,
'oracle': OracleAdapter,
'mssql': MSSQLAdapter,
'mssql2': MSSQL2Adapter,
'mssql3': MSSQL3Adapter,
'mssql4' : MSSQL4Adapter,
'vertica': VerticaAdapter,
'sybase': SybaseAdapter,
'db2': DB2Adapter,
'teradata': TeradataAdapter,
'informix': InformixAdapter,
'informix-se': InformixSEAdapter,
'firebird': FireBirdAdapter,
'firebird_embedded': FireBirdAdapter,
'ingres': IngresAdapter,
'ingresu': IngresUnicodeAdapter,
'sapdb': SAPDBAdapter,
'cubrid': CubridAdapter,
'jdbc:sqlite': JDBCSQLiteAdapter,
'jdbc:sqlite:memory': JDBCSQLiteAdapter,
'jdbc:postgres': JDBCPostgreSQLAdapter,
'gae': GoogleDatastoreAdapter, # discouraged, for backward compatibility
'google:datastore': GoogleDatastoreAdapter,
'google:datastore+ndb': GoogleDatastoreAdapter,
'google:sql': GoogleSQLAdapter,
'couchdb': CouchDBAdapter,
'mongodb': MongoDBAdapter,
'imap': IMAPAdapter
}
The uri string is then parsed in more detail by the adapter itself.
For any adapter you can replace the driver with a different one:
import MySQLdb as mysqldb
from gluon.dal import MySQLAdapter
MySQLAdapter.driver = mysqldb
i.e. mysqldb has to be a module with a .connect() method. You can specify optional driver arguments and adapter arguments:
db = DAL(..., driver_args={}, adapter_args={})
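For example, driver_args is passed verbatim to the driver's connect() call. Assuming psycopg2 is used as the PostgreSQL driver, one could request an SSL connection through libpq's sslmode keyword:
db = DAL('postgres://username:password@localhost/mydb',
         driver_args={'sslmode': 'require'})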