06 The database abstraction layer - select - 《web2py Reference Manual, 6th Edition (pre-release)》

select

`select`

Given a Set, s, you can fetch the records with the command select:

>>> rows = s.select()

It returns an iterable object of class pydal.objects.Rows whose elements are Row objects. pydal.objects.Row objects act like dictionaries, but their elements can also be accessed as attributes, like gluon.storage.Storage.The former differ from the latter because its values are read-only.

The Rows object allows looping over the result of the select and printing the selected field values for each row:

>>> for row in rows:
...     print row.id, row.name
... 
1 Alex

You can do all the steps in one statement:

>>> for row in db(db.person.name == 'Alex').select():
...     print row.name
... 
Alex

The select command can take arguments. All unnamed arguments are interpreted as the names of the fields that you want to fetch. For example, you can be explicit on fetching field “id” and field “name”:

>>> for row in db().select(db.person.id, db.person.name):
...     print row.name
... 
Alex
Bob
Carl

The table attribute ALL allows you to specify all fields:

>>> for row in db().select(db.person.ALL):
...     print row.id, row.name
... 
1 Alex
2 Bob
3 Carl

Notice that there is no query string passed to db. web2py understands that if you want all fields of the table person without additional information then you want all records of the table person.

An equivalent alternative syntax is the following:

>>> for row in db(db.person).select():
...     print row.id, row.name
... 
1 Alex
2 Bob
3 Carl

and web2py understands that if you ask for all records of the table person without additional information, then you want all the fields of table person.

Given one row

>>> row = rows[0]

you can extract its values using multiple equivalent expressions:

>>> row.name
Alex
>>> row['name']
Alex
>>> row('person.name')
Alex

The latter syntax is particularly handy when selecting en expression instead of a column. We will show this later.

You can also do

rows.compact = False

to disable the notation

rows[i].name

and enable, instead, the less compact notation:

rows[i].person.name

Yes this is unusual and rarely needed.

Row objects also have two important methods:

row.delete_record()

and

row.update_record(name="new value")

Using an iterator-based select for lower memory use

Python “iterators” are a type of “lazy-evaluation”. They ‘feed’ data one step at time; traditional Python loops create the entire set of data in memory before looping.

The traditional use of select is:

for row in db(db.table).select():
    ...

but for large numbers of rows, using an iterator-based alternative has dramatically lower memory use:

for row in db(db.table).iterselect():
    ...

Testing shows this is around 10% faster as well, even on machines with large RAM.

Rendering rows using represent

You may wish to rewrite rows returned by select to take advantage of formatting information contained in the represents setting of the fields.

rows = db(query).select()  
repr_row = rows.render(0)

If you don’t specify an index, you get a generator to iterate over all the rows:

for row in rows.render():
    print row.myfield

Can also be applied to slices:

for row in rows[0:10].render():
    print row.myfield

If you only want to transform selected fields via their “represent” attribute, you can list them in the “fields” argument:

repr_row = row.render(0, fields=[db.mytable.myfield])

Note, it returns a transformed copy of the original Row, so there’s no update_record (which you wouldn’t want anyway) or delete_record.

Shortcuts

The DAL supports various code-simplifying shortcuts. In particular:

myrecord = db.mytable[id]

returns the record with the given id if it exists. If the id does not exist, it returns None. The above statement is equivalent to

myrecord = db(db.mytable.id == id).select().first()

You can delete records by id:

del db.mytable[id]

and this is equivalent to

db(db.mytable.id == id).delete()

and deletes the record with the given id, if it exists.

Note: this delete shortcut syntax does not currently work if versioning is activated

You can insert records:

db.mytable[None] = dict(myfield='somevalue')

It is equivalent to

db.mytable.insert(myfield='somevalue')

and it creates a new record with field values specified by the dictionary on the right hand side.

Note: insert shortcut was previously db.table[0] = .... It has changed in PyDAL 19.02 to permit normal usage of id 0.

You can update records:

db.mytable[id] = dict(myfield='somevalue')

which is equivalent to

db(db.mytable.id == id).update(myfield='somevalue')

and it updates an existing record with field values specified by the dictionary on the right hand side.

Fetching a `Row`

Yet another convenient syntax is the following:

record = db.mytable(id)
record = db.mytable(db.mytable.id == id)
record = db.mytable(id, myfield='somevalue')

Apparently similar to db.mytable[id] the above syntax is more flexible and safer. First of all it checks whether id is an int (or str(id) is an int) and returns None if not (it never raises an exception). It also allows to specify multiple conditions that the record must meet. If they are not met, it also returns None.

Recursive `select`s

Consider the previous table person and a new table “thing” referencing a “person”:

db.define_table('thing',
                Field('name'),
                Field('owner_id', 'reference person'))

and a simple select from this table:

things = db(db.thing).select()

which is equivalent to

things = db(db.thing._id != None).select()

where _id is a reference to the primary key of the table. Normally db.thing._id is the same as db.thing.id and we will assume that in most of this book.

For each Row of things it is possible to fetch not just fields from the selected table (thing) but also from linked tables (recursively):

for thing in things:
    print thing.name, thing.owner_id.name

Here thing.owner_id.name requires one database select for each thing in things and it is therefore inefficient. We suggest using joins whenever possible instead of recursive selects, nevertheless this is convenient and practical when accessing individual records.

You can also do it backwards, by selecting the things referenced by a person:

person = db.person(id)
for thing in person.thing.select(orderby=db.thing.name):
    print person.name, 'owns', thing.name

In this last expression person.thing is a shortcut for

db(db.thing.owner_id == person.id)

i.e. the Set of things referenced by the current person. This syntax breaks down if the referencing table has multiple references to the referenced table. In this case one needs to be more explicit and use a full Query.

Serializing `Rows` in views

Given the following action containing a query

def index():
    return dict(rows = db(query).select())

The result of a select can be displayed in a view with the following syntax:

{{extend 'layout.html'}}
<h1>Records</h1>
{{=rows}}

Which is equivalent to:

{{extend 'layout.html'}}
<h1>Records</h1>
{{=SQLTABLE(rows)}}

SQLTABLE converts the rows into an HTML table with a header containing the column names and one row per record. The rows are marked as alternating class “even” and class “odd”. Under the hood, Rows is first converted into a SQLTABLE object (not to be confused with Table) and then serialized. The values extracted from the database are also formatted by the validators associated to the field and then escaped.

Yet it is possible and sometimes convenient to call SQLTABLE explicitly.

The SQLTABLE constructor takes the following optional arguments:

linkto lambda function or an action to be used to link reference fields (default to None).

If you assign it a string with the name of an action, it will generate a link to that function passing it, as args, the name of the table and the id of each record (in this order). Example:

linkto = 'pointed_function' # generates something like <a href="pointed_function/table_name/id_value">

If you want a different link to be generated, you can specify a lambda, wich will receive as parameters, the value of the id, the type of the object (e.g. table), and the name of the object. For example, if you want to receive the args in reverse order:

linkto = lambda id, type, name: URL(f='pointed_function', args=[id, name])

upload the URL or the download action to allow downloading of uploaded files (default to None)
headers a dictionary mapping field names to their labels to be used as headers (default to {}). It can also be an instruction. Currently we support headers='fieldname:capitalize'.
truncate the number of characters for truncating long values in the table (default is 16)
columns the list of fieldnames to be shown as columns (in tablename.fieldname format). Those not listed are not displayed (defaults to all).
**attributes generic helper attributes to be passed to the most external TABLE object.

Here is an example:

{{extend 'layout.html'}}
<h1>Records</h1>
{{=SQLTABLE(rows,
            headers='fieldname:capitalize',
            truncate=100,
            upload=URL('download'))
}}

SQLTABLE is useful but there are times when one needs more. SQLFORM.grid is an extension of SQLTABLE that creates a table with search features and pagination, as well as ability to open detailed records, create, edit and delete records. SQLFORM.smartgrid is a further generalization that allows all of the above but also creates buttons to access referencing records.

Here is an example of usage of SQLFORM.grid:

def index():
    return dict(grid=SQLFORM.grid(query))

and the corresponding view:

{{extend 'layout.html'}}
{{=grid}}

For working with multiple rows, SQLFORM.grid and SQLFORM.smartgrid are preferred to SQLTABLE because they are more powerful. Please see Chapter 7.

`orderby`, `groupby`, `limitby`, `distinct`, `having`, `orderby_on_limitby`, `join`, `left`, `cache`

The select command takes a number of optional arguments.

orderby

You can fetch the records sorted by name:

>>> for row in db().select(db.person.ALL, orderby=db.person.name):
...     print row.name
... 
Alex
Bob
Carl

You can fetch the records sorted by name in reverse order (notice the tilde):

>>> for row in db().select(db.person.ALL, orderby=~db.person.name):
...     print row.name
...
Carl
Bob
Alex

You can have the fetched records appear in random order:

>>> for row in db().select(db.person.ALL, orderby='<random>'):
...     print row.name
... 
Carl
Alex
Bob

The use of orderby='<random>' is not supported on Google NoSQL. However, to overcome this limit, sorting can be accomplished on selected rows:
import random
rows = db(...).select().sort(lambda row: random.random())

You can sort the records according to multiple fields by concatenating them with a “|”:

>>> for row in db().select(db.person.name, orderby=db.person.name|db.person.id):
...     print row.name
... 
Alex
Bob
Carl

groupby, having

Using groupby together with orderby, you can group records with the same value for the specified field (this is back-end specific, and is not on the Google NoSQL):

>>> for row in db().select(db.person.ALL,
...                        orderby=db.person.name,
...                        groupby=db.person.name):
...     print row.name
... 
Alex
Bob
Carl

You can use having in conjunction with groupby to group conditionally (only those having the condition are grouped).

>>> print db(query1).select(db.person.ALL, groupby=db.person.name, having=query2)

Notice that query1 filters records to be displayed, query2 filters records to be grouped.

distinct

With the argument distinct=True, you can specify that you only want to select distinct records. This has the same effect as grouping using all specified fields except that it does not require sorting. When using distinct it is important not to select ALL fields, and in particular not to select the “id” field, else all records will always be distinct.

Here is an example:

>>> for row in db().select(db.person.name, distinct=True):
...     print row.name
... 
Alex
Bob
Carl

Notice that distinct can also be an expression, for example:

>>> for row in db().select(db.person.name, distinct=db.person.name):
...     print row.name
... 
Alex
Bob
Carl

limitby

With limitby=(min, max), you can select a subset of the records from offset=min to but not including offset=max. In the next example we select the first two records starting at zero:

>>> for row in db().select(db.person.ALL, limitby=(0, 2)):
...     print row.name
... 
Alex
Bob

orderby_on_limitby

Note that the DAL defaults to implicitly adding an orderby when using a limitby. This ensures the same query returns the same results each time, important for pagination. But it can cause performance problems. use orderby_on_limitby = False to change this (this defaults to True).

join, left

These are involved in managing one to many relations. They are described in Inner join and Left outer join sections respectively.

cache, cacheable

An example use which gives much faster selects is:

rows = db(query).select(cache=(cache.ram, 3600), cacheable=True)

Look at Caching selects section in this chapter, to understand what the trade-offs are.

Logical operators

Queries can be combined using the binary AND operator “&“:

>>> rows = db((db.person.name=='Alex') & (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
>>> len(rows)
0

and the binary OR operator “|“:

>>> rows = db((db.person.name == 'Alex') | (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
1 Alex

You can negate a sub-query inverting its operator:

>>> rows = db((db.person.name != 'Alex') | (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
2 Bob
3 Carl

or by explicit negation with the “~“ unary operator:

>>> rows = db(~(db.person.name == 'Alex') | (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
2 Bob
3 Carl

Due to Python restrictions in overloading “and“ and “or“ operators, these cannot be used in forming queries. The binary operators “&“ and “|“ must be used instead. Note that these operators (unlike “and“ and “or“) have higher precedence than comparison operators, so the “extra” parentheses in the above examples are mandatory. Similarly, the unary operator “~“ has higher precedence than comparison operators, so ~-negated comparisons must also be parenthesized.

It is also possible to build queries using in-place logical operators:

>>> query = db.person.name != 'Alex'
>>> query &= db.person.id > 3
>>> query |= db.person.name == 'John'

`count`, `isempty`, `delete`, `update`

You can count records in a set:

>>> db(db.person.name != 'William').count()
3

Notice that count takes an optional distinct argument which defaults to False, and it works very much like the same argument for select. count has also a cache argument that works very much like the equivalent argument of the select method.

Sometimes you may need to check if a table is empty. A more efficient way than counting is using the isempty method:

>>> db(db.person).isempty()
False

You can delete records in a set:

>>> db(db.person.id > 3).delete()
0

The delete method returns the number of records that were deleted.

And you can update all records in a set by passing named arguments corresponding to the fields that need to be updated:

>>> db(db.person.id > 2).update(name='Ken')
1

The update method returns the number of records that were updated.

Expressions

The value assigned an update statement can be an expression. For example consider this model

db.define_table('person',
                Field('name'),
                Field('visits', 'integer', default=0))
db(db.person.name == 'Massimo').update(visits = db.person.visits + 1)

The values used in queries can also be expressions

db.define_table('person',
                Field('name'),
                Field('visits', 'integer', default=0),
                Field('clicks', 'integer', default=0))
db(db.person.visits == db.person.clicks + 1).delete()

`case`

An expression can contain a case clause for example:

>>> condition = db.person.name.startswith('B')
>>> yes_or_no = condition.case('Yes', 'No')
>>> for row in db().select(db.person.name, yes_or_no):
...     print row.person.name, row[yes_or_no]  # could be row(yes_or_no) too
... 
Alex No
Bob Yes
Ken No

`update_record`

web2py also allows updating a single record that is already in memory using update_record

>>> row = db(db.person.id == 2).select().first()
>>> row.update_record(name='Curt')
<Row {'id': 2L, 'name': 'Curt'}>

update_record should not be confused with

>>> row.update(name='Curt')

because for a single row, the method update updates the row object but not the database record, as in the case of update_record.

It is also possible to change the attributes of a row (one at a time) and then call update_record() without arguments to save the changes:

>>> row = db(db.person.id > 2).select().first()
>>> row.name = 'Philip'
>>> row.update_record() # saves above change
<Row {'id': 3L, 'name': 'Philip'}>

Note, you should avoid using row.update_record() with no arguments when the row object contains fields that have an update attribute (e.g., Field('modified_on', update=request.now)). Calling row.update_record() will retain all of the existing values in the row object, so any fields with update attributes will have no effect in this case. Be particularly mindful of this with tables that include auth.signature.

The update_record method is available only if the table’s id field is included in the select, and cacheable is not set to True.

Inserting and updating from a dictionary

A common issue consists of needing to insert or update records in a table where the name of the table, the field to be updated, and the value for the field are all stored in variables. For example: tablename, fieldname, and value.

The insert can be done using the following syntax:

db[tablename].insert(**{fieldname:value})

The update of record with given id can be done with:

db(db[tablename]._id == id).update(**{fieldname:value})

Notice we used table._id instead of table.id. In this way the query works even for tables with a primary key field with type other than “id”.

`first` and `last`

Given a Rows object containing records:

rows = db(query).select()
first_row = rows.first()
last_row = rows.last()

are equivalent to

first_row = rows[0] if len(rows) else None
last_row = rows[-1] if len(rows) else None

Notice, first() and last() allow you to obtain obviously the first and last record present in your query, but this won’t mean that these records are going to be the first or last inserted records. In case you want the first or last record inputted in a given table don’t forget to use orderby=db.table_name.id. If you forget you will only get the first and last record returned by your query which are often in a random order determined by the backend query optimiser.

`as_dict` and `as_list`

A Row object can be serialized into a regular dictionary using the as_dict() method and a Rows object can be serialized into a list of dictionaries using the as_list() method. Here are some examples:

rows = db(query).select()
rows_list = rows.as_list()
first_row_dict = rows.first().as_dict()

These methods are convenient for passing Rows to generic views and or to store Rows in sessions (since Rows objects themselves cannot be serialized since contain a reference to an open DB connection):

rows = db(query).select()
session.rows = rows  # not allowed!
session.rows = rows.as_list()  # allowed!

Combining rows

Rows objects can be combined at the Python level. Here we assume:

>>> print rows1
person.name
Max
Tim
>>> print rows2
person.name
John
Tim

You can do union of the records in two sets of rows:

>>> rows3 = rows1 + rows2
>>> print rows3
person.name
Max
Tim
John
Tim

You can do union of the records removing duplicates:

>>> rows3 = rows1 | rows2
>>> print rows3
person.name
Max
Tim
John

You can do intersection of the records in two sets of rows:

>>> rows3 = rows1 & rows2
>>> print rows3
person.name
Tim

`find`, `exclude`, `sort`

Some times you need to perform two selects and one contains a subset of a previous select. In this case it is pointless to access the database again. The find, exclude and sort objects allow you to manipulate a Rows object and generate another one without accessing the database. More specifically:

find returns a new set of Rows filtered by a condition and leaves the original unchanged.
exclude returns a new set of Rows filtered by a condition and removes them from the original Rows.
sort returns a new set of Rows sorted by a condition and leaves the original unchanged.

All these methods take a single argument, a function that acts on each individual row.

Here is an example of usage:

>>> db.define_table('person', Field('name'))
<Table person (id, name)>
>>> db.person.insert(name='John')
1
>>> db.person.insert(name='Max')
2
>>> db.person.insert(name='Alex')
3
>>> rows = db(db.person).select()
>>> for row in rows.find(lambda row: row.name[0]=='M'):
...     print row.name
... 
Max
>>> len(rows)
3
>>> for row in rows.exclude(lambda row: row.name[0]=='M'):
...     print row.name
... 
Max
>>> len(rows)
2
>>> for row in rows.sort(lambda row: row.name):
...     print row.name
... 
Alex
John

They can be combined:

>>> rows = db(db.person).select()
>>> rows = rows.find(lambda row: 'x' in row.name).sort(lambda row: row.name)
>>> for row in rows:
...     print row.name
... 
Alex
Max

Sort takes an optional argument reverse=True with the obvious meaning.

The find method has an optional limitby argument with the same syntax and functionality as the Set select method.

select