select
- Using an iterator-based select for lower memory use
- Rendering rows using represent
- Shortcuts
- Fetching a
Row
- Recursive
select
s - Serializing
Rows
in views orderby
,groupby
,limitby
,distinct
,having
,orderby_on_limitby
,join
,left
,cache
- Logical operators
count
,isempty
,delete
,update
- Expressions
case
update_record
- Inserting and updating from a dictionary
first
andlast
as_dict
andas_list
- Combining rows
find
,exclude
,sort
select
Given a Set, s
, you can fetch the records with the command select
:
>>> rows = s.select()
It returns an iterable object of class pydal.objects.Rows
whose elements are Row objects. pydal.objects.Row
objects act like dictionaries, but their elements can also be accessed as attributes, like gluon.storage.Storage
.The former differ from the latter because its values are read-only.
The Rows object allows looping over the result of the select and printing the selected field values for each row:
>>> for row in rows:
... print row.id, row.name
...
1 Alex
You can do all the steps in one statement:
>>> for row in db(db.person.name == 'Alex').select():
... print row.name
...
Alex
The select command can take arguments. All unnamed arguments are interpreted as the names of the fields that you want to fetch. For example, you can be explicit on fetching field “id” and field “name”:
>>> for row in db().select(db.person.id, db.person.name):
... print row.name
...
Alex
Bob
Carl
The table attribute ALL allows you to specify all fields:
>>> for row in db().select(db.person.ALL):
... print row.id, row.name
...
1 Alex
2 Bob
3 Carl
Notice that there is no query string passed to db. web2py understands that if you want all fields of the table person without additional information then you want all records of the table person.
An equivalent alternative syntax is the following:
>>> for row in db(db.person).select():
... print row.id, row.name
...
1 Alex
2 Bob
3 Carl
and web2py understands that if you ask for all records of the table person without additional information, then you want all the fields of table person.
Given one row
>>> row = rows[0]
you can extract its values using multiple equivalent expressions:
>>> row.name
Alex
>>> row['name']
Alex
>>> row('person.name')
Alex
The latter syntax is particularly handy when selecting en expression instead of a column. We will show this later.
You can also do
rows.compact = False
to disable the notation
rows[i].name
and enable, instead, the less compact notation:
rows[i].person.name
Yes this is unusual and rarely needed.
Row objects also have two important methods:
row.delete_record()
and
row.update_record(name="new value")
Using an iterator-based select for lower memory use
Python “iterators” are a type of “lazy-evaluation”. They ‘feed’ data one step at time; traditional Python loops create the entire set of data in memory before looping.
The traditional use of select is:
for row in db(db.table).select():
...
but for large numbers of rows, using an iterator-based alternative has dramatically lower memory use:
for row in db(db.table).iterselect():
...
Testing shows this is around 10% faster as well, even on machines with large RAM.
Rendering rows using represent
You may wish to rewrite rows returned by select to take advantage of formatting information contained in the represents setting of the fields.
rows = db(query).select()
repr_row = rows.render(0)
If you don’t specify an index, you get a generator to iterate over all the rows:
for row in rows.render():
print row.myfield
Can also be applied to slices:
for row in rows[0:10].render():
print row.myfield
If you only want to transform selected fields via their “represent” attribute, you can list them in the “fields” argument:
repr_row = row.render(0, fields=[db.mytable.myfield])
Note, it returns a transformed copy of the original Row, so there’s no update_record (which you wouldn’t want anyway) or delete_record.
Shortcuts
The DAL supports various code-simplifying shortcuts. In particular:
myrecord = db.mytable[id]
returns the record with the given id
if it exists. If the id
does not exist, it returns None
. The above statement is equivalent to
myrecord = db(db.mytable.id == id).select().first()
You can delete records by id:
del db.mytable[id]
and this is equivalent to
db(db.mytable.id == id).delete()
and deletes the record with the given id
, if it exists.
Note: this delete shortcut syntax does not currently work if versioning is activated
You can insert records:
db.mytable[None] = dict(myfield='somevalue')
It is equivalent to
db.mytable.insert(myfield='somevalue')
and it creates a new record with field values specified by the dictionary on the right hand side.
Note: insert shortcut was previously db.table[0] = ...
. It has changed in PyDAL 19.02 to permit normal usage of id 0.
You can update records:
db.mytable[id] = dict(myfield='somevalue')
which is equivalent to
db(db.mytable.id == id).update(myfield='somevalue')
and it updates an existing record with field values specified by the dictionary on the right hand side.
Fetching a Row
Yet another convenient syntax is the following:
record = db.mytable(id)
record = db.mytable(db.mytable.id == id)
record = db.mytable(id, myfield='somevalue')
Apparently similar to db.mytable[id]
the above syntax is more flexible and safer. First of all it checks whether id
is an int (or str(id)
is an int) and returns None
if not (it never raises an exception). It also allows to specify multiple conditions that the record must meet. If they are not met, it also returns None
.
Recursive select
s
Consider the previous table person and a new table “thing” referencing a “person”:
db.define_table('thing',
Field('name'),
Field('owner_id', 'reference person'))
and a simple select from this table:
things = db(db.thing).select()
which is equivalent to
things = db(db.thing._id != None).select()
where _id
is a reference to the primary key of the table. Normally db.thing._id
is the same as db.thing.id
and we will assume that in most of this book.
For each Row of things it is possible to fetch not just fields from the selected table (thing) but also from linked tables (recursively):
for thing in things:
print thing.name, thing.owner_id.name
Here thing.owner_id.name
requires one database select for each thing in things and it is therefore inefficient. We suggest using joins whenever possible instead of recursive selects, nevertheless this is convenient and practical when accessing individual records.
You can also do it backwards, by selecting the things referenced by a person:
person = db.person(id)
for thing in person.thing.select(orderby=db.thing.name):
print person.name, 'owns', thing.name
In this last expression person.thing
is a shortcut for
db(db.thing.owner_id == person.id)
i.e. the Set of thing
s referenced by the current person
. This syntax breaks down if the referencing table has multiple references to the referenced table. In this case one needs to be more explicit and use a full Query.
Serializing Rows
in views
Given the following action containing a query
def index():
return dict(rows = db(query).select())
The result of a select can be displayed in a view with the following syntax:
{{extend 'layout.html'}}
<h1>Records</h1>
{{=rows}}
Which is equivalent to:
{{extend 'layout.html'}}
<h1>Records</h1>
{{=SQLTABLE(rows)}}
SQLTABLE
converts the rows into an HTML table with a header containing the column names and one row per record. The rows are marked as alternating class “even” and class “odd”. Under the hood, Rows is first converted into a SQLTABLE object (not to be confused with Table) and then serialized. The values extracted from the database are also formatted by the validators associated to the field and then escaped.
Yet it is possible and sometimes convenient to call SQLTABLE explicitly.
The SQLTABLE constructor takes the following optional arguments:
linkto
lambda function or an action to be used to link reference fields (default to None).
If you assign it a string with the name of an action, it will generate a link to that function passing it, as args, the name of the table and the id of each record (in this order). Example:
linkto = 'pointed_function' # generates something like <a href="pointed_function/table_name/id_value">
If you want a different link to be generated, you can specify a lambda, wich will receive as parameters, the value of the id, the type of the object (e.g. table), and the name of the object. For example, if you want to receive the args in reverse order:
linkto = lambda id, type, name: URL(f='pointed_function', args=[id, name])
upload
the URL or the download action to allow downloading of uploaded files (default to None)headers
a dictionary mapping field names to their labels to be used as headers (default to{}
). It can also be an instruction. Currently we supportheaders='fieldname:capitalize'
.truncate
the number of characters for truncating long values in the table (default is 16)columns
the list of fieldnames to be shown as columns (in tablename.fieldname format). Those not listed are not displayed (defaults to all).**attributes
generic helper attributes to be passed to the most external TABLE object.
Here is an example:
{{extend 'layout.html'}}
<h1>Records</h1>
{{=SQLTABLE(rows,
headers='fieldname:capitalize',
truncate=100,
upload=URL('download'))
}}
SQLTABLE
is useful but there are times when one needs more.SQLFORM.grid
is an extension of SQLTABLE that creates a table with search features and pagination, as well as ability to open detailed records, create, edit and delete records.SQLFORM.smartgrid
is a further generalization that allows all of the above but also creates buttons to access referencing records.
Here is an example of usage of SQLFORM.grid
:
def index():
return dict(grid=SQLFORM.grid(query))
and the corresponding view:
{{extend 'layout.html'}}
{{=grid}}
For working with multiple rows, SQLFORM.grid
and SQLFORM.smartgrid
are preferred to SQLTABLE
because they are more powerful. Please see Chapter 7.
orderby
, groupby
, limitby
, distinct
, having
, orderby_on_limitby
, join
, left
, cache
The select
command takes a number of optional arguments.
orderby
You can fetch the records sorted by name:
>>> for row in db().select(db.person.ALL, orderby=db.person.name):
... print row.name
...
Alex
Bob
Carl
You can fetch the records sorted by name in reverse order (notice the tilde):
>>> for row in db().select(db.person.ALL, orderby=~db.person.name):
... print row.name
...
Carl
Bob
Alex
You can have the fetched records appear in random order:
>>> for row in db().select(db.person.ALL, orderby='<random>'):
... print row.name
...
Carl
Alex
Bob
The use of
orderby='<random>'
is not supported on Google NoSQL. However, to overcome this limit, sorting can be accomplished on selected rows:
import random
rows = db(...).select().sort(lambda row: random.random())
You can sort the records according to multiple fields by concatenating them with a “|”:
>>> for row in db().select(db.person.name, orderby=db.person.name|db.person.id):
... print row.name
...
Alex
Bob
Carl
groupby, having
Using groupby
together with orderby
, you can group records with the same value for the specified field (this is back-end specific, and is not on the Google NoSQL):
>>> for row in db().select(db.person.ALL,
... orderby=db.person.name,
... groupby=db.person.name):
... print row.name
...
Alex
Bob
Carl
You can use having
in conjunction with groupby
to group conditionally (only those having
the condition are grouped).
>>> print db(query1).select(db.person.ALL, groupby=db.person.name, having=query2)
Notice that query1 filters records to be displayed, query2 filters records to be grouped.
distinct
With the argument distinct=True
, you can specify that you only want to select distinct records. This has the same effect as grouping using all specified fields except that it does not require sorting. When using distinct it is important not to select ALL fields, and in particular not to select the “id” field, else all records will always be distinct.
Here is an example:
>>> for row in db().select(db.person.name, distinct=True):
... print row.name
...
Alex
Bob
Carl
Notice that distinct
can also be an expression, for example:
>>> for row in db().select(db.person.name, distinct=db.person.name):
... print row.name
...
Alex
Bob
Carl
limitby
With limitby=(min, max)
, you can select a subset of the records from offset=min to but not including offset=max. In the next example we select the first two records starting at zero:
>>> for row in db().select(db.person.ALL, limitby=(0, 2)):
... print row.name
...
Alex
Bob
orderby_on_limitby
Note that the DAL defaults to implicitly adding an orderby when using a limitby. This ensures the same query returns the same results each time, important for pagination. But it can cause performance problems. use orderby_on_limitby = False
to change this (this defaults to True).
join, left
These are involved in managing one to many relations. They are described in Inner join and Left outer join sections respectively.
cache, cacheable
An example use which gives much faster selects is:
rows = db(query).select(cache=(cache.ram, 3600), cacheable=True)
Look at Caching selects section in this chapter, to understand what the trade-offs are.
Logical operators
Queries can be combined using the binary AND operator “&
“:
>>> rows = db((db.person.name=='Alex') & (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
>>> len(rows)
0
and the binary OR operator “|
“:
>>> rows = db((db.person.name == 'Alex') | (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
1 Alex
You can negate a sub-query inverting its operator:
>>> rows = db((db.person.name != 'Alex') | (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
2 Bob
3 Carl
or by explicit negation with the “~
“ unary operator:
>>> rows = db(~(db.person.name == 'Alex') | (db.person.id > 3)).select()
>>> for row in rows: print row.id, row.name
2 Bob
3 Carl
Due to Python restrictions in overloading “
and
“ and “or
“ operators, these cannot be used in forming queries. The binary operators “&
“ and “|
“ must be used instead. Note that these operators (unlike “and
“ and “or
“) have higher precedence than comparison operators, so the “extra” parentheses in the above examples are mandatory. Similarly, the unary operator “~
“ has higher precedence than comparison operators, so~
-negated comparisons must also be parenthesized.
It is also possible to build queries using in-place logical operators:
>>> query = db.person.name != 'Alex'
>>> query &= db.person.id > 3
>>> query |= db.person.name == 'John'
count
, isempty
, delete
, update
You can count records in a set:
>>> db(db.person.name != 'William').count()
3
Notice that count
takes an optional distinct
argument which defaults to False, and it works very much like the same argument for select
. count
has also a cache
argument that works very much like the equivalent argument of the select
method.
Sometimes you may need to check if a table is empty. A more efficient way than counting is using the isempty
method:
>>> db(db.person).isempty()
False
You can delete records in a set:
>>> db(db.person.id > 3).delete()
0
The delete
method returns the number of records that were deleted.
And you can update all records in a set by passing named arguments corresponding to the fields that need to be updated:
>>> db(db.person.id > 2).update(name='Ken')
1
The update
method returns the number of records that were updated.
Expressions
The value assigned an update statement can be an expression. For example consider this model
db.define_table('person',
Field('name'),
Field('visits', 'integer', default=0))
db(db.person.name == 'Massimo').update(visits = db.person.visits + 1)
The values used in queries can also be expressions
db.define_table('person',
Field('name'),
Field('visits', 'integer', default=0),
Field('clicks', 'integer', default=0))
db(db.person.visits == db.person.clicks + 1).delete()
case
An expression can contain a case clause for example:
>>> condition = db.person.name.startswith('B')
>>> yes_or_no = condition.case('Yes', 'No')
>>> for row in db().select(db.person.name, yes_or_no):
... print row.person.name, row[yes_or_no] # could be row(yes_or_no) too
...
Alex No
Bob Yes
Ken No
update_record
web2py also allows updating a single record that is already in memory using update_record
>>> row = db(db.person.id == 2).select().first()
>>> row.update_record(name='Curt')
<Row {'id': 2L, 'name': 'Curt'}>
update_record
should not be confused with
>>> row.update(name='Curt')
because for a single row, the method update
updates the row object but not the database record, as in the case of update_record
.
It is also possible to change the attributes of a row (one at a time) and then call update_record()
without arguments to save the changes:
>>> row = db(db.person.id > 2).select().first()
>>> row.name = 'Philip'
>>> row.update_record() # saves above change
<Row {'id': 3L, 'name': 'Philip'}>
Note, you should avoid using
row.update_record()
with no arguments when therow
object contains fields that have anupdate
attribute (e.g.,Field('modified_on', update=request.now)
). Callingrow.update_record()
will retain all of the existing values in therow
object, so any fields withupdate
attributes will have no effect in this case. Be particularly mindful of this with tables that includeauth.signature
.
The update_record
method is available only if the table’s id
field is included in the select, and cacheable
is not set to True
.
Inserting and updating from a dictionary
A common issue consists of needing to insert or update records in a table where the name of the table, the field to be updated, and the value for the field are all stored in variables. For example: tablename
, fieldname
, and value
.
The insert can be done using the following syntax:
db[tablename].insert(**{fieldname:value})
The update of record with given id can be done with:
db(db[tablename]._id == id).update(**{fieldname:value})
Notice we used table._id
instead of table.id
. In this way the query works even for tables with a primary key field with type other than “id”.
first
and last
Given a Rows object containing records:
rows = db(query).select()
first_row = rows.first()
last_row = rows.last()
are equivalent to
first_row = rows[0] if len(rows) else None
last_row = rows[-1] if len(rows) else None
Notice, first()
and last()
allow you to obtain obviously the first and last record present in your query, but this won’t mean that these records are going to be the first or last inserted records. In case you want the first or last record inputted in a given table don’t forget to use orderby=db.table_name.id
. If you forget you will only get the first and last record returned by your query which are often in a random order determined by the backend query optimiser.
as_dict
and as_list
A Row object can be serialized into a regular dictionary using the as_dict()
method and a Rows object can be serialized into a list of dictionaries using the as_list()
method. Here are some examples:
rows = db(query).select()
rows_list = rows.as_list()
first_row_dict = rows.first().as_dict()
These methods are convenient for passing Rows to generic views and or to store Rows in sessions (since Rows objects themselves cannot be serialized since contain a reference to an open DB connection):
rows = db(query).select()
session.rows = rows # not allowed!
session.rows = rows.as_list() # allowed!
Combining rows
Rows objects can be combined at the Python level. Here we assume:
>>> print rows1
person.name
Max
Tim
>>> print rows2
person.name
John
Tim
You can do union of the records in two sets of rows:
>>> rows3 = rows1 + rows2
>>> print rows3
person.name
Max
Tim
John
Tim
You can do union of the records removing duplicates:
>>> rows3 = rows1 | rows2
>>> print rows3
person.name
Max
Tim
John
You can do intersection of the records in two sets of rows:
>>> rows3 = rows1 & rows2
>>> print rows3
person.name
Tim
find
, exclude
, sort
Some times you need to perform two selects and one contains a subset of a previous select. In this case it is pointless to access the database again. The find
, exclude
and sort
objects allow you to manipulate a Rows object and generate another one without accessing the database. More specifically:
find
returns a new set of Rows filtered by a condition and leaves the original unchanged.exclude
returns a new set of Rows filtered by a condition and removes them from the original Rows.sort
returns a new set of Rows sorted by a condition and leaves the original unchanged.
All these methods take a single argument, a function that acts on each individual row.
Here is an example of usage:
>>> db.define_table('person', Field('name'))
<Table person (id, name)>
>>> db.person.insert(name='John')
1
>>> db.person.insert(name='Max')
2
>>> db.person.insert(name='Alex')
3
>>> rows = db(db.person).select()
>>> for row in rows.find(lambda row: row.name[0]=='M'):
... print row.name
...
Max
>>> len(rows)
3
>>> for row in rows.exclude(lambda row: row.name[0]=='M'):
... print row.name
...
Max
>>> len(rows)
2
>>> for row in rows.sort(lambda row: row.name):
... print row.name
...
Alex
John
They can be combined:
>>> rows = db(db.person).select()
>>> rows = rows.find(lambda row: 'x' in row.name).sort(lambda row: row.name)
>>> for row in rows:
... print row.name
...
Alex
Max
Sort takes an optional argument reverse=True
with the obvious meaning.
The find
method has an optional limitby
argument with the same syntax and functionality as the Set select
method.