06 The database abstraction layer - Exporting and importing data - 《web2py Reference Manual, 6th Edition (pre-release)》

Exporting and importing data

Exporting and importing data

CSV (one Table at a time)

When a Rows object is converted to a string it is automatically serialized in CSV:

>>> rows = db(db.person.id == db.thing.owner_id).select()
>>> print rows
person.id,person.name,thing.id,thing.name,thing.owner_id
1,Alex,1,Boat,1
1,Alex,2,Chair,1
2,Bob,3,Shoes,2

You can serialize a single table in CSV and store it in a file “test.csv”:

with open('test.csv', 'wb') as dumpfile:
    dumpfile.write(str(db(db.person).select()))

Or in Python 3:

>>> open('test.csv', 'w', encoding='utf-8', newline='').write(str(db(db.person.id).select()))

This is equivalent to

rows = db(db.person).select()
with open('test.csv', 'wb') as dumpfile:
    rows.export_to_csv_file(dumpfile)

You can read the CSV file back with:

with open('test.csv', 'rb') as dumpfile:
    db.person.import_from_csv_file(dumpfile)

Or in Python 3:

>>> rows = db(db.person.id).select()
>>> rows.export_to_csv_file(open('test.csv', 'w', encoding='utf-8', newline=''))

You can read the CSV file back with:

>>> db.person.import_from_csv_file(open('test.csv', 'r', encoding='utf-8', newline=''))

When importing, web2py looks for the field names in the CSV header. In this example, it finds two columns: “person.id” and “person.name”. It ignores the “person.” prefix, and it ignores the “id” fields. Then all records are appended and assigned new ids. Both of these operations can be performed via the appadmin web interface.

CSV (all tables at once)

In web2py, you can backup/restore an entire database with two commands:

To export:

with open('somefile.csv', 'wb') as dumpfile:
    db.export_to_csv_file(dumpfile)

To import:

with open('somefile.csv', 'rb') as dumpfile:
    db.import_from_csv_file(dumpfile)

Or in Python 3:

To export:

>>> db.export_to_csv_file(open('test.csv', 'w', encoding='utf-8', newline=''))

To import:

>>> db.import_from_csv_file(open('test.csv', 'r', encoding='utf-8', newline=''))

This mechanism can be used even if the importing database is of a different type than the exporting database.

The data is stored in “somefile.csv” as a CSV file where each table starts with one line that indicates the tablename, and another line with the fieldnames:

TABLE tablename
field1,field2,field3,...

Two tables are separated by \r\n\r\n (that is two empty lines). The file ends with the line

END

The file does not include uploaded files if these are not stored in the database. The upload files stored on filesystem must be dumped separately, a zip of the “uploads” folder may suffice in most cases.

When importing, the new records will be appended to the database if it is not empty. In general the new imported records will not have the same record id as the original (saved) records but web2py will restore references so they are not broken, even if the id values may change.

If a table contains a field called uuid, this field will be used to identify duplicates. Also, if an imported record has the same uuid as an existing record, the previous record will be updated.

CSV and remote database synchronization

Consider once again the following model:

db.define_table('person',
                Field('name'))
db.define_table('thing',
                Field('name'),
                Field('owner_id', 'reference person'))
# usage example
if db(db.person).isempty():
    nid = db.person.insert(name='Massimo')
    db.thing.insert(name='Chair', owner_id=nid)

Each record is identified by an identifier and referenced by that id. If you have two copies of the database used by distinct web2py installations, the id is unique only within each database and not across the databases. This is a problem when merging records from different databases.

In order to make records uniquely identifiable across databases, they must:

have a unique id (UUID),
have a last modification time to track the most recent among multiple copies,
reference the UUID instead of the id.

This can be achieved changing the above model into:

import uuid
db.define_table('person',
                Field('uuid', length=64),
                Field('modified_on', 'datetime', default=request.now, update=request.now),
                Field('name'))
db.define_table('thing',
                Field('uuid', length=64),
                Field('modified_on', 'datetime', default=request.now, update=request.now),
                Field('name'),
                Field('owner_id', length=64))
db.person.uuid.default = db.thing.uuid.default = lambda:str(uuid.uuid4())
db.thing.owner_id.requires = IS_IN_DB(db, 'person.uuid', '%(name)s')
# usage example
if db(db.person).isempty():
    nid = str(uuid.uuid4())
    db.person.insert(uuid=nid, name='Massimo')
    db.thing.insert(name='Chair', owner_id=nid)

Notice that in the above table definitions, the default value for the two uuid fields is set to a lambda function, which returns a UUID (converted to a string). The lambda function is called once for each record inserted, ensuring that each record gets a unique UUID, even if multiple records are inserted in a single transaction.

Create a controller action to export the database:

def export():
    s = StringIO.StringIO()
    db.export_to_csv_file(s)
    response.headers['Content-Type'] = 'text/csv'
    return s.getvalue()

Create a controller action to import a saved copy of the other database and sync records:

def import_and_sync():
    form = FORM(INPUT(_type='file', _name='data'), INPUT(_type='submit'))
    if form.process().accepted:
        db.import_from_csv_file(form.vars.data.file, unique=False)
        # for every table
        for tablename in db.tables:
            table = db[tablename]
            # for every uuid, delete all but the latest
            items = db(table).select(table.id, table.uuid,
                                     orderby=~table.modified_on,
                                     groupby=table.uuid)
            for item in items:
                db((table.uuid == item.uuid) & (table.id != item.id)).delete()
    return dict(form=form)

Optionally you should create an index manually to make the search by uuid faster.

Alternatively, you can use XML-RPC to export/import the file.

If the records reference uploaded files, you also need to export/import the content of the uploads folder. Notice that files therein are already labeled by UUIDs so you do not need to worry about naming conflicts and references.

HTML and XML (one Table at a time)

Rows objects also have an xml method (like helpers) that serializes it to XML/HTML:

>>> rows = db(db.person.id == db.thing.owner_id).select()
>>> print rows.xml()

<table>
<thead>
<tr><th>person.id</th><th>person.name</th><th>thing.id</th><th>thing.name</th><th>thing.owner_id</th></tr>
</thead>
<tbody>
<tr class="w2p_odd odd"><td>1</td><td>Alex</td><td>1</td><td>Boat</td><td>1</td></tr>
<tr class="w2p_even even"><td>1</td><td>Alex</td><td>2</td><td>Chair</td><td>1</td></tr>
<tr class="w2p_odd odd"><td>2</td><td>Bob</td><td>3</td><td>Shoes</td><td>2</td></tr>
</tbody>
</table>

If you need to serialize the Rows in any other XML format with custom tags, you can easily do that using the universal TAG helper (described in Chapter 5) and the Python syntax *<iterable> allowed in function calls:

>>> rows = db(db.person).select()
>>> print TAG.result(*[TAG.row(*[TAG.field(r[f], _name=f) for f in db.person.fields]) for r in rows])

<result>
<row><field name="id">1</field><field name="name">Alex</field></row>
<row><field name="id">2</field><field name="name">Bob</field></row>
<row><field name="id">3</field><field name="name">Carl</field></row>
</result>

Data representation

The Rows.export_to_csv_file method accepts a keyword argument named represent. When True it will use the columns represent function while exporting the data instead of the raw data.

The function also accepts a keyword argument named colnames that should contain a list of column names one wish to export. It defaults to all columns.

Both export_to_csv_file and import_from_csv_file accept keyword arguments that tell the csv parser the format to save/load the files:

delimiter: delimiter to separate values (default ‘,’)
quotechar: character to use to quote string values (default to double quotes)
quoting: quote system (default csv.QUOTE_MINIMAL)

Here is some example usage:

import csv
rows = db(query).select()
with open('/tmp/test.txt', 'wb') as oufile:
    rows.export_to_csv_file(oufile,
                            delimiter='|',
                            quotechar='"',
                            quoting=csv.QUOTE_NONNUMERIC)

Which would render something similar to

"hello"|35|"this is the text description"|"2013-03-03"

For more information consult the official Python documentation [quoteall]