Additional Persistence Techniques

Additional Persistence Techniques

Embedding SQL Insert/Update Expressions into a Flush

This feature allows the value of a database column to be set to a SQLexpression instead of a literal value. It’s especially useful for atomicupdates, calling stored procedures, etc. All you do is assign an expression toan attribute:

class SomeClass(Base):
    __tablename__ = "some_table"
 
    # ...
 
    value = Column(Integer)
 
someobject = session.query(SomeClass).get(5)
 
# set 'value' attribute to a SQL expression adding one
someobject.value = SomeClass.value + 1
 
# issues "UPDATE some_table SET value=value+1"
session.commit()

This technique works both for INSERT and UPDATE statements. After theflush/commit operation, the value attribute on someobject above isexpired, so that when next accessed the newly generated value will be loadedfrom the database.

The feature also has conditional support to work in conjunction withprimary key columns. A database that supports RETURNING, e.g. PostgreSQL,Oracle, or SQL Server, or as a special case when using SQLite with the pysqlitedriver and a single auto-increment column, a SQL expression may be assignedto a primary key column as well. This allows both the SQL expression tobe evaluated, as well as allows any server side triggers that modify theprimary key value on INSERT, to be successfully retrieved by the ORM aspart of the object’s primary key:

class Foo(Base):
    __tablename__ = 'foo'
    pk = Column(Integer, primary_key=True)
    bar = Column(Integer)
 
e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.create_all(e)
 
session = Session(e)
 
foo = Foo(pk=sql.select([sql.func.coalesce(sql.func.max(Foo.pk) + 1, 1)])
session.add(foo)
session.commit()

On PostgreSQL, the above Session will emit the following INSERT:

INSERT INTO foo (foopk, bar) VALUES
((SELECT coalesce(max(foo.foopk) + %(max_1)s, %(coalesce_2)s) AS coalesce_1
FROM foo), %(bar)s) RETURNING foo.foopk

New in version 1.3: SQL expressions can now be passed to a primary key column during an ORMflush; if the database supports RETURNING, or if pysqlite is in use, theORM will be able to retrieve the server-generated value as the valueof the primary key attribute.

Using SQL Expressions with Sessions

SQL expressions and strings can be executed via theSession within its transactional context.This is most easily accomplished using theexecute() method, which returns aResultProxy in the same manner as anEngine orConnection:

Session = sessionmaker(bind=engine)
session = Session()
 
# execute a string statement
result = session.execute("select * from table where id=:id", {'id':7})
 
# execute a SQL expression construct
result = session.execute(select([mytable]).where(mytable.c.id==7))

The current Connection held by theSession is accessible using theconnection() method:

connection = session.connection()

The examples above deal with a Session that’sbound to a single Engine orConnection. To execute statements using aSession which is bound either to multipleengines, or none at all (i.e. relies upon bound metadata), bothexecute() andconnection() accept a mapper keywordargument, which is passed a mapped class orMapper instance, which is used to locate theproper context for the desired engine:

Session = sessionmaker()
session = Session()
 
# need to specify mapper or class when executing
result = session.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass)
 
result = session.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass)
 
connection = session.connection(MyMappedClass)

Forcing NULL on a column with a default

The ORM considers any attribute that was never set on an object as a“default” case; the attribute will be omitted from the INSERT statement:

class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(String(50), nullable=True)
 
obj = MyObject(id=1)
session.add(obj)
session.commit()  # INSERT with the 'data' column omitted; the database
                  # itself will persist this as the NULL value

Omitting a column from the INSERT means that the column willhave the NULL value set, unless the column has a default set up,in which case the default value will be persisted. This holds trueboth from a pure SQL perspective with server-side defaults, as well as thebehavior of SQLAlchemy’s insert behavior with both client-side and server-sidedefaults:

class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(String(50), nullable=True, server_default="default")
 
obj = MyObject(id=1)
session.add(obj)
session.commit()  # INSERT with the 'data' column omitted; the database
                  # itself will persist this as the value 'default'

However, in the ORM, even if one assigns the Python value None explicitlyto the object, this is treated the same as though the value were neverassigned:

class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(String(50), nullable=True, server_default="default")
 
obj = MyObject(id=1, data=None)
session.add(obj)
session.commit()  # INSERT with the 'data' column explicitly set to None;
                  # the ORM still omits it from the statement and the
                  # database will still persist this as the value 'default'

The above operation will persist into the data column theserver default value of "default" and not SQL NULL, even though Nonewas passed; this is a long-standing behavior of the ORM that many applicationshold as an assumption.

So what if we want to actually put NULL into this column, even though thecolumn has a default value? There are two approaches. One is thaton a per-instance level, we assign the attribute using thenull SQL construct:

from sqlalchemy import null
 
obj = MyObject(id=1, data=null())
session.add(obj)
session.commit()  # INSERT with the 'data' column explicitly set as null();
                  # the ORM uses this directly, bypassing all client-
                  # and server-side defaults, and the database will
                  # persist this as the NULL value

The null SQL construct always translates into the SQLNULL value being directly present in the target INSERT statement.

If we’d like to be able to use the Python value None and have thisalso be persisted as NULL despite the presence of column defaults,we can configure this for the ORM using a Core-level modifierTypeEngine.evaluates_none(), which indicatesa type where the ORM should treat the value None the same as any othervalue and pass it through, rather than omitting it as a “missing” value:

class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(
      String(50).evaluates_none(),  # indicate that None should always be passed
      nullable=True, server_default="default")
 
obj = MyObject(id=1, data=None)
session.add(obj)
session.commit()  # INSERT with the 'data' column explicitly set to None;
                  # the ORM uses this directly, bypassing all client-
                  # and server-side defaults, and the database will
                  # persist this as the NULL value

Evaluating None

The TypeEngine.evaluates_none() modifier is primarily intended tosignal a type where the Python value “None” is significant, the primaryexample being a JSON type which may want to persist the JSON null valuerather than SQL NULL. We are slightly repurposing it here in order tosignal to the ORM that we’d like None to be passed into the type wheneverpresent, even though no special type-level behaviors are assigned to it.

New in version 1.1: added the TypeEngine.evaluates_none() methodin order to indicate that a “None” value should be treated as significant.

Fetching Server-Generated Defaults

As introduced in the sections Server-invoked DDL-Explicit Default Expressions and Marking Implicitly Generated Values, timestamps, and Triggered Columns,the Core supports the notion of database columns for which the databaseitself generates a value upon INSERT and in less common cases upon UPDATEstatements. The ORM features support for such columns regarding beingable to fetch these newly generated values upon flush. This behavior isrequired in the case of primary key columns that are generated by the server,since the ORM has to know the primary key of an object once it is persisted.

In the vast majority of cases, primary key columns that have their valuegenerated automatically by the database are simple integer columns, which areimplemented by the database as either a so-called “autoincrement” column, orfrom a sequence associated with the column. Every database dialect withinSQLAlchemy Core supports a method of retrieving these primary key values whichis often native to the Python DBAPI, and in general this process is automatic,with the exception of a database like Oracle that requires us to specify aSequence explicitly. There is more documentation regarding thisat Column.autoincrement.

For server-generating columns that are not primary key columns or that are notsimple autoincrementing integer columns, the ORM requires that these columnsare marked with an appropriate server_default directive that allows the ORM toretrieve this value. Not all methods are supported on all backends, however,so care must be taken to use the appropriate method. The two questions to beanswered are, 1. is this column part of the primary key or not, and 2. does thedatabase support RETURNING or an equivalent, such as “OUTPUT inserted”; theseare SQL phrases which return a server-generated value at the same time as theINSERT or UPDATE statement is invoked. Databases that support RETURNING orequivalent include PostgreSQL, Oracle, and SQL Server. Databases that do notinclude SQLite and MySQL.

Case 1: non primary key, RETURNING or equivalent is supported

In this case, columns should be marked as FetchedValue or with anexplicit Column.server_default. Theorm.mapper.eager_defaults flag may be used to indicate that thesecolumns should be fetched immediately upon INSERT and sometimes UPDATE:

class MyModel(Base):
    __tablename__ = 'my_table'
 
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime(), server_default=func.now())
 
    # assume a database trigger populates a value into this column
    # during INSERT
    special_identifier = Column(String(50), server_default=FetchedValue())
 
    __mapper_args__ = {"eager_defaults": True}

Above, an INSERT statement that does not specify explicit values for“timestamp” or “special_identifier” from the client side will include the“timestamp” and “special_identifier” columns within the RETURNING clause sothey are available immediately. On the PostgreSQL database, an INSERT for theabove table will look like:

INSERT INTO my_table DEFAULT VALUES RETURNING my_table.id, my_table.timestamp, my_table.special_identifier

Case 2: non primary key, RETURNING or equivalent is not supported or not needed

This case is the same as case 1 above, except we don’t specifyorm.mapper.eager_defaults:

class MyModel(Base):
    __tablename__ = 'my_table'
 
    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime(), server_default=func.now())
 
    # assume a database trigger populates a value into this column
    # during INSERT
    special_identifier = Column(String(50), server_default=FetchedValue())

After a record with the above mapping is INSERTed, the “timestamp” and“special_identifier” columns will remain empty, and will be fetched viaa second SELECT statement when they are first accessed after the flush, e.g.they are marked as “expired”.

If the orm.mapper.eager_defaults is still used, and the backenddatabase does not support RETURNING or an equivalent, the ORM will emit thisSELECT statement immediately following the INSERT statement. This is oftenundesirable as it adds additional SELECT statements to the flush process thatmay not be needed. Using the above mapping with theorm.mapper.eager_defaults flag set to True against MySQL resultsin SQL like this upon flush (minus the comment, which is for clarification only):

INSERT INTO my_table () VALUES ()
 
-- when eager_defaults **is** used, but RETURNING is not supported
SELECT my_table.timestamp AS my_table_timestamp, my_table.special_identifier AS my_table_special_identifier
FROM my_table WHERE my_table.id = %s

Case 3: primary key, RETURNING or equivalent is supported

A primary key column with a server-generated value must be fetched immediatelyupon INSERT; the ORM can only access rows for which it has a primary key value,so if the primary key is generated by the server, the ORM needs a way for thedatabase to give us that new value immediately upon INSERT.

As mentioned above, for integer “autoincrement” columns as well asPostgreSQL SERIAL, these types are handled automatically by the Core; databasesinclude functions for fetching the “last inserted id” where RETURNINGis not supported, and where RETURNING is supported SQLAlchemy will use that.

However, for non-integer values, as well as for integer values that must beexplicitly linked to a sequence or other triggered routine, the server defaultgeneration must be marked in the table metadata.

For an explicit sequence as we use with Oracle, this just means we are usingthe Sequence construct:

class MyOracleModel(Base):
    __tablename__ = 'my_table'
 
    id = Column(Integer, Sequence("my_sequence"), primary_key=True)
    data = Column(String(50))

The INSERT for a model as above on Oracle looks like:

INSERT INTO my_table (id, data) VALUES (my_sequence.nextval, :data) RETURNING my_table.id INTO :ret_0

Where above, SQLAlchemy renders my_sequence.nextval for the primary key columnand also uses RETURNING to get the new value back immediately.

For datatypes that generate values automatically, or columns that are populatedby a trigger, we use FetchedValue. Below is a model that uses aSQL Server TIMESTAMP column as the primary key, which generates values automatically:

class MyModel(Base):
    __tablename__ = 'my_table'
 
    timestamp = Column(TIMESTAMP(), server_default=FetchedValue(), primary_key=True)

An INSERT for the above table on SQL Server looks like:

INSERT INTO my_table OUTPUT inserted.timestamp DEFAULT VALUES

Case 4: primary key, RETURNING or equivalent is not supported

In this area we are generating rows for a database such as SQLite or MySQLwhere some means of generating a default is occurring on the server, but isoutside of the database’s usual autoincrement routine. In this case, we have tomake sure SQLAlchemy can “pre-execute” the default, which means it has to be anexplicit SQL expression.

Note

This section will illustrate multiple recipes involvingdatetime values for MySQL and SQLite, since the datetime datatypes on thesetwo backends have additional idiosyncratic requirements that are useful toillustrate. Keep in mind however that SQLite and MySQL require an explicit“pre-executed” default generator for any auto-generated datatype used asthe primary key other than the usual single-column autoincrementing integervalue.

MySQL with DateTime primary key

Using the example of a DateTime column for MySQL, we add an explicitpre-execute-supported default using the “NOW()” SQL function:

class MyModel(Base):
    __tablename__ = 'my_table'
 
    timestamp = Column(DateTime(), default=func.now(), primary_key=True)

Where above, we select the “NOW()” function to deliver a datetime valueto the column. The SQL generated by the above is:

SELECT now() AS anon_1
INSERT INTO my_table (timestamp) VALUES (%s)
('2018-08-09 13:08:46',)

MySQL with TIMESTAMP primary key

When using the TIMESTAMP datatype with MySQL, MySQL ordinarilyassociates a server-side default with this datatype automatically. Howeverwhen we use one as a primary key, the Core cannot retrieve the newly generatedvalue unless we execute the function ourselves. As TIMESTAMP onMySQL actually stores a binary value, we need to add an additional “CAST” to ourusage of “NOW()” so that we retrieve a binary value that can be persistedinto the column:

from sqlalchemy import cast, Binary
 
class MyModel(Base):
    __tablename__ = 'my_table'
 
    timestamp = Column(
        TIMESTAMP(),
        default=cast(func.now(), Binary),
        primary_key=True)

Above, in addition to selecting the “NOW()” function, we additionally makeuse of the Binary datatype in conjunction with cast() so thatthe returned value is binary. SQL rendered from the above within anINSERT looks like:

SELECT CAST(now() AS BINARY) AS anon_1
INSERT INTO my_table (timestamp) VALUES (%s)
(b'2018-08-09 13:08:46',)

SQLite with DateTime primary key

For SQLite, new timestamps can be generated using the SQL functiondatetime('now', 'localtime') (or specify 'utc' for UTC),however making things more complicated is that this returns a stringvalue, which is then incompatible with SQLAlchemy’s DateTimedatatype (even though the datatype converts the information back into astring for the SQLite backend, it must be passed through as a Python datetime).We therefore must also specify that we’d like to coerce the return value toDateTime when it is returned from the function, which we achieveby passing this as the type_ parameter:

class MyModel(Base):
    __tablename__ = 'my_table'
 
    timestamp = Column(
        DateTime,
        default=func.datetime('now', 'localtime', type_=DateTime),
        primary_key=True)

The above mapping upon INSERT will look like:

SELECT datetime(?, ?) AS datetime_1
('now', 'localtime')
INSERT INTO my_table (timestamp) VALUES (?)
('2018-10-02 13:37:33.000000',)

See also

Column INSERT/UPDATE Defaults

Partitioning Strategies (e.g. multiple database backends per Session)

Simple Vertical Partitioning

Vertical partitioning places different classes, class hierarchies,or mapped tables, across multiple databases, by configuring theSession with the Session.binds argument. Thisargument receives a dictionary that contains any combination ofORM-mapped classes, arbitrary classes within a mapped hierarchy (suchas declarative base classes or mixins), Table objects,and Mapper objects as keys, which then refer typically toEngine or less typically Connection objects as targets.The dictionary is consulted whenever the Session needs toemit SQL on behalf of a particular kind of mapped class in order to locatethe appropriate source of database connectivity:

engine1 = create_engine('postgresql://db1')
engine2 = create_engine('postgresql://db2')
 
Session = sessionmaker()
 
# bind User operations to engine 1, Account operations to engine 2
Session.configure(binds={User:engine1, Account:engine2})
 
session = Session()

Above, SQL operations against either class will make usage of the Enginelinked to that class. The functionality is comprehensive across bothread and write operations; a Query that is against entitiesmapped to engine1 (determined by looking at the first entity in thelist of items requested) will make use of engine1 to run the query. Aflush operation will make use of both engines on a per-class basis as itflushes objects of type User and Account.

In the more common case, there are typically base or mixin classes that can beused to distinguish between operations that are destined for different databaseconnections. The Session.binds argument can accommodate anyarbitrary Python class as a key, which will be used if it is found to be in themro (Python method resolution order) for a particular mapped class.Supposing two declarative bases are representing two different databaseconnections:

BaseA = declarative_base()
 
BaseB = declarative_base()
 
class User(BaseA):
    # ...
 
class Address(BaseA):
    # ...
 
 
class GameInfo(BaseB):
    # ...
 
class GameStats(BaseB):
    # ...
 
 
Session = sessionmaker()
 
# all User/Address operations will be on engine 1, all
# Game operations will be on engine 2
Session.configure(binds={BaseA:engine1, BaseB:engine2})

Above, classes which descend from BaseA and BaseB will have theirSQL operations routed to one of two engines based on which superclassthey descend from, if any. In the case of a class that descends from morethan one “bound” superclass, the superclass that is highest in the targetclass’ hierarchy will be chosen to represent which engine should be used.

See also

Session.binds

Coordination of Transactions for a multiple-engine Session

One caveat to using multiple bound engines is in the case where a commitoperation may fail on one backend after the commit has succeeded on another.This is an inconsistency problem that in relational databases is solvedusing a “two phase transaction”, which adds an additional “prepare” stepto the commit sequence that allows for multiple databases to agree to commitbefore actually completing the transaction.

Due to limited support within DBAPIs, SQLAlchemy has limited support for two-phase transactions across backends. Most typically, it is known to work wellwith the PostgreSQL backend and to a lesser extent with the MySQL backend.However, the Session is fully capable of taking advantage of the twophase transaction feature when the backend supports it, by setting theSession.use_twophase flag within sessionmaker orSession. See Enabling Two-Phase Commit for an example.

Custom Vertical Partitioning

More comprehensive rule-based class-level partitioning can be built byoverriding the Session.get_bind() method. Below we illustratea custom Session which delivers the following rules:

Flush operations are delivered to the engine named master.
Operations on objects that subclass MyOtherClass alloccur on the other engine.
Read operations for all other classes occur on a randomchoice of the slave1 or slave2 database.

engines = {
    'master':create_engine("sqlite:///master.db"),
    'other':create_engine("sqlite:///other.db"),
    'slave1':create_engine("sqlite:///slave1.db"),
    'slave2':create_engine("sqlite:///slave2.db"),
}
 
from sqlalchemy.orm import Session, sessionmaker
import random
 
class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None):
        if mapper and issubclass(mapper.class_, MyOtherClass):
            return engines['other']
        elif self._flushing:
            return engines['master']
        else:
            return engines[
                random.choice(['slave1','slave2'])
            ]

The above Session class is plugged in using the class_argument to sessionmaker:

Session = sessionmaker(class_=RoutingSession)

This approach can be combined with multiple MetaData objects,using an approach such as that of using the declarative abstractkeyword, described at abstract.

See also

Django-style Database Routers in SQLAlchemy - blog post on a more comprehensive example of Session.get_bind()

Horizontal Partitioning

Horizontal partitioning partitions the rows of a single table (or a set oftables) across multiple databases. The SQLAlchemy Sessioncontains support for this concept, however to use it fully requires thatSession and Query subclasses are used. A basic versionof these subclasses are available in the Horizontal ShardingORM extension. An example of use is at: Horizontal Sharding.

Bulk Operations

Note

Bulk Operations mode is a new series of operations made availableon the Session object for the purpose of invoking INSERT andUPDATE statements with greatly reduced Python overhead, at the expenseof much less functionality, automation, and error checking.As of SQLAlchemy 1.0, these features should be considered as “beta”, andadditionally are intended for advanced users.

New in version 1.0.0.

Bulk operations on the Session include Session.bulk_save_objects(),Session.bulk_insert_mappings(), and Session.bulk_update_mappings().The purpose of these methods is to directly expose internal elements of the unit of work system,such that facilities for emitting INSERT and UPDATE statements given dictionariesor object states can be utilized alone, bypassing the normal unit of workmechanics of state, relationship and attribute management. The advantagesto this approach is strictly one of reduced Python overhead:

The flush() process, including the survey of all objects, their state,their cascade status, the status of all objects associated with themvia relationship(), and the topological sort of all operations tobe performed is completely bypassed. This reduces a great amount ofPython overhead.
The objects as given have no defined relationship to the targetSession, even when the operation is complete, meaning there’s nooverhead in attaching them or managing their state in terms of the identitymap or session.
The Session.bulk_insert_mappings() and Session.bulk_update_mappings()methods accept lists of plain Python dictionaries, not objects; this furtherreduces a large amount of overhead associated with instantiating mappedobjects and assigning state to them, which normally is also subject toexpensive tracking of history on a per-attribute basis.
The set of objects passed to all bulk methods are processedin the order they are received. In the case ofSession.bulk_save_objects(), when objects of different types are passed,the INSERT and UPDATE statements are necessarily broken up into per-typegroups. In order to reduce the number of batch INSERT or UPDATE statementspassed to the DBAPI, ensure that the incoming list of objectsare grouped by type.
The process of fetching primary keys after an INSERT also is disabled bydefault. When performed correctly, INSERT statements can now more readilybe batched by the unit of work process into executemany() blocks, whichperform vastly better than individual statement invocations.
UPDATE statements can similarly be tailored such that all attributesare subject to the SET clause unconditionally, again making it much morelikely that executemany() blocks can be used.

The performance behavior of the bulk routines should be studied using thePerformance example suite. This is a series of examplescripts which illustrate Python call-counts across a variety of scenarios,including bulk insert and update scenarios.

See also

Performance - includes detailed examples of bulk operationscontrasted against traditional Core and ORM methods, including performancemetrics.

Usage

The methods each work in the context of the Session object’stransaction, like any other:

s = Session()
objects = [
    User(name="u1"),
    User(name="u2"),
    User(name="u3")
]
s.bulk_save_objects(objects)

For Session.bulk_insert_mappings(), and Session.bulk_update_mappings(),dictionaries are passed:

s.bulk_insert_mappings(User,
  [dict(name="u1"), dict(name="u2"), dict(name="u3")]
)

See also

Session.bulk_save_objects()

Session.bulk_insert_mappings()

Session.bulk_update_mappings()

Comparison to Core Insert / Update Constructs

The bulk methods offer performance that under particular circumstancescan be close to that of using the core Insert andUpdate constructs in an “executemany” context (for a descriptionof “executemany”, see Executing Multiple Statements in the Core tutorial).In order to achieve this, theSession.bulk_insert_mappings.return_defaultsflag should be disabled so that rows can be batched together. The examplesuite in Performance should be carefully studied in orderto gain familiarity with how fast bulk performance can be achieved.

ORM Compatibility

The bulk insert / update methods lose a significant amount of functionalityversus traditional ORM use. The following is a listing of features thatare not available when using these methods:

persistence along relationship() linkages
sorting of rows within order of dependency; rows are inserted or updateddirectly in the order in which they are passed to the methods
Session-management on the given objects, including attachment to thesession, identity map management.
Functionality related to primary key mutation, ON UPDATE cascade
SQL expression inserts / updates (e.g. Embedding SQL Insert/Update Expressions into a Flush)
ORM events such as MapperEvents.before_insert(), etc. The bulksession methods have no event support.

Features that are available include:

INSERTs and UPDATEs of mapped objects
Version identifier support
Multi-table mappings, such as joined-inheritance - however, an objectto be inserted across multiple tables either needs to have primary keyidentifiers fully populated ahead of time, else theSession.bulk_save_objects.return_defaults flag must be used,which will greatly reduce the performance benefits