Baked Queries
baked provides an alternative creational pattern for Query objects, which allows for caching of the object’s construction and string-compilation steps. This means that for a particular Query building scenario that is used more than once, all of the Python function invocation involved in building the query from its initial construction up through generating a SQL string will only occur once, rather than for each time that query is built up and executed.
The rationale for this system is to greatly reduce Python interpreter overhead for everything that occurs before the SQL is emitted. The caching of the “baked” system does not in any way reduce SQL calls or cache the return results from the database. A technique that demonstrates the caching of the SQL calls and result sets themselves is available in Dogpile Caching.
New in version 1.0.0.
Note
The sqlalchemy.ext.baked extension is not for beginners. Using it correctly requires a good high-level understanding of how SQLAlchemy, the database driver, and the backend database interact with each other. This extension presents a very specific kind of optimization that is not ordinarily needed. As noted above, it does not cache query results, only the string formulation of the SQL itself.
Synopsis
Usage of the baked system starts by producing a so-called “bakery”, which represents storage for a particular series of query objects:
- from sqlalchemy.ext import baked
- bakery = baked.bakery()
The above “bakery” will store cached data in an LRU cache that defaults to 200 elements, noting that an ORM query will typically contain one entry for the ORM query as invoked, as well as one entry per database dialect for the SQL string.
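As a rough illustration of what a bounded LRU cache does, the following sketch uses only the standard library. The TinyLRU class and its method names are hypothetical and are not SQLAlchemy's cache implementation; the sketch only shows the eviction behavior that a size limit implies.

```python
from collections import OrderedDict


class TinyLRU:
    """Hypothetical bounded cache; illustrates LRU eviction only."""

    def __init__(self, size=200):
        self.size = size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Reading an entry marks it as the most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop least recently used entries once the limit is exceeded.
        while len(self._data) > self.size:
            self._data.popitem(last=False)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


lru = TinyLRU(size=2)
lru.put("query_a", "SELECT 1")
lru.put("query_b", "SELECT 2")
lru.get("query_a")              # "query_a" becomes the most recently used
lru.put("query_c", "SELECT 3")  # exceeds the limit, so "query_b" is evicted
```

In a cache of this kind, entries that keep being used stay resident, while rarely used ones are the first to go when the limit is reached.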
The bakery allows us to build up a Query object by specifying its construction as a series of Python callables, which are typically lambdas. For succinct usage, it overrides the += operator so that a typical query build-up looks like the following:
- from sqlalchemy import bindparam
- def search_for_user(session, username, email=None):
-     baked_query = bakery(lambda session: session.query(User))
-     baked_query += lambda q: q.filter(User.name == bindparam('username'))
-     baked_query += lambda q: q.order_by(User.id)
-     if email:
-         baked_query += lambda q: q.filter(User.email == bindparam('email'))
-     result = baked_query(session).params(username=username, email=email).all()
-     return result
Following are some observations about the above code:
- The baked_query object is an instance of BakedQuery. This object is essentially the “builder” for a real ORM Query object, but it is not itself the actual Query object.
- The actual Query object is not built at all until the very end of the function, when Result.all() is called.
- The steps that are added to the baked_query object are all expressed as Python functions, typically lambdas. The first lambda given to the bakery() function receives a Session as its argument. The remaining lambdas each receive a Query as their argument.
- In the above code, even though our application may call upon search_for_user() many times, and even though within each invocation we build up an entirely new BakedQuery object, all of the lambdas are only called once. Each lambda is never called a second time for as long as this query is cached in the bakery.
- The caching is achieved by storing references to the lambda objects themselves in order to formulate a cache key; that is, the fact that the Python interpreter assigns an in-Python identity to these functions is what determines how to identify the query on successive runs. For those invocations of search_for_user() where the email parameter is specified, the callable lambda q: q.filter(User.email == bindparam('email')) will be part of the cache key that’s retrieved; when email is None, this callable is not part of the cache key.
- Because the lambdas are all called only once, it is essential that no variables which may change across calls are referenced within the lambdas; instead, assuming these are values to be bound into the SQL string, we use bindparam() to construct named parameters, where we apply their actual values later using Result.params().
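The observations above can be illustrated with a small pure-Python sketch. MiniBaked, record() and build_statement() are hypothetical names, and the sketch assumes a cache key derived from each function's code object; it is a simplified model of the idea, not SQLAlchemy's implementation, and it builds plain strings rather than Query objects.

```python
class MiniBaked:
    """Hypothetical baked-style builder; a simplified model only."""

    def __init__(self, cache, initial_fn):
        self._cache = cache
        self._steps = [initial_fn]

    def __iadd__(self, fn):
        self._steps.append(fn)
        return self

    def build(self):
        # The lambdas are new objects on every call, but their code objects
        # are the same, so the same sequence of steps yields the same key.
        key = tuple(fn.__code__ for fn in self._steps)
        if key not in self._cache:
            value = self._steps[0]()
            for fn in self._steps[1:]:
                value = fn(value)
            self._cache[key] = value
        return self._cache[key]


cache = {}
step_calls = []


def record(label, value):
    """Remember that a step actually ran, then pass its value through."""
    step_calls.append(label)
    return value


def build_statement(with_email=False):
    bq = MiniBaked(
        cache,
        lambda: record("select", "SELECT id FROM users WHERE name = :username"),
    )
    if with_email:
        bq += lambda sql: record("email", sql + " AND email = :email")
    bq += lambda sql: record("order", sql + " ORDER BY id")
    return bq.build()


first = build_statement()
second = build_statement()                       # served from the cache
with_email = build_statement(with_email=True)    # different key, built once
```

After these three calls, step_calls records the "select" and "order" steps only once for the plain variant, and the conditional "email" step produces a second, separate cache entry. Values that change per call are left as named placeholders in the string, which mirrors the role of bindparam() in the real system.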
Performance
The baked query probably looks a little odd, a little bit awkward and a little bit verbose. However, the savings in Python performance for a query which is invoked lots of times in an application are very dramatic. The example suite short_selects demonstrated in Performance illustrates a comparison of queries which each return only one row, such as the following regular query:
- session = Session(bind=engine)
- for id_ in random.sample(ids, n):
-     session.query(Customer).filter(Customer.id == id_).one()
compared to the equivalent “baked” query:
- bakery = baked.bakery()
- s = Session(bind=engine)
- for id_ in random.sample(ids, n):
-     q = bakery(lambda s: s.query(Customer))
-     q += lambda q: q.filter(Customer.id == bindparam('id'))
-     q(s).params(id=id_).one()
The difference in Python function call count for 10000 iterations of each block is:
- test_baked_query : test a baked query of the full entity.
- (10000 iterations); total fn calls 1951294
- test_orm_query : test a straight ORM query of the full entity.
- (10000 iterations); total fn calls 7900535
In terms of number of seconds on a powerful laptop, this comes out as:
- test_baked_query : test a baked query of the full entity.
- (10000 iterations); total time 2.174126 sec
- test_orm_query : test a straight ORM query of the full entity.
- (10000 iterations); total time 7.958516 sec
Note that this test very intentionally features queries that only return one row. For queries that return many rows, the performance advantage of the baked query will have less and less of an impact, proportional to the time spent fetching rows. It is critical to keep in mind that the baked query feature only applies to building the query itself, not the fetching of results. Using the baked feature is by no means a guarantee of a much faster application; it is only a potentially useful feature for those applications that have been measured as being impacted by this particular form of overhead.
Measure twice, cut once
For background on how to profile a SQLAlchemy application, please see the section Performance. It is essential that performance measurement techniques are used when attempting to improve the performance of an application.
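One way to obtain function call counts like those quoted above is the standard library profiler. The sketch below is a minimal example, assuming the code under test can be wrapped in a callable; count_calls(), build_once() and build_many() are hypothetical helpers standing in for real query-building code.

```python
import cProfile
import pstats


def count_calls(fn, *args, **kwargs):
    """Return the total number of function calls made while running fn."""
    profiler = cProfile.Profile()
    profiler.runcall(fn, *args, **kwargs)
    return pstats.Stats(profiler).total_calls


def build_once():
    # Stand-in for work that is done a single time.
    return "-".join(["a", "b"])


def build_many(n=100):
    # Stand-in for the same work repeated on every invocation.
    return [build_once() for _ in range(n)]


calls_once = count_calls(build_once)
calls_many = count_calls(build_many)
```

Comparing calls_once with calls_many gives a machine-independent measure of interpreter work, which is often more stable than wall-clock time when deciding whether query construction is a meaningful share of the overhead.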
Rationale
The “lambda” approach above is a superset of what would be a more traditional “parameterized” approach. Suppose we wished to build a simple system where we build a Query just once, then store it in a dictionary for re-use. This is possible right now by just building up the query, and removing its Session by calling my_cached_query = query.with_session(None):
- my_simple_cache = {}
- def lookup(session, id_argument):
-     if "my_key" not in my_simple_cache:
-         query = session.query(Model).filter(Model.id == bindparam('id'))
-         my_simple_cache["my_key"] = query.with_session(None)
-     else:
-         query = my_simple_cache["my_key"].with_session(session)
-     return query.params(id=id_argument).all()
The above approach gets us a very minimal performance benefit. By re-using a Query, we save on the Python work within the session.query(Model) constructor as well as calling upon filter(Model.id == bindparam('id')), which will skip for us the building up of the Core expression as well as sending it to Query.filter(). However, the approach still regenerates the full Select object every time when Query.all() is called and additionally this brand new Select is sent off to the string compilation step every time, which for a simple case like the above is probably about 70% of the overhead.
To reduce the additional overhead, we need some more specialized logic, some way to memoize the construction of the select object and the construction of the SQL. There is an example of this on the wiki in the section BakedQuery, a precursor to this feature, however in that system, we aren’t caching the construction of the query. In order to remove all the overhead, we need to cache both the construction of the query as well as the SQL compilation. Let’s assume we adapted the recipe in this way and made ourselves a method .bake() that pre-compiles the SQL for the query, producing a new object that can be invoked with minimal overhead. Our example becomes:
- my_simple_cache = {}
- def lookup(session, id_argument):
-     if "my_key" not in my_simple_cache:
-         query = session.query(Model).filter(Model.id == bindparam('id'))
-         my_simple_cache["my_key"] = query.with_session(None).bake()
-     else:
-         query = my_simple_cache["my_key"].with_session(session)
-     return query.params(id=id_argument).all()
Above, we’ve fixed the performance situation, but we still have this string cache key to deal with.
We can use the “bakery” approach to re-frame the above in a way that looks less unusual than the “building up lambdas” approach, and more like a simple improvement upon the simple “reuse a query” approach:
- bakery = baked.bakery()
- def lookup(session, id_argument):
-     def create_model_query(session):
-         return session.query(Model).filter(Model.id == bindparam('id'))
-     parameterized_query = bakery(create_model_query)
-     return parameterized_query(session).params(id=id_argument).all()
Above, we use the “baked” system in a manner that is very similar to the simplistic “cache a query” system. However, it uses two fewer lines of code, does not need to manufacture a cache key of “my_key”, and also includes the same feature as our custom “bake” function that caches 100% of the Python invocation work from the constructor of the query, to the filter call, to the production of the Select object, to the string compilation step.
From the above, if we ask ourselves, “what if lookup needs to make conditional decisions as to the structure of the query?”, this is where hopefully it becomes apparent why “baked” is the way it is. Instead of a parameterized query building off from exactly one function (which is how we thought baked might work originally), we can build it from any number of functions. Consider our naive example, if we needed to have an additional clause in our query on a conditional basis:
- my_simple_cache = {}
- def lookup(session, id_argument, include_frobnizzle=False):
-     if include_frobnizzle:
-         cache_key = "my_key_with_frobnizzle"
-     else:
-         cache_key = "my_key_without_frobnizzle"
-     if cache_key not in my_simple_cache:
-         query = session.query(Model).filter(Model.id == bindparam('id'))
-         if include_frobnizzle:
-             query = query.filter(Model.frobnizzle == True)
-         my_simple_cache[cache_key] = query.with_session(None).bake()
-     else:
-         query = my_simple_cache[cache_key].with_session(session)
-     return query.params(id=id_argument).all()
Our “simple” parameterized system must now be tasked with generating cache keys which take into account whether or not the “include_frobnizzle” flag was passed, as the presence of this flag means that the generated SQL would be entirely different. It should be apparent that as the complexity of query building goes up, the task of caching these queries becomes burdensome very quickly. We can convert the above example into a direct use of “bakery” as follows:
- bakery = baked.bakery()
- def lookup(session, id_argument, include_frobnizzle=False):
-     def create_model_query(session):
-         return session.query(Model).filter(Model.id == bindparam('id'))
-     parameterized_query = bakery(create_model_query)
-     if include_frobnizzle:
-         def include_frobnizzle_in_query(query):
-             return query.filter(Model.frobnizzle == True)
-         parameterized_query = parameterized_query.with_criteria(
-             include_frobnizzle_in_query)
-     return parameterized_query(session).params(id=id_argument).all()
Above, we again cache not just the query object but all the work it needs to do in order to generate SQL. We also no longer need to deal with making sure we generate a cache key that accurately takes into account all of the structural modifications we’ve made; this is now handled automatically and without the chance of mistakes.
This code sample is a few lines shorter than the naive example, removes the need to deal with cache keys, and has the vast performance benefits of the full so-called “baked” feature. But it is still a little verbose! Hence we take methods like BakedQuery.add_criteria() and BakedQuery.with_criteria() and shorten them into operators, and encourage (though certainly not require!) using simple lambdas, only as a means to reduce verbosity:
- bakery = baked.bakery()
- def lookup(session, id_argument, include_frobnizzle=False):
-     parameterized_query = bakery(
-         lambda s: s.query(Model).filter(Model.id == bindparam('id'))
-     )
-     if include_frobnizzle:
-         parameterized_query += lambda q: q.filter(Model.frobnizzle == True)
-     return parameterized_query(session).params(id=id_argument).all()
Above, the approach is simpler to implement and much more similar in code flow to what a non-cached querying function would look like, hence making code easier to port.
The above description is essentially a summary of the design process used to arrive at the current “baked” approach. Starting from the “normal” approaches, the additional issues of cache key construction and management, removal of all redundant Python execution, and queries built up with conditionals needed to be addressed, leading to the final approach.
Special Query Techniques
This section will describe some techniques for specific query situations.
Using IN expressions
The ColumnOperators.in_() method in SQLAlchemy historically renders a variable set of bound parameters based on the list of items that’s passed to the method. This doesn’t work for baked queries as the length of that list can change on different calls. To solve this problem, the bindparam.expanding parameter supports a late-rendered IN expression that is safe to be cached inside of a baked query. The actual list of elements is rendered at statement execution time, rather than at statement compilation time:
- bakery = baked.bakery()
- baked_query = bakery(lambda session: session.query(User))
- baked_query += lambda q: q.filter(
-     User.name.in_(bindparam('username', expanding=True)))
- result = baked_query(session).params(
-     username=['ed', 'fred']).all()
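The idea of rendering the list at execution time can be sketched in plain Python. The expand_in() helper and the "[EXPANDING_name]" marker format below are assumptions made for illustration only; they model the general technique of a cached statement carrying one marker that is expanded per execution, and they do not reproduce SQLAlchemy's internals.

```python
def expand_in(sql_template, name, values):
    """Expand a hypothetical IN marker into numbered placeholders.

    Returns the rendered SQL and the parameter dictionary to send with it.
    """
    marker = f"[EXPANDING_{name}]"
    names = [f"{name}_{i}" for i, _ in enumerate(values, start=1)]
    placeholders = ", ".join(f":{n}" for n in names)
    return sql_template.replace(marker, placeholders), dict(zip(names, values))


# The template is what would be cached; it does not depend on list length.
cached_sql = "SELECT id FROM users WHERE name IN ([EXPANDING_username])"

sql_two, params_two = expand_in(cached_sql, "username", ["ed", "fred"])
sql_three, params_three = expand_in(cached_sql, "username", ["ed", "fred", "jack"])
```

Because the cached template is identical for both calls, lists of different lengths do not produce different cache entries; only the final rendering step varies.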
Using Subqueries
When using Query objects, it is often necessary to use one Query object to generate a subquery within another. In the case where the Query is currently in baked form, an interim method may be used to retrieve the Query object, using the BakedQuery.to_query() method. This method is passed the Session or Query that is the argument to the lambda callable used to generate a particular step of the baked query:
- bakery = baked.bakery()
- # a baked query that will end up being used as a subquery
- my_subq = bakery(lambda s: s.query(User.id))
- my_subq += lambda q: q.filter(User.id == Address.user_id)
- # select a correlated subquery in the top columns list,
- # we have the "session" argument, pass that
- my_q = bakery(
-     lambda s: s.query(Address.id, my_subq.to_query(s).as_scalar()))
- # use a correlated subquery in some of the criteria, we have
- # the "query" argument, pass that.
- my_q += lambda q: q.filter(my_subq.to_query(q).exists())
New in version 1.3.
Disabling Baked Queries Session-wide
The flag Session.enable_baked_queries may be set to False, causing all baked queries to not use the cache when used against that Session:
- session = Session(engine, enable_baked_queries=False)
Like all session flags, it is also accepted by factory objects like sessionmaker and methods like sessionmaker.configure().
The immediate rationale for this flag is to reduce memory use in the case that the query baking used by relationship loaders and other loaders is not desirable. It can also be used when an application is seeing issues that may be due to cache key conflicts from user-defined baked queries, or other baked query issues; turning the behavior off helps to identify or eliminate baked queries as the cause of an issue.
New in version 1.2.
Lazy Loading Integration
The baked query system is integrated into SQLAlchemy’s lazy loader feature as used by relationship(), and will cache queries for most lazy load conditions. A small subset of “lazy loads” may not be cached; these involve query options in conjunction with ad-hoc aliased structures that cannot produce a repeatable cache key.
Changed in version 1.2: “baked” queries are now the foundation of the lazy-loader feature of relationship().
Opting out with the bake_queries flag
The relationship() construct includes a flag relationship.bake_queries which when set to False will cause that relationship to opt out of caching queries. Additionally, the Session.enable_baked_queries setting can be used to disable all “baked query” use. These flags can be useful to conserve memory, when memory conservation is more important than performance for a particular relationship or for the application overall.
API Documentation
sqlalchemy.ext.baked.bakery(size=200, _size_alert=None)
Construct a new bakery.
- Returns: an instance of Bakery
class sqlalchemy.ext.baked.BakedQuery(bakery, initial_fn, args=())
A builder object for query.Query objects.
add_criteria(fn, *args)
- Add a criteria function to this BakedQuery.
This is equivalent to using the += operator to modify a BakedQuery in-place.
classmethod bakery(size=200, _size_alert=None)
- Construct a new bakery.
- Returns: an instance of Bakery
for_session(session)
- Return a Result object for this BakedQuery.
This is equivalent to calling the BakedQuery as a Python callable, e.g. result = my_baked_query(session).
spoil(full=False)
The BakedQuery can continue to be used normally, however additional creational functions will not be cached; they will be called on every invocation.
This is to support the case where a particular step in constructing a baked query disqualifies the query from being cacheable, such as a variant that relies upon some uncacheable value.
- Parameters: full – if False, only functions added to this BakedQuery object subsequent to the spoil step will be non-cached; the state of the BakedQuery up until this point will be pulled from the cache. If True, then the entire Query object is built from scratch each time, with all creational functions being called on each invocation.
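The effect of the full parameter described above can be modeled with a small pure-Python sketch. SpoilableBuilder, record() and make() are hypothetical names; the sketch builds plain strings and only mirrors the described semantics (cached steps before the spoil point, uncached steps after it). It is not SQLAlchemy's implementation.

```python
class SpoilableBuilder:
    """Hypothetical model of 'spoil' semantics for a step-based builder."""

    def __init__(self, cache, initial_fn):
        self._cache = cache
        self._cached_steps = [initial_fn]
        self._uncached_steps = []
        self._spoiled = False

    def add(self, fn):
        target = self._uncached_steps if self._spoiled else self._cached_steps
        target.append(fn)
        return self

    def spoil(self, full=False):
        if full:
            # Nothing is taken from the cache; every step runs each time.
            self._uncached_steps = self._cached_steps + self._uncached_steps
            self._cached_steps = []
        self._spoiled = True
        return self

    def build(self):
        value = ""
        if self._cached_steps:
            key = tuple(fn.__code__ for fn in self._cached_steps)
            if key not in self._cache:
                for fn in self._cached_steps:
                    value = fn(value)
                self._cache[key] = value
            value = self._cache[key]
        # Steps added after the spoil point run on every invocation.
        for fn in self._uncached_steps:
            value = fn(value)
        return value


cache = {}
calls = []


def record(label, value):
    calls.append(label)
    return value


def make(limit, full=False):
    builder = SpoilableBuilder(
        cache, lambda _: record("select", "SELECT id FROM users"))
    builder.spoil(full=full)
    # This step closes over a changing value, so it must not be cached.
    builder.add(lambda sql: record("limit", f"{sql} LIMIT {limit}"))
    return builder.build()


five = make(5)
ten = make(10)
```

With full=False, the "select" step runs once and the "limit" step runs on both calls, so each call sees its own limit value. Passing full=True to make() would cause the "select" step to run on every call as well.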
to_query(query_or_session)
- Return the Query object for use as a subquery.
This method should be used within the lambda callable being used to generate a step of an enclosing BakedQuery. The parameter should normally be the Query object that is passed to the lambda:
- sub_bq = self.bakery(lambda s: s.query(User.name))
- sub_bq += lambda q: q.filter(
-     User.id == Address.user_id).correlate(Address)
- main_bq = self.bakery(lambda s: s.query(Address))
- main_bq += lambda q: q.filter(
-     sub_bq.to_query(q).exists())
In the case where the subquery is used in the first callable against a Session, the Session is also accepted:
- sub_bq = self.bakery(lambda s: s.query(User.name))
- sub_bq += lambda q: q.filter(
-     User.id == Address.user_id).correlate(Address)
- main_bq = self.bakery(
-     lambda s: s.query(Address.id, sub_bq.to_query(s).as_scalar())
- )
- Parameters: query_or_session – a Query object or a Session object, that is assumed to be within the context of an enclosing BakedQuery callable.
New in version 1.3.
with_criteria(fn, *args)
- Add a criteria function to a BakedQuery cloned from this one.
This is equivalent to using the + operator to produce a new BakedQuery with modifications.
class sqlalchemy.ext.baked.Bakery(cls_, cache)
- Callable which returns a BakedQuery.
This object is returned by the class method BakedQuery.bakery(). It exists as an object so that the “cache” can be easily inspected.
New in version 1.2.
class sqlalchemy.ext.baked.Result(bq, session)
- Invokes a BakedQuery against a Session.
The Result object is where the actual query.Query object gets created, or retrieved from the cache, against a target Session, and is then invoked for results.
all()
Equivalent to Query.all().
count()
Equivalent to Query.count().
Note this uses a subquery to ensure an accurate count regardless of the structure of the original statement.
New in version 1.1.6.
first()
Equivalent to Query.first().
get(ident)
Equivalent to Query.get().
one()
Equivalent to Query.one().
one_or_none()
Equivalent to Query.one_or_none().
New in version 1.0.9.
params(*args, **kw)
Specify parameters to be replaced into the string SQL statement.
scalar()
- Return the first element of the first result or None if no rows present. If multiple rows are returned, raises MultipleResultsFound.
Equivalent to Query.scalar().
New in version 1.1.6.
with_post_criteria(fn)
This adds a function that will be run against the Query object after it is retrieved from the cache. Functions here can be used to alter the query in ways that do not affect the SQL output, such as execution options and shard identifiers (when using a shard-enabled query object).
Warning
Result.with_post_criteria() functions are applied to the Query object after the query’s SQL statement object has been retrieved from the cache. Any operations here which intend to modify the SQL should ensure that BakedQuery.spoil() was called first.
New in version 1.2.