Graph Structures
Debugging or profiling code written in Theano is not that simple if you do not know what goes on under the hood. This chapter is meant to introduce you to a required minimum of the inner workings of Theano.
The first step in writing Theano code is to write down all mathematical relations using symbolic placeholders (variables). When writing down these expressions you use operations like `+`, `-`, `**`, `sum()`, `tanh()`. All these are represented internally as **ops**. An op represents a certain computation on some type of inputs producing some type of output. You can see it as a _function definition_ in most programming languages.
Theano represents symbolic mathematical computations as graphs. These graphs are composed of interconnected Apply, Variable and Op nodes. An Apply node represents the application of an op to some variables. It is important to draw the difference between the definition of a computation represented by an op and its application to some actual data, which is represented by the apply node. Furthermore, data types are represented by Type instances. Here is a piece of code and a diagram showing the structure built by that piece of code. This should help you understand how these pieces fit together:
Code
```python
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
```
Diagram

Arrows represent references to the Python objects pointed at. The blue box is an Apply node. Red boxes are Variable nodes. Green circles are Ops. Purple boxes are Types.
When we create Variables and then Apply Ops to them to make more Variables, we build a bipartite, directed, acyclic graph. Variables point to the Apply nodes representing the function application producing them via their `owner` field. These Apply nodes point in turn to their input and output Variables via their `inputs` and `outputs` fields. (Apply instances also contain a list of references to their outputs, but those pointers don't count in this graph.)
The `owner` field of both `x` and `y` points to None because they are not the result of another computation. If one of them were the result of another computation, its `owner` field would point to another blue box like `z` does, and so on.
Note that the `Apply` instance's `outputs` field points to `z`, and `z.owner` points back to the `Apply` instance.
Traversing the graph
The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the `owner` field. Take for example the following code:
```python
>>> import theano
>>> x = theano.tensor.dmatrix('x')
>>> y = x * 2.
```
If you enter `type(y.owner)` you get `<class 'theano.gof.graph.Apply'>`, which is the apply node that connects the op and the inputs to get this output. You can now print the name of the op that is applied to get `y`:

```python
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
```
Hence, an elementwise multiplication is used to compute `y`. This multiplication is done between the inputs:
```python
>>> len(y.owner.inputs)
2
>>> y.owner.inputs[0]
x
>>> y.owner.inputs[1]
DimShuffle{x,x}.0
```
Note that the second input is not 2 as we would have expected. This is because 2 was first broadcasted to a matrix of the same shape as `x`. This is done by using the op `DimShuffle`:
```python
>>> type(y.owner.inputs[1])
<class 'theano.tensor.var.TensorVariable'>
>>> type(y.owner.inputs[1].owner)
<class 'theano.gof.graph.Apply'>
>>> y.owner.inputs[1].owner.op
<theano.tensor.elemwise.DimShuffle object at 0x106fcaf10>
>>> y.owner.inputs[1].owner.inputs
[TensorConstant{2.0}]
```
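The effect of `DimShuffle{x,x}` here is analogous to reshaping a 0-d array into a 1x1 matrix in NumPy so that broadcasting can stretch it to the shape of `x` (an analogy only; Theano's actual op also records which dimensions are broadcastable):

```python
import numpy as np

x = np.ones((2, 3))
two = np.asarray(2.0)          # a 0-d array, like TensorConstant{2.0}
two_mat = two.reshape((1, 1))  # add two broadcastable axes, like DimShuffle{x,x}
y = x * two_mat                # broadcasting stretches (1, 1) up to (2, 3)
print(y.shape)                 # (2, 3)
```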
Starting from this graph structure it is easier to understand how automatic differentiation proceeds and how the symbolic relations can be optimized for performance or stability.
Graph Structures
This section outlines each type of structure that may be used in a Theano-built computation graph. The following structures are explained: Apply, Constant, Op, Variable and Type.
Apply
An Apply node is a type of internal node used to represent a computation graph in Theano. Unlike Variable nodes, Apply nodes are usually not manipulated directly by the end user. They may be accessed via a Variable's `owner` field.
An Apply node is typically an instance of the `Apply` class. It represents the application of an Op on one or more inputs, where each input is a Variable. By convention, each Op is responsible for knowing how to build an Apply node from a list of inputs. Therefore, an Apply node may be obtained from an Op and a list of inputs by calling `Op.make_node(*inputs)`.
Comparing with the Python language, an Apply node is Theano's version of a function call whereas an Op is Theano's version of a function definition.
An Apply instance has three important fields:

- `op`: an Op that determines the function/transformation being applied here.
- `inputs`: a list of Variables that represent the arguments of the function.
- `outputs`: a list of Variables that represent the return values of the function.

An Apply instance can be created by calling `gof.Apply(op, inputs, outputs)`.
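These three fields, together with the `owner` and `index` back-pointers on output Variables, can be illustrated with a small pure-Python sketch (toy simplified classes, not Theano's actual implementation):

```python
class Variable:
    def __init__(self, name=None):
        self.name = name
        self.owner = None   # the Apply node that produced this Variable, or None
        self.index = None   # position of this Variable in owner.outputs

class Apply:
    def __init__(self, op, inputs, outputs):
        self.op = op              # the Op being applied
        self.inputs = inputs      # list of argument Variables
        self.outputs = outputs    # list of result Variables
        for i, out in enumerate(outputs):
            out.owner = self      # wire each output back to this node
            out.index = i

x, y, z = Variable('x'), Variable('y'), Variable('z')
node = Apply('add', [x, y], [z])

assert z.owner is node and node.outputs[z.index] is z
assert x.owner is None    # inputs are not produced by any computation
```

Note how the asserts mirror the invariants stated in this section: an output's `owner` points at the Apply node, and `owner.outputs[index]` is the output itself.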
Op
An Op in Theano defines a certain computation on some types of inputs, producing some types of outputs. It is equivalent to a function definition in most programming languages. From a list of input Variables and an Op, you can build an Apply node representing the application of the Op to the inputs.
It is important to understand the distinction between an Op (the definition of a function) and an Apply node (the application of a function). If you were to interpret the Python language using Theano's structures, code like `def f(x): ...` would produce an Op for `f` whereas code like `a = f(x)` or `g(f(4), 5)` would produce an Apply node involving the `f` Op.
Type
A Type in Theano represents a set of constraints on potential data objects. These constraints allow Theano to tailor C code to handle them and to statically optimize the computation graph. For instance, the `irow` type in the `theano.tensor` package places the following constraints on the data that Variables of type `irow` may contain:
- Must be an instance of `numpy.ndarray`: `isinstance(x, numpy.ndarray)`
- Must be an array of 32-bit integers: `str(x.dtype) == 'int32'`
- Must have a shape of 1xN: `len(x.shape) == 2 and x.shape[0] == 1`
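Expressed as a plain-Python predicate over NumPy arrays (an illustrative sketch only, not Theano's actual type-checking machinery), the three constraints read:

```python
import numpy as np

def irow_like(x):
    """Check the constraints the `irow` Type places on data (sketch)."""
    return (isinstance(x, np.ndarray)
            and str(x.dtype) == 'int32'     # 32-bit integers
            and len(x.shape) == 2
            and x.shape[0] == 1)            # shape 1xN

print(irow_like(np.zeros((1, 5), dtype='int32')))  # True
print(irow_like(np.zeros((2, 5), dtype='int32')))  # False: first dim must be 1
print(irow_like(np.zeros((1, 5), dtype='int64')))  # False: wrong dtype
```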
Knowing these restrictions, Theano may generate C code for addition, etc. that declares the right data types and that contains the right number of loops over the dimensions.
Note that a Theano Type is not equivalent to a Python type or class. Indeed, in Theano, `irow` and `dmatrix` both use `numpy.ndarray` as the underlying type for doing computations and storing data, yet they are different Theano Types. The constraints set by `dmatrix` are:
- Must be an instance of `numpy.ndarray`: `isinstance(x, numpy.ndarray)`
- Must be an array of 64-bit floating point numbers: `str(x.dtype) == 'float64'`
- Must have a shape of MxN, no restriction on M or N: `len(x.shape) == 2`
These restrictions are different from those of `irow`, which are listed above.
There are cases in which a Type can fully correspond to a Python type, such as the `double` Type we will define here, which corresponds to Python's `float`. But it's good to know that this is not necessarily the case. Unless specified otherwise, when we say "Type" we mean a Theano Type.
Variable
A Variable is the main data structure you work with when using Theano. The symbolic inputs that you operate on are Variables, and what you get from applying various Ops to these inputs are also Variables. For example, when I type:
```python
>>> import theano
>>> x = theano.tensor.ivector()
>>> y = -x
```
`x` and `y` are both Variables, i.e. instances of the `Variable` class. The Type of both `x` and `y` is `theano.tensor.ivector`.
Unlike `x`, `y` is a Variable produced by a computation (in this case, it is the negation of `x`). `y` is the Variable corresponding to the output of the computation, while `x` is the Variable corresponding to its input. The computation itself is represented by another type of node, an Apply node, and may be accessed through `y.owner`.
More specifically, a Variable is a basic structure in Theano that represents a datum at a certain point in computation. It is typically an instance of the class `Variable` or one of its subclasses.
A Variable `r` contains four important fields:

- `type`: a Type defining the kind of value this Variable can hold in computation.
- `owner`: either None or an Apply node of which the Variable is an output.
- `index`: the integer such that `owner.outputs[index] is r` (ignored if `owner` is None).
- `name`: a string to use in pretty-printing and debugging.
Variable has one special subclass: Constant.
Constant
A Constant is a Variable with one extra field, `data` (only settable once). When used in a computation graph as the input of an Op application, it is assumed that said input will always take the value contained in the constant's `data` field. Furthermore, it is assumed that the Op will not under any circumstances modify the input. This means that a constant is eligible to participate in numerous optimizations: constant inlining in C code, constant folding, etc.
A constant does not need to be specified in a `function`'s list of inputs. In fact, doing so will raise an exception.
Graph Structures Extension
When we start the compilation of a Theano function, we compute some extra information. This section describes a portion of the information that is made available. Not everything is described, so email theano-dev if you need something that is missing.
The graph gets cloned at the start of compilation, so modifications done during compilation won't affect the user graph.
Each variable receives a new field called `clients`. It is a list with references to every place in the graph where this variable is used. If its length is 0, it means the variable isn't used. Each place where it is used is described by a tuple of 2 elements. There are two types of pairs:
- The first element is an Apply node.
- The first element is the string "output". It means the function outputs this variable.
In both types of pairs, the second element of the tuple is an index, such that `var.clients[*][0].inputs[index]` or `fgraph.outputs[index]` is that variable.
```python
>>> import theano
>>> v = theano.tensor.vector()
>>> f = theano.function([v], (v+1).sum())
>>> theano.printing.debugprint(f)
Sum{acc_dtype=float64} [id A] ''   1
 |Elemwise{add,no_inplace} [id B] ''   0
   |TensorConstant{(1,) of 1.0} [id C]
   |<TensorType(float64, vector)> [id D]
>>> # Sorted list of all nodes in the compiled graph.
>>> topo = f.maker.fgraph.toposort()
>>> topo[0].outputs[0].clients
[(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)]
>>> topo[1].outputs[0].clients
[('output', 0)]

>>> # An internal variable
>>> var = topo[0].outputs[0]
>>> client = var.clients[0]
>>> client
(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)
>>> type(client[0])
<class 'theano.gof.graph.Apply'>
>>> assert client[0].inputs[client[1]] is var

>>> # An output of the graph
>>> var = topo[1].outputs[0]
>>> client = var.clients[0]
>>> client
('output', 0)
>>> assert f.maker.fgraph.outputs[client[1]] is var
```
Automatic Differentiation
Having the graph structure, computing automatic differentiation is simple. The only thing `tensor.grad()` has to do is to traverse the graph from the outputs back towards the inputs through all apply nodes (apply nodes are those that define which computations the graph does). For each such apply node, its op defines how to compute the gradient of the node's outputs with respect to its inputs. Note that if an op does not provide this information, it is assumed that the gradient is not defined. Using the chain rule, these gradients can be composed in order to obtain the expression of the gradient of the graph's output with respect to the graph's inputs.
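The idea can be sketched in a few lines of pure Python (a toy model, not `tensor.grad()` itself): walk `owner` links from the output back to the inputs, letting each op decide how gradients flow. Here the only op is addition, whose gradient simply forwards the upstream gradient to each input:

```python
class Var:
    def __init__(self, owner=None):
        self.owner = owner  # the node that produced this Var, or None for inputs

class AddNode:
    def __init__(self, inputs):
        self.inputs = inputs
        self.output = Var(owner=self)

def add(a, b):
    return AddNode([a, b]).output

def grad(out, wrt, upstream=1.0):
    """d(out)/d(wrt), traversing owner links from the output to the inputs."""
    if out is wrt:
        return upstream
    if out.owner is None:   # an input we are not differentiating w.r.t.
        return 0.0
    # Addition's gradient forwards the upstream gradient to every input;
    # contributions arriving along different paths are summed (chain rule).
    return sum(grad(inp, wrt, upstream) for inp in out.owner.inputs)

x, y = Var(), Var()
z = add(add(x, x), y)   # z = x + x + y
print(grad(z, x))       # 2.0: x contributes along two paths
print(grad(z, y))       # 1.0
```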
A following section of this tutorial will examine the topic of differentiation in greater detail.
Optimizations
When compiling a Theano function, what you give to `theano.function` is actually a graph (starting from the output variables you can traverse the graph up to the input variables). While this graph structure shows how to compute the output from the input, it also offers the possibility to improve the way this computation is carried out. The way optimizations work in Theano is by identifying and replacing certain patterns in the graph with other specialized patterns that produce the same results but are either faster or more stable. Optimizations can also detect identical subgraphs and ensure that the same values are not computed twice, or reformulate parts of the graph to a GPU-specific version.
For example, one (simple) optimization that Theano uses is to replace the pattern `x * y / y` by `x`. Further information regarding the optimization process and the specific optimizations that are applicable is available, respectively, in the library documentation and on the entrance page of the documentation.
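As a toy illustration of such pattern replacement (a sketch over a tuple-based expression tree, not Theano's optimizer API), a rewrite of `(x * y) / y` into `x` might look like:

```python
def simplify(expr):
    """Bottom-up rewrite on ('op', left, right) tuples: (a * b) / b -> a."""
    if isinstance(expr, tuple):
        op, a, b = expr
        a, b = simplify(a), simplify(b)          # simplify children first
        if op == 'div' and isinstance(a, tuple) and a[0] == 'mul' and a[2] == b:
            return a[1]                          # (a[1] * b) / b  ->  a[1]
        return (op, a, b)
    return expr                                  # a leaf (variable name)

expr = ('div', ('mul', 'x', 'y'), 'y')           # (x * y) / y
print(simplify(expr))                            # 'x'
```

Theano's real optimizer works on Apply/Variable graphs rather than tuples, but the principle is the same: match a pattern, substitute an equivalent cheaper subgraph.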
Example
Symbolic programming involves a change of paradigm: it will become clearer as we apply it. Consider the following example of optimization:
```python
>>> import theano
>>> a = theano.tensor.vector("a")   # declare symbolic variable
>>> b = a + a ** 10                 # build symbolic expression
>>> f = theano.function([a], b)     # compile function
>>> print(f([0, 1, 2]))
[    0.     2.  1026.]
>>> theano.printing.pydotprint(b, outfile="./pics/symbolic_graph_unopt.png", var_with_name_simple=True)
The output file is available at ./pics/symbolic_graph_unopt.png
>>> theano.printing.pydotprint(f, outfile="./pics/symbolic_graph_opt.png", var_with_name_simple=True)
The output file is available at ./pics/symbolic_graph_opt.png
```
We used `theano.printing.pydotprint()` to visualize the optimized graph (right), which is much more compact than the unoptimized graph (left).
[Figure: unoptimized graph (left) and optimized graph (right)]