How Shape Information is Handled by Theano
It is not possible to strictly enforce the shape of a Theano variable when building a graph, since the particular value provided at run-time for a parameter of a Theano function may condition the shape of the Theano variables in its graph.
Currently, information regarding shape is used in two ways in Theano:
- To generate faster C code for the 2d convolution on the CPU and the GPU, when the exact output shape is known in advance.
- To remove computations in the graph when we only want to know the shape, but not the actual value of a variable. This is done with the Op.infer_shape method.
Example:
>>> import theano
>>> x = theano.tensor.matrix('x')
>>> f = theano.function([x], (x ** 2).shape)
>>> theano.printing.debugprint(f)
MakeVector{dtype='int64'} [id A] '' 2
 |Shape_i{0} [id B] '' 1
 | |x [id C]
 |Shape_i{1} [id D] '' 0
 | |x [id C]
The output of this compiled function does not contain any multiplication or power; Theano has removed them to compute the shape of the output directly.
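For instance, calling the compiled function returns the shape of x ** 2 immediately, without ever computing the square (the random input below is only an illustration):

>>> import numpy
>>> f(numpy.random.rand(3, 4))
array([3, 4])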
Shape Inference Problem
Theano propagates information about shape in the graph. Sometimes this can lead to errors. Consider this example:
>>> import numpy
>>> import theano
>>> x = theano.tensor.matrix('x')
>>> y = theano.tensor.matrix('y')
>>> z = theano.tensor.join(0, x, y)
>>> xv = numpy.random.rand(5, 4)
>>> yv = numpy.random.rand(3, 3)
>>> f = theano.function([x, y], z.shape)
>>> theano.printing.debugprint(f)
MakeVector{dtype='int64'} [id A] '' 4
 |Elemwise{Add}[(0, 0)] [id B] '' 3
 | |Shape_i{0} [id C] '' 2
 | | |x [id D]
 | |Shape_i{0} [id E] '' 1
 | | |y [id F]
 |Shape_i{1} [id G] '' 0
 | |x [id D]
>>> f(xv, yv)  # DOES NOT RAISE AN ERROR AS IT SHOULD.
array([8, 4])
>>> f = theano.function([x, y], z)  # Do not take the shape.
>>> theano.printing.debugprint(f)
Join [id A] '' 0
 |TensorConstant{0} [id B]
 |x [id C]
 |y [id D]
>>> f(xv, yv)
Traceback (most recent call last):
...
ValueError: ...
As you can see, when asking only for the shape of some computation (join in the example), an inferred shape is computed directly, without executing the computation itself (there is no join in the first output or debugprint). This makes the computation of the shape faster, but it can also hide errors. In this example, the shape of the output of join is computed based only on the first input Theano variable, which leads to an error.
This might happen with other ops, such as elemwise and dot, for example. Indeed, to perform some optimizations (for speed or stability, for instance), Theano assumes that the computation is correct and consistent in the first place, as it does here.
You can detect those problems by running the code without this optimization, using the Theano flag optimizer_excluding=local_shape_to_shape_i. You can also obtain the same effect by running in the mode FAST_COMPILE (it will not apply this optimization, nor most other optimizations) or DebugMode (it will test before and after all optimizations, but is much slower).
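As a rough sketch of how this looks in practice (reusing x, y, z, xv and yv from the example above; f_check is just an illustrative name), you can either set the flag from the shell with THEANO_FLAGS="optimizer_excluding=local_shape_to_shape_i", or compile the function in FAST_COMPILE mode so that the join is actually executed and the mismatch surfaces at run time:

>>> # Compile without the shape optimization: the join really runs.
>>> f_check = theano.function([x, y], z.shape, mode='FAST_COMPILE')
>>> f_check(xv, yv)
Traceback (most recent call last):
...
ValueError: ...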
Specifying Exact Shape
Currently, specifying a shape is not as easy and flexible as we would like, and we plan some upgrades. Here is the current state of what can be done:
- You can pass the shape information directly to the ConvOp created when calling conv2d. You simply set the parameters image_shape and filter_shape inside the call. They must be tuples of 4 elements. For example:
    theano.tensor.nnet.conv2d(..., image_shape=(7, 3, 5, 5), filter_shape=(2, 3, 4, 4))
- You can use the SpecifyShape op to add shape information anywhere in the graph. This allows Theano to perform some optimizations. In the following example, this makes it possible to precompute the output of the Theano function as a constant.
    >>> import theano
    >>> x = theano.tensor.matrix()
    >>> x_specify_shape = theano.tensor.specify_shape(x, (2, 2))
    >>> f = theano.function([x], (x_specify_shape ** 2).shape)
    >>> theano.printing.debugprint(f)
    DeepCopyOp [id A] '' 0
     |TensorConstant{(2,) of 2} [id B]
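As a quick check of the example above, calling the compiled function with a correctly shaped input simply returns the precomputed constant (the ones array below is only an illustration). In general, SpecifyShape is expected to raise an error at run time if the value's shape does not match the specified one, in graphs where the node has not been optimized away as it is here.

>>> import numpy
>>> f(numpy.ones((2, 2)))
array([2, 2])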
Future Plans
The parameter “constant shape” will be added to theano.shared(). This is probably the most frequent case with shared variables. It will make the code simpler and will make it possible to check that the shape does not change when updating the shared variable.