cuDNN is an NVIDIA library with functionality used by deep neural networks. It provides optimized versions of some operations, such as convolution. cuDNN is not currently installed with CUDA; you must download and install it yourself.

To install it, decompress the downloaded file and make the .h and .so* files available to the compilation environment. There are at least three possible ways of doing so:

  • The easiest is to include them in your CUDA installation. Copy the .h files to CUDA_ROOT/include and the .so* files to CUDA_ROOT/lib64 (by default, CUDA_ROOT is /usr/local/cuda on Linux).

  • Alternatively, on Linux, you can set the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPATH to the directory extracted from the download. If needed, separate multiple directories with : as in the PATH environment variable.

      export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
      export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
      export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH

  • As a third way, also on Linux, you can copy the .h files to /usr/include and the .so* files to /lib64.
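The environment-variable option can also be prepared from Python when launching a child process. The following is a minimal sketch, not part of Theano: `prepend_dir` and `child_env` are illustrative names, and the directory is the placeholder path from the export lines above.

```python
import os


def prepend_dir(env, name, directory):
    """Return the value `name` should take so that `directory` comes first."""
    current = env.get(name, "")
    # Only add a separator when the variable already holds something.
    return directory + os.pathsep + current if current else directory


# Placeholder extraction directory; substitute your own.
cudnn_root = "/home/user/path_to_CUDNN_folder"

# Work on a copy so the environment of the current process is left untouched.
child_env = dict(os.environ)
for name, subdir in [("LD_LIBRARY_PATH", "lib64"),
                     ("LIBRARY_PATH", "lib64"),
                     ("CPATH", "include")]:
    child_env[name] = prepend_dir(child_env, name, cudnn_root + "/" + subdir)
```

`child_env` can then be passed as the `env` argument of `subprocess.run` when starting the Python process that imports Theano. Changing `LD_LIBRARY_PATH` inside an already running process generally does not affect how that process resolves shared libraries.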

By default, Theano will detect if it can use cuDNN. If so, it will use it. If not, Theano optimizations will not introduce cuDNN ops, so Theano will still work as long as the user did not introduce them manually.

To get an error if Theano cannot use cuDNN, use this Theano flag: optimizer_including=cudnn.
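Theano flags are commonly passed through the THEANO_FLAGS environment variable as a comma-separated list of key=value pairs. The sketch below only builds that string; `format_theano_flags` is an illustrative helper, not a Theano function, and the floatX entry is included purely as a second example flag.

```python
import os


def format_theano_flags(flags):
    """Join flag names and values into the key=value,key=value form."""
    return ",".join("%s=%s" % (name, value)
                    for name, value in sorted(flags.items()))


flags = {
    "optimizer_including": "cudnn",  # raise an error if cuDNN cannot be used
    "floatX": "float32",             # example of a second flag
}

# The variable must be set before Theano is imported to take effect.
os.environ["THEANO_FLAGS"] = format_theano_flags(flags)
```

The same string can be given on the command line, for example by prefixing the python invocation with THEANO_FLAGS=... in the shell.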


cuDNN v5.1 is supported in the Theano master version, which has dropped cuDNN v3 support. Theano 0.8.0 and 0.8.1 support only cuDNN v3 and v4. Theano 0.8.2 will support only v4 and v5.


Starting in cuDNN v3, multiple convolution implementations are offered, and heuristics can be used to automatically choose a convolution implementation well suited to the parameters of the convolution.

The Theano flag dnn.conv.algo_fwd specifies the cuDNN convolution implementation that Theano should use for forward convolutions. Possible values include:

  • small (default): use a convolution implementation with small memory usage
  • none: use a slower implementation with minimal memory usage
  • large: use a sometimes faster implementation with large memory usage
  • fft: use the Fast Fourier Transform implementation of convolution (very high memory usage)
  • guess_once: the first time a convolution is executed, the implementation to use is chosen according to cuDNN’s heuristics and reused for every subsequent execution of the convolution.
  • guess_on_shape_change: like guess_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
  • time_once: the first time a convolution is executed, every convolution implementation offered by cuDNN is executed and timed. The fastest is reused for every subsequent execution of the convolution.
  • time_on_shape_change: like time_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.

The Theano flag dnn.conv.algo_bwd specifies the cuDNN convolution implementation that Theano should use for gradient convolutions. Possible values include:

  • none (default): use the default non-deterministic convolution implementation
  • deterministic: use a slower but deterministic implementation
  • fft: use the Fast Fourier Transform implementation of convolution (very high memory usage)
  • guess_once: the first time a convolution is executed, the implementation to use is chosen according to cuDNN’s heuristics and reused for every subsequent execution of the convolution.
  • guess_on_shape_change: like guess_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.
  • time_once: the first time a convolution is executed, every convolution implementation offered by cuDNN is executed and timed. The fastest is reused for every subsequent execution of the convolution.
  • time_on_shape_change: like time_once, but a new convolution implementation is selected every time the shapes of the inputs and kernels don’t match the shapes from the last execution.

The guess* and time* flag values take into account the amount of available memory when selecting an implementation. This means that slower implementations might be selected if not enough memory is available for the faster implementations.
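The difference between the *_once and *_on_shape_change values is only in when the selection is repeated. The toy model below illustrates that caching policy in plain Python; it is not Theano's implementation, and `AlgoChooser` and `fake_select` are hypothetical names standing in for cuDNN's heuristics or timing run.

```python
class AlgoChooser:
    """Toy model of the *_once and *_on_shape_change selection policies."""

    def __init__(self, select, on_shape_change):
        self.select = select                    # callable: shapes -> algorithm name
        self.on_shape_change = on_shape_change  # re-select when shapes differ?
        self.cached_algo = None
        self.cached_shapes = None
        self.selections = 0                     # how many times select() ran

    def choose(self, img_shape, kern_shape):
        shapes = (tuple(img_shape), tuple(kern_shape))
        first_call = self.cached_algo is None
        changed = self.on_shape_change and shapes != self.cached_shapes
        if first_call or changed:
            self.cached_algo = self.select(shapes)
            self.cached_shapes = shapes
            self.selections += 1
        return self.cached_algo


def fake_select(shapes):
    # Hypothetical stand-in for a heuristic or a timing run.
    img_shape, _ = shapes
    return "fft" if img_shape[-1] >= 64 else "small"


once = AlgoChooser(fake_select, on_shape_change=False)
per_shape = AlgoChooser(fake_select, on_shape_change=True)
for img in [(8, 3, 32, 32), (8, 3, 128, 128), (8, 3, 128, 128)]:
    once.choose(img, (16, 3, 3, 3))
    per_shape.choose(img, (16, 3, 3, 3))
```

In this run the "once" policy selects a single time and keeps the choice made for the first shape, while the "on shape change" policy selects again when the image size changes and then reuses that choice for the repeated shape.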


Normally you should not call GPU Ops directly, but the CPU interface currently does not allow all options supported by cuDNN ops. So it is possible that you will need to call them manually.


The cuDNN documentation states that, for the following two operations, reproducibility is not guaranteed with the default implementation: cudnnConvolutionBackwardFilter and cudnnConvolutionBackwardData. These correspond to the gradient with respect to the weights and the gradient with respect to the input of the convolution. They are also sometimes used in the forward pass, when they give a speed-up.

The Theano flag dnn.conv.algo_bwd can be used to force the use of a slower but deterministic convolution implementation.


There is a problem we do not understand yet when cuDNN paths are used with symbolic links, so avoid using symbolic links for them.

Note: the .so* files must be readable and executable by everybody. cudnn.h must be readable by everybody.

cuDNN RNN Example

This is a code example of using the cuDNN RNN functionality. We present the code with some commentary in between to explain some peculiarities.

The terminology here assumes that you are familiar with RNN structure.

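    import numpy as np
    import theano
    import theano.tensor as T
    from theano.gpuarray import dnn
    from theano.gpuarray.type import gpuarray_shared_constructor

    dtype = 'float32'
    input_dim = 32
    hidden_dim = 16
    batch_size = 2
    depth = 3
    timesteps = 5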

To clarify the rest of the code we define some variables to hold sizes.

    X = T.tensor3('X')
    Y = T.tensor3('Y')
    h0 = T.tensor3('h0')

We also define some Theano variables to work with. Here X is the input, Y is the output (as in expected output) and h0 is the initial state for the recurrent inputs.

    rnnb = dnn.RNNBlock(dtype, hidden_dim, depth, 'gru')

This defines an RNNBlock. This is a departure from usual Theano operations in that it has the structure of a layer more than a separate operation. This is constrained by the underlying API.

    psize = rnnb.get_param_size([batch_size, input_dim])
    params_cudnn = gpuarray_shared_constructor(
        np.zeros((psize,), dtype=theano.config.floatX))

Here we allocate space for the trainable parameters of the RNN. The first function tells us how many elements we will need to store the parameters. This space is for all the parameters of all the layers inside the RNN, and the layout is opaque.

    layer = 0
    # layer_params is an illustrative name for the returned parameters.
    layer_params = rnnb.split_params(params_cudnn, layer,
                                     [batch_size, input_dim])

If you need to access the parameters individually, you can call split_params on your shared variable to get all the parameters for a single layer. The order and number of returned items depend on the type of RNN.

  • rnn_relu, rnn_tanh: input, recurrent
  • gru: input reset, input update, input newmem, recurrent reset, recurrent update, recurrent newmem
  • lstm: input input gate, input forget gate, input newmem gate, input output gate, recurrent input gate, recurrent update gate, recurrent newmem gate, recurrent output gate

Each of these elements is composed of weights and a bias (a matrix and a vector).
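The list above suggests a rough way to reason about how large the parameter block is. The following sketch counts one weight matrix and one bias vector for the input part and for the recurrent part of every gate, assuming a unidirectional network with linear input mode. `estimate_param_count` is a hypothetical helper, and its result is an estimate under those assumptions; the value returned by get_param_size is the one to use when allocating.

```python
# Number of gates per RNN type, matching the list of returned items above.
GATES = {"rnn_relu": 1, "rnn_tanh": 1, "gru": 3, "lstm": 4}


def estimate_param_count(rnn_mode, input_dim, hidden_dim, depth):
    """Estimate the number of scalar parameters in a stacked RNN."""
    gates = GATES[rnn_mode]
    total = 0
    for layer in range(depth):
        # The first layer reads the input; later layers read the layer below.
        in_dim = input_dim if layer == 0 else hidden_dim
        input_part = in_dim * hidden_dim + hidden_dim          # matrix + bias
        recurrent_part = hidden_dim * hidden_dim + hidden_dim  # matrix + bias
        total += gates * (input_part + recurrent_part)
    return total


# Sizes from the example: input_dim=32, hidden_dim=16, depth=3, GRU.
estimate = estimate_param_count("gru", 32, 16, 3)
```

With the sizes used in this example the first layer contributes 3 * 800 elements and each of the two following layers 3 * 544, which gives 5664 under these assumptions.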

    y, hy = rnnb.apply(params_cudnn, X, h0)

This is more akin to an op in Theano in that it will apply the RNN operation to a set of symbolic inputs and return symbolic outputs. y is the output, hy is the final state for the recurrent inputs.

After this, the gradient works as usual, so you can treat the returned symbolic outputs as normal Theano symbolic variables.

List of Implemented Operations

  • class theano.gpuarray.dnn.DnnBase(files=None, c_func=None)
  • Creates a handle for cuDNN and pulls in the cuDNN libraries and headers.
  • class theano.gpuarray.dnn.GpuDnnBatchNorm(mode='per-activation', running_averages=False, inplace_running_mean=False, inplace_running_var=False, inplace_output=False)
  • Base Op for cuDNN Batch Normalization.

Parameters:

  • mode ({'per-activation', 'spatial'}) – Whether to normalize per activation (in this mode, bias and scale tensor dimensions are 1xCxHxW) or share normalization factors across spatial dimensions (in this mode, bias and scale tensor dimensions are 1xCx1x1).
  • epsilon – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
  • running_average_factor (float) – Factor for updating the values of running_mean and running_var. If the factor is close to one, the running averages will update quickly; if the factor is close to zero, they will update slowly.
  • running_mean (tensor or None) – Previous value of the running mean. If this is given, the new value running_mean * (1 - r_a_factor) + batch mean * r_a_factor will be returned as one of the outputs of this function. running_mean and running_var should either both be given or both be None.
  • running_var (tensor or None) – Previous value of the running variance. If this is given, the new value running_var * (1 - r_a_factor) + (m / (m - 1)) * batch var * r_a_factor will be returned as one of the outputs of this function, where m is the product of lengths of the averaged-over dimensions. running_mean and running_var should either both be given or both be None.
  • class theano.gpuarray.dnn.GpuDnnBatchNormInference(mode='per-activation', inplace=False)
  • Base Op for cuDNN Batch Normalization.

Parameters:

  • mode ({'per-activation', 'spatial'}) – Whether to normalize per activation (in this mode, bias and scale tensor dimensions are 1xCxHxW) or share normalization factors across spatial dimensions (in this mode, bias and scale tensor dimensions are 1xCx1x1).
  • epsilon – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
  • class theano.gpuarray.dnn.GpuDnnConv(algo=None, inplace=False, num_groups=1)
  • The forward convolution.

Parameters:

  • image
  • kernel
  • descr – The convolution descriptor.
  • algo ({'small', 'none', 'large', 'fft', 'fft_tiling', 'winograd', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of config.dnn.conv.algo_fwd.
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.
  • static get_out_shape(ishape, kshape, border_mode, subsample, dilation)
  • This function computes the output shape for a convolution with the specified parameters. ishape and kshape can be symbolic or scalar.
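The output shape follows the usual convolution arithmetic. The sketch below reproduces that arithmetic in plain Python for concrete integer shapes; `conv_out_len` and `conv_out_shape` are illustrative names written for this page, not the body of get_out_shape, and the 'half' padding of dilated_kernel // 2 is the common convention assumed here.

```python
def conv_out_len(in_len, kern_len, border_mode, stride=1, dilation=1):
    """Output length along one spatial axis."""
    # A dilated kernel covers (k - 1) * d + 1 input positions.
    dilated = (kern_len - 1) * dilation + 1
    if border_mode == "valid":
        pad = 0
    elif border_mode == "full":
        pad = dilated - 1
    elif border_mode == "half":
        pad = dilated // 2
    else:
        pad = int(border_mode)  # explicit padding size
    return (in_len + 2 * pad - dilated) // stride + 1


def conv_out_shape(ishape, kshape, border_mode, subsample, dilation):
    """Output shape for a 'bc01'-style input and kernel."""
    batch, n_filters = ishape[0], kshape[0]
    n_spatial = len(ishape) - 2
    if isinstance(border_mode, (tuple, list)):
        modes = list(border_mode)
    else:
        modes = [border_mode] * n_spatial
    spatial = tuple(
        conv_out_len(ishape[2 + d], kshape[2 + d], modes[d],
                     subsample[d], dilation[d])
        for d in range(n_spatial))
    return (batch, n_filters) + spatial


valid_shape = conv_out_shape((8, 3, 32, 32), (16, 3, 5, 5),
                             "valid", (1, 1), (1, 1))
```

For a 32x32 image and a 5x5 kernel, 'valid' shrinks each spatial axis to 28, 'full' grows it to 36, and 'half' keeps it at 32.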
  • class theano.gpuarray.dnn.GpuDnnConvDesc(border_mode, subsample=(1, 1), dilation=(1, 1), conv_mode='conv', precision='float32', num_groups=1)
  • This Op builds a convolution descriptor for use in the other convolution operations.

See the doc of dnn_conv() for a description of the parameters.

  • class theano.gpuarray.dnn.GpuDnnConvGradI(inplace=False, algo=None, num_groups=1)
  • The convolution gradient with respect to the inputs.

Parameters:

  • image
  • kernel
  • descr – The convolution descriptor.
  • algo ({'none', 'deterministic', 'fft', 'fft_tiling', 'winograd', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of config.dnn.conv.algo_bwd_data.
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.
  • class theano.gpuarray.dnn.GpuDnnConvGradW(inplace=False, algo=None, num_groups=1)
  • The convolution gradient with respect to the weights.

Parameters:

  • image
  • kernel
  • descr – The convolution descriptor.
  • algo ({'none', 'deterministic', 'fft', 'small', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Default is the value of config.dnn.conv.algo_bwd_filter.
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.
  • class theano.gpuarray.dnn.GpuDnnPool(mode='max')

Parameters:

  • img (tensor) – The image 4d or 5d tensor.
  • ws (tensor) – Window size.
  • stride (tensor) – (dx, dy) or (dx, dy, dz).
  • mode ({'max', 'average_inc_pad', 'average_exc_pad'}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
  • pad (tensor) – (padX, padY) or (padX, padY, padZ)
  • class theano.gpuarray.dnn.GpuDnnPoolBase(mode='max')
  • Abstract base class for GpuDnnPool and GpuDnnPoolGrad.
  • class theano.gpuarray.dnn.GpuDnnPoolDesc(ws=(1, 1), stride=(1, 1), mode='max', pad=(0, 0))
  • This Op builds a pooling descriptor for use in the other pooling operations.

ws, stride and pad must have the same length.

Parameters:

  • ws (tuple) – Window size.
  • stride (tuple) – (dx, dy) or (dx, dy, dz).
  • mode ({'max', 'average_inc_pad', 'average_exc_pad'}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
  • pad (tuple) – (padX, padY) or (padX, padY, padZ)

Not used anymore. Only needed to reload old pickled files.

  • class theano.gpuarray.dnn.GpuDnnPoolGrad(mode='max')
  • The pooling gradient.

Parameters:

  • inp – The input of the pooling.
  • out – The output of the pooling in the forward pass.
  • out_grad – Same size as out, but is the corresponding gradient information.
  • ws (tensor variable) – Window size.
  • stride (tensor variable) – (dx, dy) or (dx, dy, dz).
  • mode ({'max', 'average_inc_pad', 'average_exc_pad'}) – The old deprecated name ‘average’ corresponds to ‘average_inc_pad’.
  • pad (tensor) – (padX, padY) or (padX, padY, padZ)
  • class theano.gpuarray.dnn.GpuDnnSoftmax(algo, mode)
  • Op for the cuDNN Softmax.

    • algo : {‘fast’, ‘accurate’, ‘log’} – Indicating whether, respectively, computations should be optimized for speed, for accuracy, or if cuDNN should rather compute the log-softmax instead.
    • mode : {‘instance’, ‘channel’} – Indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.
  • class theano.gpuarray.dnn.GpuDnnSoftmaxBase(algo, mode)
  • Op for the cuDNN Softmax.

Parameters:

  • algo ({'fast', 'accurate', 'log'}) – Indicating whether, respectively, computations should be optimized for speed, for accuracy, or if cuDNN should rather compute the log-softmax instead.
  • mode ({'instance', 'channel'}) – Indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.
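The algo and mode options can be illustrated on nested Python lists. This is a sketch under assumptions rather than a description of cuDNN internals: 'fast' is modelled as the direct formula, 'accurate' as the variant that subtracts the maximum before exponentiating, and the images are indexed as image[c][row][col]. `softmax` and `softmax_image` are illustrative names.

```python
import math


def softmax(xs, algo="accurate"):
    """Softmax (or log-softmax when algo == 'log') of a flat list."""
    if algo == "fast":
        # Direct formula; can overflow for large inputs.
        exps = [math.exp(x) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]
    # Subtracting the maximum keeps the exponentials bounded by 1.
    top = max(xs)
    shifted = [x - top for x in xs]
    log_total = math.log(sum(math.exp(s) for s in shifted))
    if algo == "log":
        return [s - log_total for s in shifted]
    return [math.exp(s - log_total) for s in shifted]


def softmax_image(image, mode, algo="accurate"):
    """Apply softmax to one image stored as image[c][row][col]."""
    n_c, n_r, n_w = len(image), len(image[0]), len(image[0][0])
    out = [[[0.0] * n_w for _ in range(n_r)] for _ in range(n_c)]
    if mode == "instance":
        # One softmax over every value of the image ('c01').
        flat = softmax([image[c][r][w] for c in range(n_c)
                        for r in range(n_r) for w in range(n_w)], algo)
        values = iter(flat)
        for c in range(n_c):
            for r in range(n_r):
                for w in range(n_w):
                    out[c][r][w] = next(values)
    else:
        # 'channel': one softmax across channels at each spatial location.
        for r in range(n_r):
            for w in range(n_w):
                column = softmax([image[c][r][w] for c in range(n_c)], algo)
                for c in range(n_c):
                    out[c][r][w] = column[c]
    return out


image = [[[0.0, 1.0]], [[2.0, 3.0]]]  # 2 channels, 1 row, 2 columns
per_instance = softmax_image(image, "instance")
per_channel = softmax_image(image, "channel")
```

In 'instance' mode all values of the image sum to one; in 'channel' mode the values across channels sum to one at every spatial location.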
  • class theano.gpuarray.dnn.GpuDnnSoftmaxGrad(algo, mode)
  • Op for the cuDNN SoftmaxGrad.

Parameters:

  • algo – ‘fast’, ‘accurate’ or ‘log’ indicating whether, respectively, computations should be optimized for speed, for accuracy, or if cuDNN should rather compute the gradient of the log-softmax instead.
  • mode – ‘instance’ or ‘channel’ indicating whether the softmax should be computed per image across ‘c01’ or per spatial location ‘01’ per image across ‘c’.
  • class theano.gpuarray.dnn.GpuDnnTransformerGradI
  • Gradient of inputs Op for cuDNN Spatial Transformer.
  • class theano.gpuarray.dnn.GpuDnnTransformerGradT
  • Gradient of affine transformations Op for cuDNN Spatial Transformer.
  • class theano.gpuarray.dnn.GpuDnnTransformerGrid
  • Grid generator Op for cuDNN Spatial Transformer.

    • make_node(theta, out_dims)
    • Create a grid generator node for a cuDNN Spatial Transformer.

Parameters:

  • theta (tensor) – Affine transformation tensor containing one affine transformation matrix per image. theta is usually generated by the localization network.
  • out_dims (tuple) – Dimensions of the transformed inputs, containing four elements, given by (N, C, H, W), where N is the number of inputs, C the number of channels, and H and W are the height and width of each input.
  • class theano.gpuarray.dnn.GpuDnnTransformerSampler
  • Grid sampler Op for cuDNN Spatial Transformer.

    • make_node(img, grid)
    • Create a grid sampler node for a cuDNN Spatial Transformer.

Parameters:

  • img (tensor) – Images from which the pixels will be sampled. The implementation assumes the tensor is in NCHW format, where N is the number of images, C is the number of color channels, H is the height of the inputs, and W is the width of the inputs.
  • grid (GpuDnnTransformerGrid) – Grid that contains the coordinates of the pixels to be sampled from the input images.
  • class theano.gpuarray.dnn.RNNBlock(dtype, hidden_size, num_layers, rnn_mode, input_mode='linear', direction_mode='unidirectional', context_name=None)
  • An object that allows us to use the cuDNN RNN implementation. TODO: make an example of how to use it. For now, you can check the Theano tests test_dnn_rnn_gru() and test_dnn_rnn_lstm() under theano/gpuarray/tests/.

Parameters:

  • dtype – Data type of computation.
  • hidden_size (int) – Hidden layer dimension.
  • num_layers (int) – Number of recurrent layers.
  • rnn_mode ({'rnn_relu', 'rnn_tanh', 'lstm', 'gru'}) –
      rnn_relu: A single-gate recurrent neural network with a ReLU activation function.
      $h_t = \mathrm{ReLU}(W_i x_t + U_i h_{t-1} + b_{wi} + b_{Ri})$
      rnn_tanh: A single-gate recurrent neural network with a tanh activation function.
      $h_t = \tanh(W_i x_t + U_i h_{t-1} + b_{wi} + b_{Ri})$
      lstm: A four-gate Long Short-Term Memory network with no peephole connections.
      gru: A three-gate network consisting of Gated Recurrent Units.
  • input_mode ({'linear', 'skip'}) – linear: the input will be multiplied by a biased matrix. skip: no operation is performed on the input; its size must match the hidden size.
  • direction_mode ({'unidirectional', 'bidirectional'}) – unidirectional: the network operates recurrently from the first input to the last. bidirectional: the network operates from first to last, then from last to first, and concatenates the results at each layer.
  • apply(w, x, hx, cx=None)
  • Apply the RNN to some data.

Parameters:

  • w – opaque parameter block
  • x – input
  • hx – initial hidden state
  • cx – initial cell state (for LSTM)
  • get_param_size(input_size)
  • Get the size of the shared variable for the parameters of the RNN.

This will return the size (in items) necessary to store all the parameters for the RNN. You should allocate a variable of that size to store those parameters. The order and layout of the parameters are opaque.

Parameters: input_size ((int, int)) – Size of the input blocks

  • split_params(w, layer, input_size)
  • Split the opaque parameter block into components.

Parameters:

  • w (GpuArraySharedVariable) – opaque parameter block
  • layer (int) – ID of the layer
  • input_size ((int, int)) – Size of the input blocks
  • theano.gpuarray.dnn.dnn_batch_normalization_test(inputs, gamma, beta, mean, var, mode='per-activation', epsilon=0.0001)
  • Performs batch normalization of the given inputs, using the given mean and variance.

Parameters:

  • mode ({'per-activation', 'spatial'}) – Whether to normalize per activation or share normalization factors across spatial dimensions (i.e., all dimensions past the second).
  • gamma (tensor) – Scale factors. Must match the dimensionality of inputs, but have sizes of 1 for all axes normalized over (i.e., in the first dimension for mode='per-activation', and additionally in all dimensions past the second for mode='spatial').
  • beta (tensor) – Biases. Must match the tensor layout of gamma.
  • mean (tensor) – Means. Usually these are running averages computed during training. Must match the tensor layout of gamma.
  • var (tensor) – Variances. Usually these are running averages computed during training. Must match the tensor layout of gamma.
  • epsilon (float) – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).

Returns: out – Batch-normalized inputs.

Return type: tensor


Requires cuDNN 5 and Theano 0.9dev2 or more recent.

For 4d tensors, the returned value is equivalent to:

    axes = (0,) if mode == 'per-activation' else (0, 2, 3)
    gamma, beta, mean, var = (T.addbroadcast(t, *axes)
                              for t in (gamma, beta, mean, var))
    out = (inputs - mean) * gamma / T.sqrt(var + epsilon) + beta

For 5d tensors, the axes would be (0, 2, 3, 4).

  • theano.gpuarray.dnn.dnn_batch_normalization_train(inputs, gamma, beta, mode='per-activation', epsilon=0.0001, running_average_factor=0.1, running_mean=None, running_var=None)
  • Performs batch normalization of the given inputs, using the mean and variance of the inputs.

Parameters:

  • mode ({'per-activation', 'spatial'}) – Whether to normalize per activation or share normalization factors across spatial dimensions (i.e., all dimensions past the second).
  • gamma (tensor) – Learnable scale factors. Must match the dimensionality of inputs, but have sizes of 1 for all axes normalized over (i.e., in the first dimension for mode='per-activation', and additionally in all dimensions past the second for mode='spatial').
  • beta (tensor) – Learnable biases. Must match the tensor layout of gamma.
  • epsilon (float) – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
  • running_average_factor (float) – Factor for updating the values of running_mean and running_var. If the factor is close to one, the running averages will update quickly; if the factor is close to zero, they will update slowly.
  • running_mean (tensor or None) – Previous value of the running mean. If this is given, the new value running_mean * (1 - r_a_factor) + batch mean * r_a_factor will be returned as one of the outputs of this function. running_mean and running_var should either both be given or both be None.
  • running_var (tensor or None) – Previous value of the running variance. If this is given, the new value running_var * (1 - r_a_factor) + (m / (m - 1)) * batch var * r_a_factor will be returned as one of the outputs of this function, where m is the product of lengths of the averaged-over dimensions. running_mean and running_var should either both be given or both be None.

Returns:

  • out (tensor) – Batch-normalized inputs.
  • mean (tensor) – Means of inputs across the normalization axes.
  • invstd (tensor) – Inverse standard deviations of inputs across the normalization axes.
  • new_running_mean (tensor) – New value of the running mean (only if both running_mean and running_var were given).
  • new_running_var (tensor) – New value of the running variance (only if both running_var and running_mean were given).


Requires cuDNN 5 and Theano 0.9dev2 or more recent.

For 4d tensors, returned values are equivalent to:

    axes = 0 if mode == 'per-activation' else (0, 2, 3)
    mean = inputs.mean(axes, keepdims=True)
    var = inputs.var(axes, keepdims=True)
    invstd = T.inv(T.sqrt(var + epsilon))
    out = (inputs - mean) * gamma * invstd + beta

    # m is the product of the lengths of the averaged-over dimensions.
    m = T.cast(T.prod(inputs.shape) / T.prod(mean.shape), 'float32')
    running_mean = running_mean * (1 - running_average_factor) + \
                   mean * running_average_factor
    running_var = running_var * (1 - running_average_factor) + \
                  (m / (m - 1)) * var * running_average_factor

For 5d tensors, the axes are (0, 2, 3, 4).
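As a worked example of these formulas, the following plain-Python sketch applies them to the values of a single normalized position, so that m is simply the number of values. `batch_norm_train_1d` is an illustrative helper, not part of Theano, and the default running statistics of 0 and 1 are chosen only for the example.

```python
import math


def batch_norm_train_1d(values, gamma, beta, epsilon=1e-4,
                        running_average_factor=0.1,
                        running_mean=0.0, running_var=1.0):
    """Training-mode batch normalization over a flat list of values."""
    m = len(values)                      # number of averaged-over elements
    mean = sum(values) / m
    var = sum((x - mean) ** 2 for x in values) / m   # biased batch variance
    invstd = 1.0 / math.sqrt(var + epsilon)
    out = [(x - mean) * gamma * invstd + beta for x in values]

    r = running_average_factor
    new_running_mean = running_mean * (1 - r) + mean * r
    # The factor m / (m - 1) turns the biased variance into the unbiased one.
    new_running_var = running_var * (1 - r) + (m / (m - 1)) * var * r
    return out, mean, invstd, new_running_mean, new_running_var


out, mean, invstd, new_mean, new_var = batch_norm_train_1d(
    [1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
```

For these four values the batch mean is 2.5 and the biased variance is 1.25, so with a factor of 0.1 the running mean moves from 0 to 0.25 and the running variance from 1 to 0.9 + 0.1 * (4 / 3) * 1.25.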

  • theano.gpuarray.dnn.dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1), dilation=(1, 1), conv_mode='conv', direction_hint=None, workmem=None, algo=None, precision=None, num_groups=1)
  • GPU convolution using cuDNN from NVIDIA.

The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.

Parameters:

  • img – Images to do the convolution over.
  • kerns – Convolution filters.
  • border_mode – One of ‘valid’, ‘full’, ‘half’; additionally, the padding size can be directly specified by an integer or a pair of integers.
  • subsample – Perform subsampling of the output (default: (1, 1)).
  • dilation – Filter dilation factor. A dilation factor of d is equivalent to a convolution with d - 1 zeros inserted between neighboring filter values.
  • conv_mode – Perform convolution (kernels flipped) or cross-correlation. One of ‘conv’, ‘cross’ (default: ‘conv’).
  • direction_hint – Used by graph optimizers to change algorithm choice. By default, GpuDnnConv will be used to carry out the convolution. If border_mode is ‘valid’, subsample is (1, 1), dilation is (1, 1), and direction_hint is ‘bprop weights’, it will use GpuDnnConvGradW. If border_mode is ‘full’, subsample is (1, 1), dilation is (1, 1), and direction_hint is not ‘forward!’, it will use GpuDnnConvGradI. This parameter is used internally by graph optimizers and may be removed at any time without a deprecation period. You have been warned.
  • algo ({'none', 'small', 'large', 'fft', 'guess_once', 'guess_on_shape_change', 'time_once', 'time_on_shape_change'}) – Convolution implementation to use. Some of its values may require certain versions of cuDNN to be installed. Default is the value of config.dnn.conv.algo_fwd.
  • precision ({'as_input_f32', 'as_input', 'float16', 'float32', 'float64'}) – Description of the dtype in which the computation of the convolution should be done. Possible values are ‘as_input’, ‘float16’, ‘float32’ and ‘float64’. Default is the value of config.dnn.conv.precision.
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.


The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.
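The num_groups parameter described above splits the channels and the filters into equal, independent blocks. The sketch below shows that partition in plain Python, assuming the usual grouped-convolution convention in which both counts must be divisible by num_groups; `group_slices` is an illustrative helper, not a Theano function.

```python
def group_slices(n_channels, n_filters, num_groups):
    """Return, per group, the input channels and the filters it uses."""
    if n_channels % num_groups or n_filters % num_groups:
        raise ValueError("channels and filters must be divisible by num_groups")
    c_per = n_channels // num_groups   # input channels seen by each group
    f_per = n_filters // num_groups    # filters computed by each group
    return [(range(g * c_per, (g + 1) * c_per),
             range(g * f_per, (g + 1) * f_per))
            for g in range(num_groups)]


# 4 input channels and 6 filters split into 2 groups.
groups = group_slices(4, 6, 2)
```

Under this convention each filter only sees n_channels / num_groups input channels, so the channel axis of the kernel tensor is correspondingly smaller than that of the image.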

  • theano.gpuarray.dnn.dnn_conv3d(img, kerns, border_mode='valid', subsample=(1, 1, 1), dilation=(1, 1, 1), conv_mode='conv', direction_hint=None, algo=None, precision=None, num_groups=1)
  • GPU convolution using cuDNN from NVIDIA.

The memory layout to use is ‘bc012’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’, ‘third dim’ in that order.

Parameters:

  • img – Images to do the convolution over.
  • kerns – Convolution filters.
  • border_mode – One of ‘valid’, ‘full’, ‘half’; additionally, the padding size can be directly specified by an integer or a pair of integers.
  • subsample – Perform subsampling of the output (default: (1, 1, 1)).
  • dilation – Filter dilation factor. A dilation factor of d is equivalent to a convolution with d - 1 zeros inserted between neighboring filter values.
  • conv_mode – Perform convolution (kernels flipped) or cross-correlation. One of ‘conv’, ‘cross’ (default: ‘conv’).
  • direction_hint – Used by graph optimizers to change algorithm choice. By default, GpuDnnConv will be used to carry out the convolution. If border_mode is ‘valid’, subsample is (1, 1, 1), dilation is (1, 1, 1), and direction_hint is ‘bprop weights’, it will use GpuDnnConvGradW. If border_mode is ‘full’, subsample is (1, 1, 1), dilation is (1, 1, 1), and direction_hint is not ‘forward!’, it will use GpuDnnConvGradI. This parameter is used internally by graph optimizers and may be removed at any time without a deprecation period. You have been warned.
  • algo – Convolution implementation to use. Only ‘none’ is implemented for the conv3d. Default is the value of config.dnn.conv.algo_fwd.
  • precision ({'as_input_f32', 'as_input', 'float16', 'float32', 'float64'}) – Description of the dtype in which the computation of the convolution should be done. Possible values are ‘as_input’, ‘float16’, ‘float32’ and ‘float64’. Default is the value of config.dnn.conv.precision.
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.


The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.

  • theano.gpuarray.dnn.dnn_gradinput(kerns, topgrad, img_shp, border_mode='valid', subsample=(1, 1), dilation=(1, 1), conv_mode='conv', precision=None, algo=None, num_groups=1)
  • TODO: document this
  • theano.gpuarray.dnn.dnn_gradinput3d(kerns, topgrad, img_shp, border_mode='valid', subsample=(1, 1, 1), dilation=(1, 1, 1), conv_mode='conv', precision=None, algo=None, num_groups=1)
  • 3d version of dnn_gradinput.
  • theano.gpuarray.dnn.dnn_gradweight(img, topgrad, kerns_shp, border_mode='valid', subsample=(1, 1), dilation=(1, 1), conv_mode='conv', precision=None, algo=None, num_groups=1)
  • TODO: document this
  • theano.gpuarray.dnn.dnn_gradweight3d(img, topgrad, kerns_shp, border_mode='valid', subsample=(1, 1, 1), dilation=(1, 1, 1), conv_mode='conv', precision=None, algo=None, num_groups=1)
  • 3d version of dnn_gradweight.
  • theano.gpuarray.dnn.dnn_pool(img, ws, stride=None, mode='max', pad=None)
  • GPU pooling using cuDNN from NVIDIA.

The memory layout to use is ‘bc01’, that is ‘batch’, ‘channel’, ‘first dim’, ‘second dim’ in that order.

ws, stride and pad must have the same length.

Parameters:

  • img – Images to do the pooling over.
  • ws (tuple) – Subsampling window size. Should have 2 or 3 elements.
  • stride (tuple) – Subsampling stride (default: (1, 1) or (1, 1, 1)).
  • mode ({'max', 'average_inc_pad', 'average_exc_pad', 'sum', 'max_deterministic'}) – NB: ‘max_deterministic’ is supported since cuDNN v6.
  • pad (tuple) – (padX, padY) or (padX, padY, padZ). Default: (0, 0) or (0, 0, 0).


The cuDNN library only works with GPUs that have a compute capability of 3.0 or higher. This means that older GPUs will not work with this Op.


This Op implements the ignore_border=True of max_pool_2d.
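With ignore_border=True, a window that does not fit entirely inside the (padded) input is dropped. The following plain-Python sketch shows the resulting output size and a small 2D max pooling without padding; `pool_out_len` and `max_pool_2d` here are illustrative helpers written for this page, not the Theano functions of the same name.

```python
def pool_out_len(in_len, ws, stride, pad=0):
    """Output length along one axis when partial windows are ignored."""
    return (in_len + 2 * pad - ws) // stride + 1


def max_pool_2d(img, ws, stride):
    """Max pooling of a 2D list of numbers, ignoring partial windows."""
    rows = pool_out_len(len(img), ws[0], stride[0])
    cols = pool_out_len(len(img[0]), ws[1], stride[1])
    return [[max(img[r * stride[0] + i][c * stride[1] + j]
                 for i in range(ws[0]) for j in range(ws[1]))
             for c in range(cols)]
            for r in range(rows)]


# A 5x5 image holding the values 0..24 in row-major order.
img = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = max_pool_2d(img, ws=(2, 2), stride=(2, 2))
```

The 5x5 input yields a 2x2 output: the fifth row and the fifth column never fill a complete 2x2 window, so they do not contribute.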

  • theano.gpuarray.dnn.dnn_spatialtf(img, theta, scale_width=1, scale_height=1)
  • GPU spatial transformer using cuDNN from NVIDIA.

Parameters:

  • img (tensor) – Images to which the transformations will be applied. The implementation assumes the tensor is in NCHW format, where N is the number of images, C is the number of color channels, H is the height of the inputs, and W is the width of the inputs.
  • theta (tensor) – Affine transformation tensor containing one affine transformation matrix per image. theta is usually generated by the localization network.
  • scale_height (float) – A float specifying the scaling factor for the height of the output image. A value of 1 will keep the original height of the input. Values larger than 1 will upsample the input. Values below 1 will downsample the input.
  • scale_width (float) – A float specifying the scaling factor for the width of the output image. A value of 1 will keep the original width of the input. Values larger than 1 will upsample the input. Values below 1 will downsample the input.

Returns: out – Transformed images with width and height properly scaled.

Return type: tensor


Currently, cuDNN only supports 2D transformations with 2x3 affine transformation matrices.

Bilinear interpolation is the only grid sampler method available.
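To make the role of the 2x3 matrix concrete, the sketch below generates a sampling grid in plain Python. It assumes the common spatial-transformer convention in which output pixel positions are expressed as normalized coordinates in [-1, 1] and each row of theta maps (x, y, 1) to a source coordinate; `affine_grid` is an illustrative helper and not the Op used by Theano.

```python
def affine_grid(theta, out_h, out_w):
    """Source (x, y) coordinates for every output pixel.

    theta is a 2x3 nested list: [[a, b, tx], [c, d, ty]].
    """
    def normalized(index, size):
        # Map pixel indices 0..size-1 onto [-1, 1].
        return 0.0 if size == 1 else -1.0 + 2.0 * index / (size - 1)

    grid = []
    for i in range(out_h):
        y = normalized(i, out_h)
        row = []
        for j in range(out_w):
            x = normalized(j, out_w)
            x_src = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            y_src = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((x_src, y_src))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zoom_in = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # samples the central half
identity_grid = affine_grid(identity, 3, 3)
zoom_grid = affine_grid(zoom_in, 3, 3)
```

With the identity matrix every output pixel samples its own location. The second matrix shrinks the sampled region to the centre of the image, which appears as a zoom in the output. The sampler then reads the image at these coordinates using bilinear interpolation.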

  • theano.gpuarray.dnn.version(raises=True)
  • Return the current cuDNN version we link with.

This also does a check that the header version matches the runtime version.

Parameters: raises – If True, raise an exception if cuDNN is not present. Otherwise, return -1.

It always raises a RuntimeError if the header and library versions are not the same.