Using Deeplearning4j with cuDNN
Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer and SubsamplingLayer), as well as LSTM and BatchNormalization layers, support cuDNN.
The only thing we need to do to have DL4J load cuDNN is to add a dependency on deeplearning4j-cuda-9.0, deeplearning4j-cuda-9.2, or deeplearning4j-cuda-10.0, for example:
```xml
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-9.0</artifactId>
    <version>1.0.0-beta4</version>
</dependency>
```

or

```xml
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-9.2</artifactId>
    <version>1.0.0-beta4</version>
</dependency>
```

or

```xml
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-10.0</artifactId>
    <version>1.0.0-beta4</version>
</dependency>
```
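Note that the deeplearning4j-cuda artifacts expect a matching ND4J CUDA backend on the classpath. If your project currently depends on the CPU backend (nd4j-native-platform), you would typically swap it for the CUDA backend of the same version. A sketch for CUDA 9.2, as an illustration:

```xml
<!-- ND4J CUDA backend; version and CUDA number must match the DL4J cuDNN artifact above -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-9.2-platform</artifactId>
    <version>1.0.0-beta4</version>
</dependency>
```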
The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA:
- NVIDIA cuDNN

Note that there are multiple combinations of cuDNN and CUDA supported. At this time the following combinations are supported by Deeplearning4j:

| CUDA Version | cuDNN Version |
|---|---|
| 9.0 | 7.0 |
| 9.2 | 7.1 |
| 10.0 | 7.3 |
To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside the other libraries from CUDA in the default directory: /usr/local/cuda/lib64/ on Linux; /usr/local/cuda/lib/ on Mac OS X; and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\ on Windows.
Alternatively, in the case of CUDA 9.2 or 10.0, cuDNN comes bundled with the “redist” package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the following dependencies instead of installing CUDA and cuDNN:
```xml
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>9.2-7.1-1.4.2</version>
    <classifier>linux-x86_64-redist</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>9.2-7.1-1.4.2</version>
    <classifier>linux-ppc64le-redist</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>9.2-7.1-1.4.2</version>
    <classifier>macosx-x86_64-redist</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>9.2-7.1-1.4.2</version>
    <classifier>windows-x86_64-redist</classifier>
</dependency>
```

or

```xml
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>10.0-7.3-1.4.3</version>
    <classifier>linux-x86_64-redist</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>10.0-7.3-1.4.3</version>
    <classifier>linux-ppc64le-redist</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>10.0-7.3-1.4.3</version>
    <classifier>macosx-x86_64-redist</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>10.0-7.3-1.4.3</version>
    <classifier>windows-x86_64-redist</classifier>
</dependency>
```
Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try reducing memory usage by setting the NO_WORKSPACE mode via the network configuration, instead of the default of ConvolutionLayer.AlgoMode.PREFER_FASTEST, for example:
```java
// for the whole network
new NeuralNetConfiguration.Builder()
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        // ...

// or separately for each layer
new ConvolutionLayer.Builder(h, w)
        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
        // ...
```
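Putting the pieces together, a minimal configuration sketch that sets NO_WORKSPACE globally might look like the following. The layer sizes, kernel size, and input shape here are illustrative placeholders, not values from the original text, and the code assumes the DL4J and ND4J dependencies above are on the classpath:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class CudnnAlgoModeExample {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                // applies to all cuDNN-backed convolution layers in the network
                .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
                .list()
                .layer(new ConvolutionLayer.Builder(5, 5)  // 5x5 kernel, illustrative
                        .nIn(1)
                        .nOut(20)
                        .activation(Activation.RELU)
                        .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nOut(10)
                        .activation(Activation.SOFTMAX)
                        .build())
                // hypothetical 28x28 single-channel input, e.g. MNIST-like data
                .setInputType(InputType.convolutionalFlat(28, 28, 1))
                .build();
    }
}
```

A per-layer cudnnAlgoMode call on any individual ConvolutionLayer.Builder overrides the global setting, which is useful when only one large layer exhausts cuDNN workspace memory.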