Configuring Theano on Windows

Note: configuring Theano on Windows is not recommended. Make sure your graphics card supports CUDA before you start.

My own machine runs Windows 10 x64 with an Nvidia GeForce GTX 850M graphics card.

Installing Theano

First, install Theano with Anaconda:

  conda install mingw libpython
  pip install theano
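
Before moving on, it can help to confirm that the packages landed in the active environment. The sketch below, written for Python 3, only asks Python whether a top-level module can be located, without importing it. `is_installed` is a hypothetical helper name used for illustration, not part of Theano or conda.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("theano", "numpy"):
        status = "found" if is_installed(name) else "missing"
        print("%s: %s" % (name, status))
```

Because nothing is imported, the check stays fast and does not trigger Theano's compiler setup.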

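Installing VS and CUDA

Install these two pieces of software, in this order:

  • Install Visual Studio 2010/2012/2013
  • Install the matching x64 or x86 build of CUDA. The CUDA version must be compatible with your graphics card.

I installed Visual Studio 2012 and CUDA v7.0.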

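Configuring environment variables

The CUDA installer automatically adds a CUDA_PATH environment variable that points to the CUDA install location (environment variables are edited under Control Panel -> System and Security -> System -> Advanced system settings). On my machine it is: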

  • CUDA_PATH

    • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.0

We configure two related variables:

  • CUDA_BIN_PATH

    • %CUDA_PATH%\bin
  • CUDA_LIB_PATH

    • %CUDA_PATH%\lib\Win32

Next, append the following to the Path environment variable:

  • The MinGW entries from Miniconda:

    • C:\Miniconda\MinGW\bin;
    • C:\Miniconda\MinGW\x86_64-w64-mingw32\lib;
  • The directories that provide the VS cl compiler command:

    • C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin;
    • C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE;
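
After editing the variables, a quick sanity check from a freshly opened command window can catch typos. The following is a minimal sketch for Python 3: it expands the two %CUDA_PATH%-based variables the same way the list above defines them and looks for the compilers on Path. The helper names `derived_cuda_vars` and `locate_tools` are hypothetical, and the tool names assume that `nvcc` comes from CUDA, `cl` from Visual Studio and `g++` from MinGW.

```python
import ntpath
import os
import shutil


def derived_cuda_vars(cuda_path):
    """Expand the two %CUDA_PATH%-based variables for a given install root."""
    return {
        "CUDA_BIN_PATH": ntpath.join(cuda_path, "bin"),
        "CUDA_LIB_PATH": ntpath.join(cuda_path, "lib", "Win32"),
    }


def locate_tools(names):
    """Map each executable name to its full path on Path, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path is None:
        print("CUDA_PATH is not set")
    else:
        for key, value in derived_cuda_vars(cuda_path).items():
            print("%s = %s" % (key, value))
    # Assumed tool names: nvcc (CUDA), cl (Visual Studio), g++ (MinGW).
    for name, path in locate_tools(["nvcc", "cl", "g++"]).items():
        print("%-4s -> %s" % (name, path or "not on Path"))
```

Any tool reported as "not on Path" points at the Path entry that still needs fixing.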

Generate a test file:

In [1]:

  %%file test_theano.py
  from theano import config
  print('using device: %s' % config.device)

  Writing test_theano.py

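We can change how Theano runs by temporarily setting the THEANO_FLAGS environment variable. On Linux, a temporary environment variable is set directly with:

  THEANO_FLAGS=xxx

A variable set this way is temporary and does not outlive the current command window. Placed in front of a command, it applies to that command, so you can run your code like this:

  THEANO_FLAGS=xxx python <your script>.py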

On Windows, a temporary environment variable has to be set with the set command, so the invocation becomes:

  set THEANO_FLAGS=xxx && python <your script>.py
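
The two shell syntaxes above can also be sidestepped by launching the script from Python and passing the variable through the child's environment. This is a minimal sketch for Python 3 and not something the original notebook does. `format_flags` and `run_with_flags` are hypothetical helpers, and the child command here only echoes the variable as a stand-in for `test_theano.py`.

```python
import os
import subprocess
import sys


def format_flags(options):
    """Join a dict of options into the comma-separated THEANO_FLAGS syntax."""
    return ",".join("%s=%s" % (key, value) for key, value in options.items())


def run_with_flags(args, options):
    """Run `python <args>` with THEANO_FLAGS set only for that child process."""
    env = dict(os.environ, THEANO_FLAGS=format_flags(options))
    return subprocess.run(
        [sys.executable] + list(args),
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


if __name__ == "__main__":
    flags = {"mode": "FAST_RUN", "device": "cpu", "floatX": "float32"}
    # Stand-in for test_theano.py: the child prints the variable it received.
    child = ["-c", "import os; print(os.environ['THEANO_FLAGS'])"]
    print(run_with_flags(child, flags).stdout.strip())
```

The parent process's own environment is left untouched, which matches the temporary nature of the shell forms.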

In [2]:

  import sys

  if sys.platform == 'win32':
      !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
  else:
      !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py

  using device: cpu

In [3]:

  if sys.platform == 'win32':
      !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
  else:
      !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py

  Using gpu device 0: Tesla C2075 (CNMeM is disabled)
  using device: gpu

Testing the difference between CPU and GPU:

In [4]:

  %%file test_theano.py

  from theano import function, config, shared, sandbox
  import theano.tensor as T
  import numpy
  import time

  vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
  iters = 1000

  rng = numpy.random.RandomState(22)
  x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
  f = function([], T.exp(x))

  t0 = time.time()
  for i in range(iters):
      r = f()
  t1 = time.time()
  print("Looping %d times took %f seconds" % (iters, t1 - t0))
  print("Result is %s" % (r,))
  # A plain Elemwise op in the compiled graph means exp stayed on the CPU.
  if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
      print('Used the cpu')
  else:
      print('Used the gpu')

  Overwriting test_theano.py

In [5]:

  if sys.platform == 'win32':
      !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py
  else:
      !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py

  Looping 1000 times took 3.498123 seconds
  Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
  1.62323284]
  Used the cpu

In [6]:

  if sys.platform == 'win32':
      !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
  else:
      !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py

  Using gpu device 0: Tesla C2075 (CNMeM is disabled)
  Looping 1000 times took 0.847006 seconds
  Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
  1.62323296]
  Used the gpu

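As the timings show, the GPU is clearly faster than the CPU: about 0.85 seconds versus 3.50 seconds for 1000 iterations.

Wrapping T.exp(x) in gpu_from_host, so that the result stays on the GPU instead of being copied back to the host, gives an even larger speedup: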

In [7]:

  %%file test_theano.py

  from theano import function, config, shared, sandbox
  import theano.sandbox.cuda.basic_ops
  import theano.tensor as T
  import numpy
  import time

  vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
  iters = 1000

  rng = numpy.random.RandomState(22)
  x = shared(numpy.asarray(rng.rand(vlen), 'float32'))
  # gpu_from_host keeps the output on the device as a CudaNdarray.
  f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))

  t0 = time.time()
  for i in range(iters):
      r = f()
  t1 = time.time()
  print("Looping %d times took %f seconds" % (iters, t1 - t0))
  print("Result is %s" % (r,))
  print("Numpy result is %s" % (numpy.asarray(r),))
  # A plain Elemwise op in the compiled graph means exp stayed on the CPU.
  if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
      print('Used the cpu')
  else:
      print('Used the gpu')

  Overwriting test_theano.py

In [8]:

  if sys.platform == 'win32':
      !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py
  else:
      !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py

  Using gpu device 0: Tesla C2075 (CNMeM is disabled)
  Looping 1000 times took 0.318359 seconds
  Result is <CudaNdarray object at 0x7f7bb701fb70>
  Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
  1.62323296]
  Used the gpu
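
To put the three runs side by side, the speedups can be worked out from the logged times. The sketch below only does arithmetic on the numbers printed above; `speedup` is a hypothetical helper, and the labels in `timings` are made up for this illustration.

```python
# Wall-clock seconds for 1000 iterations, copied from the runs above.
timings = {
    "cpu": 3.498123,
    "gpu": 0.847006,
    "gpu + gpu_from_host": 0.318359,
}


def speedup(baseline_seconds, seconds):
    """How many times faster a run is compared with the baseline."""
    return baseline_seconds / seconds


if __name__ == "__main__":
    for name, seconds in timings.items():
        print("%-20s %.2fx vs cpu" % (name, speedup(timings["cpu"], seconds)))
```

With these numbers the plain GPU run is roughly 4.1 times faster than the CPU run, and the gpu_from_host variant roughly 11 times faster. The figures come from a single run each, so they are indicative only.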

In [9]:

  !rm test_theano.py

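Configuring .theanorc.txt

To avoid setting environment variables on every run, we can put the settings in a .theanorc.txt file in the user's home folder.

For example, my current .theanorc.txt looks like this: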

  [global]
  device = gpu
  floatX = float32
  [nvcc]
  fastmath = True
  flags = -LC:\Miniconda\libs
  compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin
  [gcc]
  cxxflags = -LC:\Miniconda\MinGW
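
Since the file uses INI-style sections, its contents can be inspected from Python 3 with the standard library. This is only a sketch for checking what was written; it makes no claim about how Theano itself reads the file. `read_settings` is a hypothetical helper, and `SAMPLE` repeats the settings shown above.

```python
import configparser

# The same settings as in the .theanorc.txt shown above.
SAMPLE = r"""
[global]
device = gpu
floatX = float32
[nvcc]
fastmath = True
flags = -LC:\Miniconda\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin
[gcc]
cxxflags = -LC:\Miniconda\MinGW
"""


def read_settings(text):
    """Parse INI-style text into {section: {option: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    for section, options in read_settings(SAMPLE).items():
        for key, value in options.items():
            print("[%s] %s = %s" % (section, key, value))
```

To check a real file instead of the sample, the text could be read from `os.path.expanduser("~/.theanorc.txt")`, assuming the file lives in the home folder as described above.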

For what each of these settings does, see the tutorials on the official Theano website.

Original: https://nbviewer.jupyter.org/github/lijin-THU/notes-python/blob/master/09-theano/09.03-gpu-on-windows.ipynb