Theano Example: A More Complex Network

In [1]:

    import theano
    import theano.tensor as T
    import numpy as np
    from load import mnist
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

    # Random stream used to draw the dropout masks
    srng = RandomStreams()

    def floatX(X):
        # Cast to the float type Theano is configured to use
        return np.asarray(X, dtype=theano.config.floatX)

    Using gpu device 1: Tesla C2075 (CNMeM is disabled)

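In the previous section we trained a simple neural network on MNIST. This time we train a more complex network and add dropout to reduce overfitting.

We use a fairly simple form of dropout here: each input value is set to zero at random with a given probability, and the surviving values are divided by 1 - prob so that the expected value stays the same.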

In [2]:

    def dropout(X, prob=0.):
        if prob > 0:
            # Keep each element with probability 1 - prob
            X *= srng.binomial(X.shape, p=1-prob, dtype = theano.config.floatX)
            # Rescale the surviving elements
            X /= 1 - prob
        return X
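To make the masking and rescaling concrete, here is a minimal NumPy sketch of the same idea. It is not the Theano code above: `dropout_np` and its `rng` argument are hypothetical names, and NumPy's random generator stands in for `MRG_RandomStreams`.

```python
import numpy as np

def dropout_np(x, prob=0.0, rng=None):
    """Dropout with rescaling on a NumPy array (hypothetical helper)."""
    if prob <= 0:
        return x
    rng = np.random.default_rng(0) if rng is None else rng
    # Draw a 0/1 mask that keeps each element with probability 1 - prob.
    mask = rng.binomial(1, 1 - prob, size=x.shape).astype(x.dtype)
    # Rescale the survivors so the expected value of the output equals x.
    return x * mask / (1 - prob)

x = np.ones((1000, 100), dtype=np.float32)
y = dropout_np(x, prob=0.5)
kept = np.count_nonzero(y) / y.size
print("fraction kept:", kept)
print("mean after dropout:", y.mean())
```

With `prob=0.5` roughly half of the entries survive, each scaled to 2, so the mean stays close to 1. Passing `prob=0` returns the input untouched, which is how the prediction model later in this section disables dropout.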

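Previously we used sigmoid as the activation function; now we use the rectify activation function.

This can be implemented with T.nnet.relu(x, alpha=0), which is essentially equivalent to T.switch(x > 0, x, alpha * x). The rectify function is defined as: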

$$\text{rectify}(x) = \left\{\begin{aligned} x, & \quad x > 0 \\ 0, & \quad x \le 0 \end{aligned}\right.$$
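The selection rule can be checked numerically with a small NumPy sketch. `relu_np` is a hypothetical helper that mirrors the `T.switch(x > 0, x, alpha * x)` expression; it is not a Theano function.

```python
import numpy as np

def relu_np(x, alpha=0.0):
    # Same selection rule as T.switch(x > 0, x, alpha * x), written with NumPy.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
plain = relu_np(x)              # alpha = 0: non-positive inputs map to zero
leaky = relu_np(x, alpha=0.1)   # alpha > 0: negative inputs are scaled instead
print(plain)
print(leaky)
```

With `alpha=0` the result agrees with `np.maximum(x, 0)`, which matches the piecewise definition of rectify.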

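Previously we built a network with a single hidden layer. Now we build one with two hidden layers, that is, a fully connected "input - hidden layer 1 - hidden layer 2 - output" structure.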

$$\begin{aligned} h_1 &= \text{rectify}(W_{h_1} \ x) \\ h_2 &= \text{rectify}(W_{h_2} \ h_1) \\ o &= \text{softmax}(W_o \ h_2) \end{aligned}$$

The GPU implementation of Theano's built-in T.nnet.softmax() currently seems to have a bug that can make the gradients overflow, so we define our own softmax function:

In [3]:

    def softmax(X):
        # Subtract the row-wise maximum before exponentiating
        e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
        return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')

    def model(X, w_h1, w_h2, w_o, p_drop_input, p_drop_hidden):
        """
        input:
            X: input data
            w_h1: weights input layer to hidden layer 1
            w_h2: weights hidden layer 1 to hidden layer 2
            w_o: weights hidden layer 2 to output layer
            p_drop_input: dropout rate for input layer
            p_drop_hidden: dropout rate for hidden layer
        output:
            h1: hidden layer 1
            h2: hidden layer 2
            py_x: output layer
        """
        X = dropout(X, p_drop_input)
        h1 = T.nnet.relu(T.dot(X, w_h1))

        h1 = dropout(h1, p_drop_hidden)
        h2 = T.nnet.relu(T.dot(h1, w_h2))

        h2 = dropout(h2, p_drop_hidden)
        py_x = softmax(T.dot(h2, w_o))
        return h1, h2, py_x
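The custom softmax subtracts each row's maximum before calling exp. The NumPy sketch below (a hypothetical `softmax_np`, with `keepdims=True` playing the role of `dimshuffle(0, 'x')`) illustrates why: the shift leaves the probabilities unchanged but keeps exp from overflowing on large inputs.

```python
import numpy as np

def softmax_np(X):
    # Shift each row by its maximum; the largest exponent becomes exp(0) = 1.
    e_x = np.exp(X - X.max(axis=1, keepdims=True))
    # Normalize each row so that it sums to 1.
    return e_x / e_x.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = softmax_np(logits)
print(probs)
print(probs.sum(axis=1))
```

Both rows differ only by a constant offset, so they yield the same probabilities. Without the shift, `np.exp(1000.0)` would overflow to infinity and the second row would come out as NaN.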

Randomly initialize the weight matrices:

In [4]:

    def init_weights(shape):
        # Small Gaussian noise, stored as a Theano shared variable
        return theano.shared(floatX(np.random.randn(*shape) * 0.01))

    w_h1 = init_weights((784, 625))
    w_h2 = init_weights((625, 625))
    w_o = init_weights((625, 10))

Define the variables:

In [5]:

    X = T.matrix()
    Y = T.matrix()

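Define the update rule. Previously we used plain SGD; this time we use RMSprop, whose rule is:

\begin{align}
MS(w, t) & = \rho MS(w, t-1) + (1-\rho) \left(\left.\frac{\partial E}{\partial w}\right|_{w(t-1)}\right)^2 \\
w(t) & = w(t-1) - \alpha \left.\frac{\partial E}{\partial w}\right|_{w(t-1)} / \sqrt{MS(w, t)}
\end{align}

The implementation below additionally adds a small constant epsilon inside the square root to avoid dividing by zero.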

In [6]:

    def RMSprop(cost, params, accs, lr=0.001, rho=0.9, epsilon=1e-6):
        grads = T.grad(cost=cost, wrt=params)
        updates = []
        for p, g, acc in zip(params, grads, accs):
            # Running average of the squared gradient, MS(w, t)
            acc_new = rho * acc + (1 - rho) * g ** 2
            gradient_scaling = T.sqrt(acc_new + epsilon)
            g = g / gradient_scaling
            updates.append((acc, acc_new))
            updates.append((p, p - lr * g))
        return updates
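As a rough illustration of the same update outside Theano, the sketch below applies it to a toy objective with NumPy. The function `rmsprop_step`, the objective E(w) = sum(w^2), the starting point, and the learning rate of 0.01 are all assumptions chosen for the example, not part of the original notebook.

```python
import numpy as np

def rmsprop_step(w, grad, acc, lr=0.001, rho=0.9, epsilon=1e-6):
    # Running average of the squared gradient, MS(w, t).
    acc_new = rho * acc + (1 - rho) * grad ** 2
    # Divide the gradient by the root of that average before stepping.
    w_new = w - lr * grad / np.sqrt(acc_new + epsilon)
    return w_new, acc_new

# Toy objective (an assumption for illustration): E(w) = sum(w ** 2), gradient 2 * w.
w = np.array([1.0, -2.0])
acc = np.zeros_like(w)
start_cost = np.sum(w ** 2)
for _ in range(500):
    w, acc = rmsprop_step(w, 2 * w, acc, lr=0.01)
end_cost = np.sum(w ** 2)
print(start_cost, end_cost)
```

Because the gradient is divided by the root of its own running average, both coordinates move at a similar pace even though their gradients differ in size, which is the main difference from plain SGD.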

Training function:

In [7]:

    # With dropout, used for training
    noise_h1, noise_h2, noise_py_x = model(X, w_h1, w_h2, w_o, 0.2, 0.5)
    cost = T.mean(T.nnet.categorical_crossentropy(noise_py_x, Y))
    params = [w_h1, w_h2, w_o]
    # One accumulator per parameter, initialized to zeros of the same shape
    accs = [theano.shared(p.get_value() * 0.) for p in params]
    updates = RMSprop(cost, params, accs, lr=0.001)
    # Training function
    train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)
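The cost above is the mean categorical cross-entropy between the predicted distribution and the one-hot labels. A small NumPy sketch of that quantity follows; it assumes the per-row form -sum(target * log(prediction)), and `mean_categorical_crossentropy` is a hypothetical name with made-up inputs.

```python
import numpy as np

def mean_categorical_crossentropy(probs, onehot):
    # With one-hot targets only the log-probability of the true class survives.
    per_sample = -np.sum(onehot * np.log(probs), axis=1)
    return np.mean(per_sample)

# Two made-up predictions over three classes, with their one-hot labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
onehot = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
cost_value = mean_categorical_crossentropy(probs, onehot)
print(cost_value)
```

Here the result is the average of -log(0.7) and -log(0.8); the closer the predicted probability of the true class is to 1, the closer the cost is to 0.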

Prediction function:

In [8]:

    # Without dropout, used for prediction
    h1, h2, py_x = model(X, w_h1, w_h2, w_o, 0., 0.)
    # Predicted class: index of the largest output probability
    y_x = T.argmax(py_x, axis=1)
    predict = theano.function(inputs=[X], outputs=y_x, allow_input_downcast=True)

Training:

In [9]:

    trX, teX, trY, teY = mnist(onehot=True)

    for i in range(50):
        # Mini-batches of 128 samples
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
            cost = train(trX[start:end], trY[start:end])
        # Test accuracy after each pass over the training data
        print("iter {:03d} accuracy: {}".format(i + 1, np.mean(np.argmax(teY, axis=1) == predict(teX))))

    iter 001 accuracy: 0.943
    iter 002 accuracy: 0.9665
    iter 003 accuracy: 0.9732
    iter 004 accuracy: 0.9763
    iter 005 accuracy: 0.9767
    iter 006 accuracy: 0.9802
    iter 007 accuracy: 0.9795
    iter 008 accuracy: 0.979
    iter 009 accuracy: 0.9807
    iter 010 accuracy: 0.9805
    iter 011 accuracy: 0.9824
    iter 012 accuracy: 0.9816
    iter 013 accuracy: 0.9838
    iter 014 accuracy: 0.9846
    iter 015 accuracy: 0.983
    iter 016 accuracy: 0.9837
    iter 017 accuracy: 0.9841
    iter 018 accuracy: 0.9837
    iter 019 accuracy: 0.9835
    iter 020 accuracy: 0.9844
    iter 021 accuracy: 0.9837
    iter 022 accuracy: 0.9839
    iter 023 accuracy: 0.984
    iter 024 accuracy: 0.9851
    iter 025 accuracy: 0.985
    iter 026 accuracy: 0.9847
    iter 027 accuracy: 0.9851
    iter 028 accuracy: 0.9846
    iter 029 accuracy: 0.9846
    iter 030 accuracy: 0.9853
    iter 031 accuracy: 0.985
    iter 032 accuracy: 0.9844
    iter 033 accuracy: 0.9849
    iter 034 accuracy: 0.9845
    iter 035 accuracy: 0.9848
    iter 036 accuracy: 0.9868
    iter 037 accuracy: 0.9864
    iter 038 accuracy: 0.9866
    iter 039 accuracy: 0.9859
    iter 040 accuracy: 0.9857
    iter 041 accuracy: 0.9853
    iter 042 accuracy: 0.9855
    iter 043 accuracy: 0.9861
    iter 044 accuracy: 0.9865
    iter 045 accuracy: 0.9872
    iter 046 accuracy: 0.9867
    iter 047 accuracy: 0.9868
    iter 048 accuracy: 0.9863
    iter 049 accuracy: 0.9862
    iter 050 accuracy: 0.9856
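The inner training loop builds its mini-batches by zipping a range of start indices with a range of end indices. The pure-Python sketch below reproduces that pairing so its behavior at the end of the data can be inspected. `batch_bounds` and `full_batch_bounds` are hypothetical helpers, and the training-set size of 60000 is an assumption about what the `mnist` loader returns.

```python
def batch_bounds(n, batch_size):
    # Same pairing as the training loop: starts 0, b, 2b, ... zipped with ends b, 2b, ...
    return list(zip(range(0, n, batch_size), range(batch_size, n, batch_size)))

def full_batch_bounds(n, batch_size):
    # Alternative that also covers the trailing samples (the last batch may be shorter).
    return [(s, min(s + batch_size, n)) for s in range(0, n, batch_size)]

small = batch_bounds(300, 128)
covered = full_batch_bounds(300, 128)
n_batches = len(batch_bounds(60000, 128))
print(small)
print(covered)
print(n_batches)
```

Since zip stops at the shorter range, the zipped version never reaches the end of the data: with 300 samples the slice from 256 onward is not used. Whether that matters is a judgment call. It only skips a small number of samples per pass, while the alternative produces one batch of a different size.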

Original: https://nbviewer.jupyter.org/github/lijin-THU/notes-python/blob/master/09-theano/09.13-modern-net-on-mnist.ipynb