Deeplearning Algorithms tutorial

Google's AI sits at the global forefront, with image recognition, speech recognition, and autonomous driving already deployed. Baidu has, in effect, taken up the banner of AI in China, covering autonomous driving, intelligent assistants, image recognition, and more. Apple has begun to embrace machine learning across the board, with new products entering the smart home speaker market and workstation-class Macs in the works. Tencent's deep learning platform Mariana already powers WeChat's speech products (voice input, the open speech platform, long-press voice-message-to-text) and is being applied to image recognition in WeChat. The world's top ten technology companies are all pushing hard on AI theory and applications; getting started is difficult, but once you are in, mastery is not far away!

Machine learning comes in three main forms: supervised learning, unsupervised learning, and semi-supervised learning.

(1) Supervised learning: learn a function from a given training set so that, when new data arrives, the function can predict the corresponding result. The training set must contain both inputs and outputs, i.e., features and targets, and the targets are labeled. Established supervised algorithms for classification include Bayesian classifiers, SVM, ID3 and C4.5 decision trees, as well as today's popular artificial neural networks such as BP networks, RBF networks, Hopfield networks, deep belief networks, and convolutional neural networks. An artificial neural network analyzes data by mimicking the way the human brain works: it has a visible (input) layer, hidden layers, and an output layer, each made up of neurons whose state is on or off depending on the data. Supervised algorithms can also perform regression, the most commonly used being logistic regression.

(2) Unsupervised learning: in contrast to supervised learning, the class labels of the training set are unknown, and the number or set of classes to be learned may not be known in advance. Common unsupervised algorithms include clustering and association rule mining, e.g., K-means and Apriori.

(3) Semi-supervised learning: sits between supervised and unsupervised learning; the EM algorithm is a typical example.

Research in machine learning today proceeds along three main lines: 1) task-oriented studies, which build and analyze learning systems that improve performance on a predetermined set of tasks; 2) cognitive modeling, which studies the human learning process and simulates it computationally; 3) theoretical analysis, which explores the space of possible algorithms independently of any application domain.

Radial Basis Function Network

In mathematical modeling, a radial basis function network (RBF network) is an artificial neural network that uses radial basis functions as activation functions. The RBF network is a novel and effective feed-forward network, motivated by the locality of biological neurons' responses to external stimuli, and it has good local approximation properties. Its mathematical foundation is the radial-basis-function approach to multivariate interpolation first proposed by Powell in 1985; Broomhead and Lowe applied it to neural network design in 1988, which established the RBF network. The output of an RBF network is a linear combination of radial basis functions of the inputs and neuron parameters. RBF networks have many uses, including function approximation, time-series prediction, classification, and system control.

An RBF network is a three-layer feed-forward network. The first layer is the input layer, made up of source nodes that connect the network to its environment; its size is determined by the dimension of the input signal. The second layer is the hidden (radial basis) layer, whose nodes apply radial basis functions and perform a nonlinear transformation from the input space to the hidden space. The third layer is the linear output layer, which responds to the input pattern; each output node computes a linear combination of the basis function values produced by the hidden layer.

(Figure 1: three-layer RBF network structure — input layer, radial basis hidden layer, linear output layer)

The activation function of an RBF network is a radial basis function, usually defined as a monotonic function of the Euclidean distance from any point in the input space to some center. Its argument is the distance $\|\mathbf{x} - \mathbf{w}\|$ between the input vector and the weight (center) vector. A general expression for the activation function is

$$R\big(\|\mathbf{x}-\mathbf{w}\|\,b\big) = e^{-\left(\|\mathbf{x}-\mathbf{w}\|\,b\right)^2}$$

As the distance between the weight vector and the input vector decreases, the network output increases; when the input vector coincides with the weight vector, the neuron outputs 1. Here $b$ is a threshold that adjusts the sensitivity of the neuron. Combining radial basis neurons with linear neurons yields the generalized regression neural network, well suited to function approximation; combining radial basis neurons with competitive neurons yields the probabilistic neural network, well suited to classification. The output layer and the hidden layer perform different tasks, so their learning strategies differ: the output layer adjusts linear weights with a linear optimization strategy and thus learns quickly, while the hidden layer adjusts the parameters of the activation function (a Green's function or, most commonly, a Gaussian) with a nonlinear optimization strategy, which is slower.
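For concreteness, here is a minimal Python/NumPy sketch of a Gaussian radial basis neuron; the function and parameter names (gaussian_activation, center, sigma) are ours, not from any particular library:

    import numpy as np

    def gaussian_activation(x, center, sigma):
        """Gaussian radial basis: peaks at 1 when x equals the center."""
        r = np.linalg.norm(x - center)          # Euclidean distance ||x - c||
        return np.exp(-r**2 / (2 * sigma**2))   # monotonically decreasing in r

    # the activation is 1 at the center and decays with distance
    print(gaussian_activation(np.array([1.0, 2.0]), np.array([1.0, 2.0]), 0.5))  # 1.0
    print(gaussian_activation(np.array([1.5, 2.0]), np.array([1.0, 2.0]), 0.5))  # ~0.61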

Although the output of an RBF network is a linear weighted sum of the hidden units' outputs and training is therefore fast, this does not mean the RBF network can replace other feed-forward networks: it may need far more hidden neurons than a BP network to accomplish the same task.

Three sets of parameters must be solved for in an RBF network: the centers of the basis functions, their variances, and the weights from the hidden layer to the output layer. Depending on how the centers are selected, RBF networks admit several learning methods. Below we describe the self-organized center selection method, which consists of two stages:

  • Self-organized learning stage: an unsupervised process that determines the centers and variances of the hidden-layer basis functions.

  • Supervised learning stage: determines the weights from the hidden layer to the output layer.

The radial basis function most commonly used in RBF networks is the Gaussian, so the activation function of the network can be written as:

$$R\big(x_p - c_i\big) = \exp\left(-\frac{1}{2\sigma^2}\left\|x_p - c_i\right\|^2\right)$$

From this and the network structure, the network output is:

$$y_j = \sum_{i=1}^{h} w_{ij}\,\exp\left(-\frac{1}{2\sigma^2}\left\|x_p - c_i\right\|^2\right), \qquad j = 1, 2, \ldots, n$$

where $x_p$ is the $p$-th input sample, $h$ is the number of hidden-layer nodes, and $w_{ij}$ is the weight from hidden node $i$ to output node $j$.
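Putting the activation and the linear output layer together, here is a minimal forward-pass sketch; the values of centers, sigma, and the weight matrix W are illustrative assumptions, not from the tutorial:

    import numpy as np

    def rbf_forward(x, centers, sigma, W):
        """Forward pass: hidden Gaussian activations, then a linear output layer.
        x: (d,) input; centers: (h, d); W: (h, n) hidden-to-output weights."""
        dists = np.linalg.norm(centers - x, axis=1)    # ||x_p - c_i|| for each center
        phi = np.exp(-dists**2 / (2 * sigma**2))       # hidden-layer outputs, shape (h,)
        return phi @ W                                 # y_j = sum_i w_ij * phi_i, shape (n,)

    centers = np.array([[0.0], [0.5], [1.0]])          # h = 3 centers in 1-D
    W = np.array([[0.2], [0.7], [0.1]])                # n = 1 output node
    print(rbf_forward(np.array([0.4]), centers, 0.3, W))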

If $d$ is the desired output for a sample, the variance of the basis functions can be expressed as:

$$\sigma = \frac{1}{P}\sum_{j=1}^{P}\left\|d_j - y_j\right\|^2$$

where $P$ is the number of training samples, $d_j$ the desired output, and $y_j$ the network output.

Finding the basis function centers $c_i$ by K-means clustering (a compact numeric sketch follows the list):

  • Network initialization: randomly select $h$ training samples as the initial cluster centers $c_i$ ($i = 1, 2, \ldots, h$).
  • Group the input training samples by the nearest-neighbor rule: assign each $x_p$ to the cluster set $\vartheta_i$ whose center $c_i$ is closest in Euclidean distance $\|x_p - c_i\|$.
  • Recompute the cluster centers: set each center $c_i$ to the mean of the training samples in its cluster set $\vartheta_i$. If the new centers no longer change, the resulting $c_i$ are the final basis function centers of the RBF network; otherwise, return to the previous step for another round.
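A compact 1-D version of this loop (the application example at the end of this section contains an equivalent k_means; the names here are ours):

    import numpy as np

    def kmeans_1d(samples, h, iters=100):
        """Pick h centers from the samples, then alternate assign/update steps."""
        centers = np.random.choice(samples, h, replace=False)
        for _ in range(iters):
            # nearest-neighbor rule: index of the closest center for each sample
            labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
            new_centers = np.array([
                samples[labels == i].mean() if np.any(labels == i) else centers[i]
                for i in range(h)
            ])
            if np.allclose(new_centers, centers):   # centers stopped moving
                break
            centers = new_centers
        return centers

    samples = np.random.rand(75)
    print(kmeans_1d(samples, h=5))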

Solving for the variances $\sigma_i$:

  • Since the basis functions of this RBF network are Gaussians, the variances $\sigma_i$ can be obtained from:

    $$\sigma_i = \frac{c_{\max}}{\sqrt{2h}}, \qquad i = 1, 2, \ldots, h$$

where $c_{\max}$ is the maximum distance between the selected centers.
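In code, the shared width follows directly from the centers; a sketch under the same assumptions (widths_from_centers is our name):

    import numpy as np

    def widths_from_centers(centers):
        """sigma_i = c_max / sqrt(2h), shared by all h Gaussian units."""
        h = len(centers)
        # c_max: the largest pairwise distance between centers
        c_max = max(abs(a - b) for a in centers for b in centers)
        return np.full(h, c_max / np.sqrt(2 * h))

    print(widths_from_centers(np.array([0.1, 0.4, 0.9])))  # sigma = 0.8 / sqrt(6)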

Computing the weights between the hidden layer and the output layer:

  • They are obtained directly by least squares:

    $$W = \Phi^{+} D = \left(\Phi^{\mathsf{T}} \Phi\right)^{-1} \Phi^{\mathsf{T}} D$$

where $\Phi$ is the $P \times h$ matrix of hidden-layer outputs, $\Phi_{pi} = \exp\left(-\frac{1}{2\sigma^2}\left\|x_p - c_i\right\|^2\right)$, and $D$ is the matrix of desired outputs for the $P$ training samples.
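A minimal sketch of this step using numpy.linalg.lstsq, with the Gaussian design matrix built from centers and width as above, and the target function borrowed from the application example below (all names ours):

    import numpy as np

    def solve_output_weights(X, centers, sigma, D):
        """Least-squares fit of hidden-to-output weights W in Phi @ W ~= D."""
        # design matrix: Phi[p, i] = exp(-||x_p - c_i||^2 / (2 sigma^2))
        dists = np.abs(X[:, None] - centers[None, :])
        Phi = np.exp(-dists**2 / (2 * sigma**2))
        W, _residuals, _rank, _sv = np.linalg.lstsq(Phi, D, rcond=None)
        return W

    X = np.linspace(0, 1, 75)                      # 1-D training inputs
    D = 0.5 + 0.4 * np.sin(2 * np.pi * X)          # target from the example below
    centers = np.array([0.1, 0.3, 0.5, 0.7, 0.9])  # h = 5 centers
    print(solve_output_weights(X, centers, 0.2, D))  # one weight per hidden unit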

Application example

The following program (Python 2, using pylab/matplotlib) trains an RBF network on noisy samples of the target function y = 0.5 + 0.4 sin(2πx): it selects centers with K-means, sets the Gaussian widths with either of two strategies (per-cluster variance or a shared width), and then, unlike the direct least-squares step above, trains the output weights iteratively by gradient descent.

# Python 2 code; requires matplotlib (imported via pylab).
from __future__ import division

import random

import pylab

SAMPLES = 75
EPOCHS = 100
TESTS = 12
RUNS = 3
MOD = 12


def h(x):
    """Function to approximate: y = 0.5 + 0.4sin(2*pi*x)."""
    # note: pylab.sin can accept a numpy.ndarray, but math.sin cannot
    return 0.5 + 0.4*pylab.sin(pylab.pi*2*x)


def noise(x):
    """Add uniform noise in interval [-0.1, 0.1]."""
    return x + random.uniform(-0.1, 0.1)


def sample(n):
    """Return sample of n random points uniformly distributed in [0, 1]."""
    a = [random.random() for x in range(n)]
    a.sort()
    return a


def gaussian(radial, x):
    """Return gaussian radial function.

    Args:
        radial: (num, num) of gaussian (base, width^2) pair
        x: num of input
    Returns:
        num of gaussian output
    """
    base, width2 = radial
    power = -1 / width2 / 2 * (x-base)**2
    return pylab.exp(power)


def output(radials, weights, x):
    """Return linear combination of gaussian functions.

    Args:
        radials: [(num, num)] of (base, width^2) pairs
        weights: [num] of radial weights, |weights| - 1 = |radials|
        x: num of input
    Returns:
        num of linear combination of radial functions.
    """
    y = 0
    for radial, weight in zip(radials, weights[:-1]):
        y += gaussian(radial, x) * weight
    # add bias
    y += weights[-1]
    return y


def update_weights(eta, weights, radials, x, y, d):
    """Update weight vector by one gradient step.

    Returns:
        [num] of updated weight vector, len = |weights|
    """
    new_weights = []
    err = d - y
    for radial, weight in zip(radials, weights[:-1]):
        w = weight + (eta * err * gaussian(radial, x))
        new_weights.append(w)
    # update bias
    w = weights[-1] + (eta * err)
    new_weights.append(w)
    return new_weights


def k_means(input, k):
    """Return k Gaussian centers computed by the K-means algorithm.

    Args:
        input: [num] of input vector
        k: int number of bases, <= |set(input)|
    Returns:
        [(num, [num])] k-size list of (center, input cluster) pairs.
    """
    # initialize k bases as randomly selected unique elements from input
    bases = random.sample(set(input), k)
    # place all inputs in the first cluster to initialize
    clusters = [(x, 0) for x in input]
    updated = True
    while updated:
        updated = False
        for i in range(len(clusters)):
            x, m = clusters[i]
            distances = [(abs(b-x), j) for j, b in enumerate(bases)]
            d, j = min(distances)
            # move x to the cluster of its nearest base
            if m != j:
                updated = True
                clusters[i] = (x, j)
        # update bases
        if updated:
            base_sums = [[0, 0] for s in range(k)]
            for x, m in clusters:
                base_sums[m][0] += x
                base_sums[m][1] += 1
            new_bases = []
            for s, n in base_sums:
                # avoid a rare division-by-zero edge case (<1% @ n=25):
                # if a cluster is empty, select a new base from input
                if n == 0:
                    base = random.sample(set(input), 1)[0]
                else:
                    base = s / n
                new_bases.append(base)
            bases = new_bases
    # generate returned value
    response = [(b, []) for b in bases]
    for x, m in clusters:
        response[m][1].append(x)
    return response


def variance_width(k_meaned_x):
    """Return (center, variance) pairs computed from k_means(x, k).

    Args:
        k_meaned_x: [(num, [num])] of (base, input cluster) pairs
    Returns:
        [(num, num)] of (center, width^2) pairs.
    """
    response = []
    for base, cluster in k_meaned_x:
        if len(cluster) > 1:
            var = sum([(base-x)**2 for x in cluster]) / len(cluster)
            # this also produces excellent approximations:
            # var = sum([(base-x)**2 for x in cluster])
        else:
            var = None
        response.append((base, var))
    # set widths of single-element clusters to the mean variance of the others
    vars = [v for b, v in response if v]
    if len(vars) == 0:
        raise Exception("No variance: cannot compute mean variance")
    var_mean = sum(vars) / len(vars)
    for i in range(len(response)):
        base, var = response[i]
        if not var:
            response[i] = (base, var_mean)
    return response


def shared_width(k_meaned_x):
    """Return shared gaussian widths computed from k_means(x, k).

    Args:
        k_meaned_x: [(num, [num])] of (base, input cluster) pairs
    Returns:
        [(num, num)] of (center, width^2) pairs.
    """
    assert len(k_meaned_x) > 1
    # ignore clusters
    bases = [b for b, cluster in k_meaned_x]
    # compute distances between adjacent bases
    s_bases = sorted(bases)
    distances = [abs(a-b) for a, b in zip(s_bases, s_bases[1:])]
    max_d = max(distances)
    sigma_sq = (max_d / 2**0.5)**2
    # map to outputs
    return [(b, sigma_sq) for b in bases]


def plot_instance(name, x, ideal_y, measured_y, trained_y, new_x, estimated_y):
    """Plot function graph, save to file.

    Effect: saves png file of plot to current directory.
    Args:
        name: str of plot name, used in file name like "name.png"
        x: [num] input vector
        ideal_y: [num] ideal output vector
        measured_y: [num] noisy output vector
        trained_y: [num] trained output vector
        new_x: [num] new input sample not used in training
        estimated_y: [num] estimated output from trained RBF network
    """
    pylab.rc('text', usetex=True)
    pylab.rc('font', family='serif')
    pylab.xlabel('$x$')
    pylab.ylabel(r'$y = 0.5 + 0.4\sin(2 \pi x)$')
    pylab.title('RBF Network: %s' % name)
    pylab.plot(x, ideal_y, 'g', label="Ideal")
    pylab.plot(x, measured_y, 'bo', label="Measured")
    pylab.plot(x, trained_y, 'y', label="Trained")
    pylab.plot(new_x, estimated_y, 'r', label="Generalized")
    pylab.legend()
    # pylab.grid(True)
    filename = name.replace(' ', '_').replace('\\', '').replace('$', '')
    filename = filename.replace(',', '')
    # save figure
    pylab.savefig("%s.png" % filename)
    # clear this figure
    # note: use http://matplotlib.sourceforge.net/users/artists.html#artist-tutorial
    # in the future
    pylab.clf()
    pylab.cla()


def error(actual, expected):
    """Return error from actual to expected.

    Args:
        actual: [num] of sampled output
        expected: [num] of expected output, ||expected|| = ||actual||
    Returns:
        num of average distance between actual and expected
    """
    sum_d = 0
    for a, e in zip(actual, expected):
        sum_d += abs(a-e)
    return sum_d / len(expected)


def run_test(eta, k, tests=TESTS, runs=RUNS, f_width=variance_width, graph_mod=MOD):
    """Run an RBF training test set; plot, return errors from results.

    Args:
        eta: num of training rate
        k: num of bases
        tests: num of sample set iterations
        runs: num of network generation iterations
        f_width: function to generate radial widths
        graph_mod: num of after how many iterations to plot a graph
    Returns:
        {str: [num]} such that n = (tests*runs) and:
            "sample_err": [num] of n sampling errors
            "train_err": [num] of n training errors
            "gen_err": [num] of n estimation errors
    """
    results = {
        "sample_err": [],
        "train_err": [],
        "gen_err": [],
    }
    f_name = f_width.__name__.capitalize().split('_')[0]
    for test in range(1, tests+1):
        print "## K=%d, eta=%.2f, Test=%d" % (k, eta, test)
        # compute input samples
        input = sample(SAMPLES)
        test_input = sample(SAMPLES)
        # compute desired and ideal outputs
        ideal_y = map(h, input)
        test_ideal_y = map(h, test_input)
        measured_y = map(noise, ideal_y)
        # re-train on each sample set `runs` times
        for run in range(1, runs+1):
            # initialize K radials
            radials = f_width(k_means(input, k))
            # k+1 weights, last weight is bias
            weights = [random.uniform(-0.5, 0.5) for x in range(k+1)]
            # train all epochs
            for i in range(EPOCHS):
                # train one epoch
                for x, d in zip(input, measured_y):
                    y = output(radials, weights, x)
                    weights = update_weights(eta, weights, radials, x, y, d)
            # examine results
            trained_y = map(lambda x: output(radials, weights, x), input)
            estimated_y = map(lambda x: output(radials, weights, x), test_input)
            sample_err = error(measured_y, ideal_y)
            train_err = error(trained_y, measured_y)
            gen_err = error(estimated_y, test_ideal_y)
            # save results
            results["sample_err"].append(sample_err)
            results["train_err"].append(train_err)
            results["gen_err"].append(gen_err)
            # print "Run: %d, Sample: %.4f, Train: %.4f, General: %.4f" \
            #     % (run, sample_err, train_err, gen_err)
            # graph some subset of results
            iteration = (test-1)*runs + run
            if (iteration % graph_mod) == 0:
                # print "Graphing Test=%d, Run=%d" % (test, run)
                name = r"%s $K=%d, \eta =%.2f, E=%.3f$ (%d-%d)" % \
                    (f_name, k, eta, gen_err, test, run)
                plot_instance(
                    name, input, ideal_y, measured_y, trained_y,
                    test_input, estimated_y)
    return results


def stats(values):
    """Return tuple of common statistical measures.

    Returns:
        (num, num, num, num) as (mean, std, min, max)
    """
    mean = sum(values) / len(values)
    var = sum([(mean-x)**2 for x in values]) / len(values)
    std = var**0.5
    return (mean, std, min(values), max(values))


def main():
    random.seed()
    # final report over all parameter combinations
    for f_width in (variance_width, shared_width):
        for eta in (0.01, 0.02):
            for k in (5, 10, 15, 20, 25):
                print ""
                print "BEGIN PARAMETER TEST SUITE"
                print "K=%d, eta=%.2f, f_width=%s, Tests=%d, Runs=%d" % \
                    (k, eta, f_width.__name__, TESTS, RUNS)
                print "+++++++++++++++++++++++++++++++++++"
                r = run_test(k=k, eta=eta, f_width=f_width)
                print "+++++++++++++++++++++++++++++++++++"
                print "RESULTS"
                print "K=%d, eta=%.2f, f_width=%s, Tests=%d, Runs=%d" % \
                    (k, eta, f_width.__name__, TESTS, RUNS)
                for name, values in r.items():
                    print name
                    print "mean=%.4f, std=%.4f, min=%.4f, max=%.4f" % \
                        stats(values)
                print "+++++++++++++++++++++++++++++++++++"


if __name__ == "__main__":
    main()