2 TensorFlow on Kubernetes

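TensorFlow is an end-to-end open source machine learning platform. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in machine learning and lets developers easily build and deploy applications powered by machine learning.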

2.1 Installing TensorFlow in K8s

Use Webkubectl to open a terminal in the cluster, create a file named tensorflow.yml, and enter the following content:

    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: tensorflow-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: tensorflow
    spec:
      selector:
        matchLabels:
          k8s-app: tensorflow
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: tensorflow
        spec:
          containers:
            - name: tensorflow
              image: registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/jupyter:1.1.0-devel-gpu
              imagePullPolicy: IfNotPresent
              env:
                - name: PASSWORD
                  value: mypassw0rd
              resources:
                limits:
                  nvidia.com/gpu: 1
              volumeMounts:
                - mountPath: /usr/local/nvidia
                  name: nvidia
          volumes:
            - name: nvidia
              persistentVolumeClaim:
                claimName: tensorflow-pvc
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: tensorflow-svc
    spec:
      type: NodePort
      ports:
        - name: tensorflow-port
          port: 80
          targetPort: 8888
      selector:
        k8s-app: tensorflow
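
The three objects in this manifest are tied together only by the k8s-app: tensorflow label: the Deployment selector must match the pod template labels, and the Service selector must match the same pod labels, otherwise the Service has no pods to route to. The following is a minimal Python sketch of that check. The dictionaries are copied by hand from the manifest, and the helper names selects and wiring_problems are hypothetical, not part of any Kubernetes client.

```python
# Labels and selectors copied by hand from tensorflow.yml (illustrative dicts,
# not a parsed manifest).
POD_LABELS = {"k8s-app": "tensorflow"}
DEPLOYMENT_MATCH_LABELS = {"k8s-app": "tensorflow"}
SERVICE_SELECTOR = {"k8s-app": "tensorflow"}


def selects(selector, labels):
    """Return True if every key/value pair in selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def wiring_problems(pod_labels, match_labels, service_selector):
    """List the label mismatches that would leave pods unmanaged or unreachable."""
    problems = []
    if not selects(match_labels, pod_labels):
        problems.append("Deployment selector does not match the pod template labels")
    if not selects(service_selector, pod_labels):
        problems.append("Service selector does not match the pod labels")
    return problems


# An empty list means the labels line up.
print(wiring_problems(POD_LABELS, DEPLOYMENT_MATCH_LABELS, SERVICE_SELECTOR))
```

If one of the labels is renamed in only one place, the function reports which link is broken.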

Apply the manifest, then check that the pod is running and note the NodePort assigned to the service:

    $ kubectl apply -f tensorflow.yml
    persistentvolumeclaim/tensorflow-pvc configured
    deployment.apps/tensorflow configured
    service/tensorflow-svc configured

    $ kubectl get pod
    NAME                          READY   STATUS    RESTARTS   AGE
    tensorflow-79c5f4c48c-94l82   1/1     Running   0          101s

    $ kubectl get svc
    NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
    tensorflow-svc   NodePort   179.10.100.67   <none>        80:32604/TCP   2m15s

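Open http://NODE_IP:32604 in a browser to reach Jupyter, where NODE_IP is the IP address of any cluster node and 32604 is the NodePort shown in the kubectl get svc output. The password is mypassw0rd.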
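
Because the manifest does not pin a nodePort, Kubernetes assigns one (by default from the range 30000-32767), so the port will usually differ from 32604 on another cluster. As a rough sketch, the URL can be derived from the PORT(S) column of kubectl get svc as shown below. The function names and the node IP 192.0.2.10 are placeholders chosen for illustration.

```python
def node_port(ports_column, service_port=80):
    """Extract the NodePort mapped to service_port from a PORT(S) value
    such as '80:32604/TCP' (several mappings may be comma-separated)."""
    for entry in ports_column.split(","):
        mapping, _, _protocol = entry.strip().partition("/")
        port, separator, assigned = mapping.partition(":")
        if separator and int(port) == service_port:
            return int(assigned)
    raise ValueError("no NodePort found for service port {}".format(service_port))


def jupyter_url(node_ip, ports_column):
    """Build the Jupyter URL from a node IP and the PORT(S) column."""
    return "http://{}:{}".format(node_ip, node_port(ports_column))


# 192.0.2.10 is a placeholder; substitute the IP of one of your nodes.
print(jupyter_url("192.0.2.10", "80:32604/TCP"))  # http://192.0.2.10:32604
```

In the mapping 80:32604, 80 is the Service port inside the cluster and 32604 is the port opened on every node; traffic to either is forwarded to targetPort 8888 in the container.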

[Figure: Jupyter login page]

2.2 Running a TensorFlow Deep Learning Example

2.2.1 Open a Jupyter Terminal

[Figure: selecting Terminal in Jupyter]

[Figure: Jupyter terminal]

2.2.2 Install Keras with pip

    $ pip install keras==2.0.6

2.2.3 Example Code

Save the following code as mnist.py:

    import keras
    from tensorflow.python.client import device_lib

    # Optional: uncomment to print the number of GPUs visible to TensorFlow.
    # local_device_protos = device_lib.list_local_devices()
    # num_gpus = sum(1 for d in local_device_protos if d.device_type == 'GPU')
    # print("GPU : {}".format(num_gpus))

    # Download the MNIST dataset and scale pixel values from [0, 255] to [0, 1].
    mnist = keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Stack the layers to build a keras.Sequential model,
    # then choose an optimizer and a loss function for training.
    model = keras.models.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model, then evaluate it on the test set.
    # print() makes the script show the [loss, accuracy] pair returned by evaluate.
    model.fit(x_train, y_train, epochs=5)
    print(model.evaluate(x_test, y_test, verbose=2))
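
To get a feel for the size of this model, the number of trainable parameters can be worked out by hand: a Dense layer with n inputs and m units has n * m weights plus m biases, while Flatten and Dropout add none. The short sketch below does that arithmetic in plain Python; the helper dense_params is a hypothetical name, not a Keras function.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


flatten_out = 28 * 28                    # Flatten turns a 28x28 image into 784 values
hidden = dense_params(flatten_out, 128)  # Dense(128, relu)
output = dense_params(128, 10)           # Dense(10, softmax), one unit per digit
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```

Almost all of the parameters sit in the first Dense layer, which is typical for a fully connected network applied directly to pixels.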

2.2.4 Run the Deep Learning Example

    $ python mnist.py
    Epoch 1/5
    60000/60000 [==============================] - 9s - loss: 0.2984 - acc: 0.9137
    Epoch 2/5
    60000/60000 [==============================] - 7s - loss: 0.1422 - acc: 0.9575
    Epoch 3/5
    60000/60000 [==============================] - 7s - loss: 0.1102 - acc: 0.9671
    Epoch 4/5
    60000/60000 [==============================] - 7s - loss: 0.0885 - acc: 0.9728
    Epoch 5/5
    60000/60000 [==============================] - 7s - loss: 0.0758 - acc: 0.9761
    [0.074264542010473084, 0.97750000000000004]

The last line is the [loss, accuracy] pair measured on the test set: this image classifier has reached an accuracy of about 98%.
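
The two numbers in the final line can be read more concretely. The sketch below assumes the standard MNIST test set of 10,000 images and uses the fact that sparse categorical cross-entropy is the mean of -log(p), where p is the probability the model gave to the correct digit. Under that definition, exp(-loss) is the geometric mean of those probabilities.

```python
import math

# Values copied from the last line of the run above.
loss, accuracy = 0.074264542010473084, 0.97750000000000004

test_images = 10000  # assumption: the standard MNIST test split
correct = round(accuracy * test_images)
wrong = test_images - correct

# Geometric mean of the probability assigned to the true class.
typical_probability = math.exp(-loss)

print(correct, wrong)                 # 9775 225
print(round(typical_probability, 3))  # 0.928
```

So roughly 225 test images are misclassified, and on a typical image the model puts about 93% of its probability on the right digit.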