1 Deploying a Kubernetes Cluster with GPUs
1.1 Prerequisites
- At least one worker node has an NVIDIA GPU
- Kubernetes 1.16.X: Package >= 1.16.4
- Kubernetes 1.15.X: Package >= 1.15.7
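The minimum package versions listed above can be checked with a small script before starting. The following is only a minimal sketch: `version_ge`, `min_version_for`, and `check_package` are hypothetical helper names (not part of KubeOperator), and the comparison assumes GNU `sort -V` is available on the host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes GNU sort with -V (version sort) is installed.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum package version for each Kubernetes series,
# taken from the prerequisites list above.
min_version_for() {
    case "$1" in
        1.16.*) echo "1.16.4" ;;
        1.15.*) echo "1.15.7" ;;
        *) return 1 ;;
    esac
}

# Print "ok" if the package version meets the minimum for its series.
check_package() {
    min="$(min_version_for "$1")" || { echo "series not listed"; return 1; }
    if version_ge "$1" "$min"; then
        echo "ok"
    else
        echo "too old (need >= $min)"
        return 1
    fi
}
```

For example, `check_package 1.16.4` reports the package as acceptable, while `check_package 1.15.6` reports that at least 1.15.7 is needed.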
1.2 Cluster Planning
Name | CPU (cores) | Memory (GB) | OS | GPUs | Role |
---|---|---|---|---|---|
master | 4 | 8 | CentOS 7.6 | 0 | Master |
worker1 | 4 | 20 | CentOS 7.6 | 1 | Worker |
worker2 | 4 | 8 | CentOS 7.6 | 0 | Worker |
nfs | 1 | 2 | CentOS 7.6 | 0 | NFS |
1.3 Adding the GPU Host
1.4 Creating the K8s Cluster
For the deployment steps, see "Deploying a K8s Cluster on Self-Prepared Hosts".
1.5 Verifying GPU Scheduling
Use Webkubectl to open the cluster terminal, then create a file named gpu.yml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-deployment
spec:
  replicas: 2 # number of Pod replicas to create
  selector:
    matchLabels:
      name: nvidia-gpu-deploy
  template:
    metadata:
      labels:
        name: nvidia-gpu-deploy
    spec:
      containers:
      - name: cuda-container
        image: ubuntu
        command: ["sleep"]
        args: ["100000"]
        resources:
          limits:
            nvidia.com/gpu: 1 # number of GPUs requested by the container
kubectl apply -f gpu.yml
kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
nvidia-deployment-64589d94d-8m6rd   1/1     Running   0          119s   # obtained a GPU: Running
nvidia-deployment-64589d94d-lzfph   0/1     Pending   0          86s    # no GPU available: Pending
kubectl exec -it nvidia-deployment-64589d94d-8m6rd -- nvidia-smi
Mon Jan 13 08:16:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   33C    P8     7W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# The GPU device is now accessible from inside the Pod
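One replica stays Pending because the Deployment asks for two Pods with one GPU each, while the planned cluster has a single GPU (on worker1). The sketch below shows one way this could be confirmed from the terminal. It assumes the nodes advertise GPUs under the `nvidia.com/gpu` resource name used in gpu.yml; `gpu_per_node` and `count_status` are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# Print each node together with the number of GPUs it reports as allocatable.
# Nodes without GPUs show <none> in the GPU column.
gpu_per_node() {
    kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
}

# Count pods with the given STATUS in `kubectl get pod` output read from stdin.
# The first line is treated as the header and skipped; STATUS is the third column.
count_status() {
    awk -v want="$1" 'NR > 1 && $3 == want { n++ } END { print n + 0 }'
}
```

Running `gpu_per_node` lists the GPU capacity per node, and `kubectl get pod | count_status Pending` prints how many Pods are still waiting. If the number of Pending Pods equals the number of replicas minus the total GPU count, the cluster is simply out of GPUs rather than misconfigured.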