Kubernetes GPU Distributed Cluster Configuration and Installation

Posted by ShenHengheng on 2019-03-20

Our organization recently received a Dell T630 GPU server. Besides using it for day-to-day deep learning training, I want to containerize the workloads with Docker and make the machine a Kubernetes node, so that deep learning training jobs run inside Kubernetes. Note: this tutorial is fairly long!

It is divided into the following parts:

  • Environment preparation
  • Install and test the CUDA environment
  • Install Docker
  • Install Kubernetes
  • Test the GPU

Environment Preparation

Node         IP            Configuration
GPU server   172.16.3.33   CentOS 7, 2 × GTX 1080 Ti

Replace the default yum repository with a domestic mirror (Aliyun)

$ cp /etc/yum.repos.d/CentOS-Base.repo{,.bak}
$ wget http://mirrors.aliyun.com/repo/Centos-7.repo -O /etc/yum.repos.d/CentOS-Base.repo
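After switching the repo, it is usually worth rebuilding the yum cache so that later installs use the new mirror. A minimal sketch:

$ yum clean all
$ yum makecache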

Install development tools

$ yum groupinstall -y "Development Tools"

Enable the network interfaces at boot

$ systemctl start network && systemctl enable network
$ ifup em1 # by default, CentOS network interfaces do not come up at boot
$ sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-em1
$ sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-em2

If you want to give one of the interfaces a static IP (for example em1), you can use the following format:

$ cat /etc/sysconfig/network-scripts/ifcfg-em1
TYPE="Ethernet"
BOOTPROTO="static"
DEFROUTE="yes"
PEERDNS="yes"
PEERROUTES="yes"
IPADDR=172.16.3.33
GATEWAY=172.16.3.1
NETMASK=255.255.255.0
DNS1=172.16.3.1
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_FAILURE_FATAL="no"
NAME=em1
UUID=5f2f3298-1185-43d8-8509-62795e7db8f9
DEVICE=em1
ONBOOT=yes
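To apply the interface configuration without a full reboot, restart the network service (or cycle just the one interface); this assumes the em1 name used above:

$ systemctl restart network
$ ifdown em1 && ifup em1   # alternative: restart only this interface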

Install and Test the CUDA Environment

Check the GPU model and required driver

1) Add the ELRepo software repository

$ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org 
$ sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

2) Install nvidia-detect and probe for NVIDIA devices

$ sudo yum install nvidia-detect
$ nvidia-detect -v
Probing for supported NVIDIA devices...
[102b:0534] Matrox Electronics Systems Ltd. G200eR2
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 418.43 NVIDIA driver kmod-nvidia
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 418.43 NVIDIA driver kmod-nvidia

The output shows the models of the two GPUs and the driver version (418.43) they require.

Download the GPU driver

Visit https://www.geforce.cn/drivers and choose "Manual Driver Search". Set the search criteria as follows and start the search.

Select the first item in the search results.


Click to get the download link http://cn.download.nvidia.com/XFree86/Linux-x86_64/418.43/NVIDIA-Linux-x86_64-418.43.run, then SSH into the server and download it with wget.

$ cd ~/0315-cuda
$ wget -r -np -nd http://cn.download.nvidia.com/XFree86/Linux-x86_64/418.43/NVIDIA-Linux-x86_64-418.43.run

Resolve the conflict with the nouveau driver

The NVIDIA driver conflicts with the system's built-in nouveau driver, so run the following command to check whether nouveau is loaded:

$ lsmod | grep nouveau


Edit /etc/modprobe.d/blacklist.conf to prevent the nouveau module from loading; if the file does not exist, create it. Use root privileges here, since an ordinary user cannot create .conf files under /etc.

$ su - root
$ echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf

Rebuild the initramfs image

$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)

Reboot

$ reboot
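After the reboot, confirm that the blacklist took effect; if nouveau is disabled, the command below prints nothing:

$ lsmod | grep nouveau   # no output means nouveau is no longer loaded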

Install the GPU driver

If you install the driver in this step, do not install it again when installing CUDA; that said, the recommended approach is to install the driver together with CUDA.

$ chmod +x NVIDIA-Linux-x86_64-418.43.run
$ ./NVIDIA-Linux-x86_64-418.43.run
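The installer builds a kernel module, so it needs kernel headers matching the running kernel in addition to the Development Tools group installed earlier. If the build step complains, installing them explicitly usually helps (a suggested extra, not part of the original steps):

$ yum install -y gcc kernel-devel-$(uname -r) kernel-headers-$(uname -r)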

Once the installation finishes, you can check the GPU status with:

$ nvidia-smi
Wed Mar 20 13:23:40 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 20%   25C    P8     9W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 20%   23C    P8     8W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

If you see output like the above, your driver has been installed successfully.
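Optionally, on a headless server you can enable persistence mode so the driver stays initialized between jobs; this is not required for the rest of the tutorial:

$ nvidia-smi -pm 1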

Install CUDA

Download the installer from the official site https://developer.nvidia.com/cuda-downloads, making sure it matches your system and driver version.

$ chmod +x cuda_10.1.105_418.39_linux.run
$ ./cuda_10.1.105_418.39_linux.run

If you already installed the driver above, deselect the driver component when running this installer.
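Before building the samples, the CUDA toolchain needs to be on your PATH. A minimal sketch, assuming the default install prefix /usr/local/cuda-10.1 (adjust if you chose another location):

$ echo 'export PATH=/usr/local/cuda-10.1/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc
$ nvcc --version   # should report release 10.1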

Test CUDA

$ cd /root/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 1080 Ti (GPU0) -> GeForce GTX 1080 Ti (GPU1) : Yes
> Peer access from GeForce GTX 1080 Ti (GPU1) -> GeForce GTX 1080 Ti (GPU0) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 2
Result = PASS

If the result is PASS, your CUDA environment is fully set up.
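Another quick sanity check from the same samples tree is bandwidthTest, which measures host-to-device transfer speed (assuming the samples were installed next to deviceQuery):

$ cd /root/NVIDIA_CUDA-10.1_Samples/1_Utilities/bandwidthTest
$ make
$ ./bandwidthTest   # should also end with Result = PASS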

Install Docker and Kubernetes

See the author's other blog post: https://readailib.com/2018/11/15/kubernetes/kubernetes-install-note/

Install nvidia-docker

Reference: https://github.com/NVIDIA/nvidia-docker

For the Docker CE version:

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
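Before moving on, you can also confirm that Docker registered the nvidia runtime; it should show up in the daemon info:

$ docker info | grep -i runtime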

Install the NVIDIA Device Plugin

Reference: https://k2r2bai.com/2018/03/01/kubernetes/nvidia-device-plugin/

Reference: https://github.com/NVIDIA/k8s-device-plugin (official)

$ cp /etc/docker/daemon.json{,.bak}
$ vi /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Note: this assumes nvidia-docker has already been installed.
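Also note that after editing /etc/docker/daemon.json the Docker daemon has to be restarted for the new default runtime to take effect:

$ systemctl restart docker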

Next, start the Kubernetes cluster and join the GPU node to it. The specific commands are not repeated here; a brief sketch follows.
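For completeness, joining the GPU node with kubeadm typically looks like the sketch below; the token and CA hash are placeholders printed by kubeadm on the master, not real values from this cluster:

# On the master, print a join command:
$ kubeadm token create --print-join-command

# On the GPU node, run the printed command, e.g.:
$ kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>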

Enabling GPU Support in Kubernetes

Once you have enabled this option on all the GPU nodes you wish to use, you can then enable GPU support in your cluster by deploying the following Daemonset:

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml

Then check the Pods' status:

$ kubectl -n kube-system get po -o wide
NAME                                       READY     STATUS    RESTARTS   AGE       IP               NODE
...
nvidia-device-plugin-daemonset-bncw2       1/1       Running   0          2m        10.244.41.135    kube-gpu-node1
nvidia-device-plugin-daemonset-ddnhd       1/1       Running   0          2m        10.244.152.132   kube-gpu-node2

Test the GPU

$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME         GPU
bigdata      <none>
gpucluster   2

Test example

$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
  - image: nvidia/cuda
    name: cuda    
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
pod "gpu-pod" created
$ kubectl get po -a -o wide
NAME      READY     STATUS      RESTARTS   AGE       IP              NODE
gpu-pod   0/1       Completed   0          50s       10.244.41.136   kube-gpu-node1

$ kubectl logs gpu-pod -c cuda
Wed Mar 20 04:23:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 20%   24C    P8     8W / 250W |      0MiB / 11178MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

To test TensorFlow, create a new Deployment manifest named tf-gpu-dep.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-gpu
  template:
    metadata:
      labels:
        app: tf-gpu
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest-gpu
        ports:
        - containerPort: 8888
        resources:
          limits:
            nvidia.com/gpu: 1

Use kubectl to create the Deployment and expose the Jupyter port:

$ kubectl create -f tf-gpu-dep.yaml
deployment "tf-gpu" created

$ kubectl expose deploy tf-gpu --type LoadBalancer --external-ip=172.16.3.33 --port 8888 --target-port 8888
service "tf-gpu" exposed

$ kubectl get po,svc -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP               NODE
po/tf-gpu-6f9464f94b-pq8t9   1/1       Running   0          1m        10.244.152.133   kube-gpu-node2

NAME             TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE       SELECTOR
svc/kubernetes   ClusterIP      10.96.0.1        <none>          443/TCP          23h       <none>
svc/tf-gpu       LoadBalancer   10.105.104.183   172.22.132.53   8888:30093/TCP   12s       app=tf-gpu

Once everything is confirmed, obtain the Jupyter token with the logs command, for example as shown below.
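For example, grep the token out of the Pod logs (the Pod name is the one listed above) and open Jupyter on the exposed port:

$ kubectl logs tf-gpu-6f9464f94b-pq8t9 | grep token
# then browse to http://<EXTERNAL-IP>:8888 and paste the token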

Here we run a simple example; viewing it with the logs command shows the Pod using the GPU through the NVIDIA Device Plugin:

$ kubectl logs -f tf-gpu-6f9464f94b-pq8t9
...
2018-03-15 07:37:22.022052: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-15 07:37:22.155254: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-15 07:37:22.155565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 2.95GiB freeMemory: 2.88GiB
2018-03-15 07:37:22.155586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-15 07:37:22.346590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2598 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 3GB, pci bus id: 0000:01:00.0, compute capability: 6.1)

Finally, because a Pod currently binds an entire GPU, when no GPU is left over the extra Pod stays Pending:

$ kubectl scale deploy tf-gpu --replicas=3
$ kubectl get po -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP               NODE
tf-gpu-6f9464f94b-42xcf   0/1       Pending   0          4s        <none>           <none>
tf-gpu-6f9464f94b-nxdw5   1/1       Running   0          12s       10.244.41.138    kube-gpu-node1
tf-gpu-6f9464f94b-pq8t9   1/1       Running   0          5m        10.244.152.133   kube-gpu-node2
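When you are done experimenting, the test resources created above can be cleaned up like this:

$ kubectl delete deploy tf-gpu
$ kubectl delete svc tf-gpu
$ kubectl delete pod gpu-pod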