Deploying a K8s Cluster

Posted by ShenHengheng on 2018-11-15

Key references

Lab environment

This walkthrough uses CentOS 7.4 minimal, Docker 1.13, kubeadm 1.10.0, etcd 3.0, and K8s 1.10.0. Two nodes make up the lab environment:

192.168.59.133 k8smaster
192.168.59.150 k8snode1

Now prepare the environment.

  1. Configure the hosts file on every node

    $ cat>> /etc/hosts  << EOF
    192.168.59.133 k8smaster
    192.168.59.150 k8snode1
    EOF
    
  2. Disable the system firewall

    $ systemctl stop firewalld && systemctl disable firewalld
    
  3. Disable SELinux

    $ setenforce 0
    $ sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
    

    The change to the config file takes effect after a reboot.
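
The sed expression in this step only rewrites a line that reads SELINUX=enforcing, so a host already set to permissive is left unchanged. A minimal sketch of a variant that sets the key whatever its current value; the function name and its file argument are illustrative assumptions, not part of the original setup:

```shell
# Sketch: force SELINUX=disabled in an SELinux config file,
# regardless of the mode currently written there.
disable_selinux_in() {
    conf="$1"
    # Anchor on "SELINUX=" at line start so SELINUXTYPE= is not touched;
    # sed keeps a .bak copy of the original file next to it.
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}
```

It would be called as `disable_selinux_in /etc/selinux/config`, and a reboot is still needed afterwards.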

  4. Disable swap

    $ swapoff -a
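
swapoff -a only lasts until the next reboot; swap entries in /etc/fstab are activated again at boot. A minimal sketch of commenting those entries out, assuming a standard whitespace-separated fstab; the function name is hypothetical:

```shell
# Sketch: comment out active swap entries in an fstab-style file.
comment_out_swap() {
    fstab="$1"
    # Prefix "#" to uncommented lines that contain a "swap" field;
    # already-commented lines are skipped, so re-running is harmless.
    # sed keeps a .bak copy of the original file next to it.
    sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}
```

It would be called as `comment_out_swap /etc/fstab` after reviewing the file.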
    
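  5. Configure kernel parameters

    To make traffic that crosses the bridge pass through the iptables/netfilter framework as well, add the following settings to /etc/sysctl.conf: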

    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    

    Then run the following command:

    $ sysctl -p
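
Appending these keys by hand can leave duplicates when the step is repeated. A minimal sketch of an idempotent setter; the function name is hypothetical, and the note about the br_netfilter module is an assumption about the kernel in use (on some kernels the net.bridge keys only exist after `modprobe br_netfilter`):

```shell
# Sketch: set "key = value" in a sysctl-style file, replacing an
# existing entry for the key or appending a new one.
set_sysctl_key() {
    conf="$1"; key="$2"; value="$3"
    # Dots in the key act as regex wildcards here, which is loose
    # but still matches the literal key name.
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null; then
        sed -i.bak "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}
```

For example, `set_sysctl_key /etc/sysctl.conf net.bridge.bridge-nf-call-iptables 1` followed by `sysctl -p`.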
    

Installing with kubeadm

  1. First, configure the Aliyun K8s YUM repository

    $ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=0
    EOF
    $ yum -y install epel-release
    $ yum clean all
    $ yum makecache
    
  2. Install kubeadm and the related packages

    $ yum -y install docker kubelet kubeadm kubectl kubernetes-cni
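
Without a version, yum installs the newest packages in the repository, which may not match the v1.10.0 images used later in this post. A minimal sketch of building pinned package names, assuming the repository still carries 1.10.0 packages and that yum's name-version form resolves them; the helper name is hypothetical:

```shell
# Sketch: print kubelet/kubeadm/kubectl package names pinned to one version.
k8s_packages() {
    version="$1"
    echo "kubelet-${version} kubeadm-${version} kubectl-${version}"
}
```

It could be used as `yum -y install docker $(k8s_packages 1.10.0)`.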
    
  3. Start the Docker and kubelet services

    $ systemctl enable docker && systemctl start docker
    $ systemctl enable kubelet && systemctl start kubelet
    

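    Note: at this point the kubelet service is in a failed state because its main configuration file, kubelet.conf, does not exist yet. This can be ignored for now, because the file is only generated once the Master node has been initialized.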

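  4. Prepare to download the K8s images (optional; you can skip to step 5)

    Because gcr.io cannot be accessed directly to pull images, a domestic registry mirror is needed. The following configures an Aliyun mirror (optional).

    Log in to https://cr.console.aliyun.com/

    Find and click the registry mirror button on the page to see your personal mirror URL. After you select the CentOS version, the configuration instructions are shown.

    Note: when running Docker on Aliyun with the Aliyun registry mirror configured, you may find that daemon.json prevents the Docker daemon from starting. This can be resolved as follows.

    Edit the following file: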

    $ vim /etc/sysconfig/docker
    

    Then set:

    OPTIONS='--selinux-enabled --log-driver=journald --registry-mirror=http://xxxx.mirror.aliyuncs.com'
    

    Replace the registry-mirror value with your own mirror address.

    Finally, run:

    $ service docker restart
    

    This restarts the daemon. Then run:

    $ ps aux |grep docker
    

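    The mirror now appears in the daemon's startup arguments.

  5. Download the K8s images

    With the mirror sorted out, download the K8s images, then retag them with names beginning with k8s.gcr.io/ so that kubeadm recognizes and uses them.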

    #!/bin/bash
    # Images required by kubeadm 1.10.0, mirrored under the keveon/ namespace
    images=(kube-proxy-amd64:v1.10.0 kube-scheduler-amd64:v1.10.0 kube-controller-manager-amd64:v1.10.0 kube-apiserver-amd64:v1.10.0
    etcd-amd64:3.1.12 pause-amd64:3.1 kubernetes-dashboard-amd64:v1.8.3 k8s-dns-sidecar-amd64:1.14.8 k8s-dns-kube-dns-amd64:1.14.8
    k8s-dns-dnsmasq-nanny-amd64:1.14.8)
    for imageName in "${images[@]}"; do
      docker pull keveon/$imageName                       # pull from the mirror
      docker tag keveon/$imageName k8s.gcr.io/$imageName  # retag for kubeadm
      docker rmi keveon/$imageName                        # drop the mirror tag
    done
    

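    The shell script above does three things: it pulls the required container images, retags them with the names K8s expects, and removes the old mirror tags.

    Note: the image versions must match the version installed by kubeadm, otherwise a timeout will occur.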
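
Before pulling anything, it can help to print the docker commands and check the image names and tags. A minimal sketch of such a dry run; the function name and its arguments (mirror namespace, target registry, then image names) are illustrative assumptions:

```shell
# Sketch: print, without running them, the pull/tag/rmi commands
# for each image given after the mirror and target prefixes.
print_retag_commands() {
    mirror="$1"; target="$2"; shift 2
    for image in "$@"; do
        printf 'docker pull %s/%s\n' "$mirror" "$image"
        printf 'docker tag %s/%s %s/%s\n' "$mirror" "$image" "$target" "$image"
        printf 'docker rmi %s/%s\n' "$mirror" "$image"
    done
}
```

For example, `print_retag_commands keveon k8s.gcr.io pause-amd64:3.1` prints three commands that can be reviewed and then piped to `sh`.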

  6. Initialize the K8s Master

    Run the shell script above, wait for the downloads to finish, then run kubeadm init.

    $ kubeadm init --token=102952.1a7dd4cc8d1f4cc5 --kubernetes-version 1.10.0
    ····································
    ··········································
    kubeadm join 192.168.59.133:6443 --token 102952.1a7dd4cc8d1f4cc5 --discovery-token-ca-cert-hash sha256:6f1a864c6f530351f9ec7a42f74404497fcbe91ad7bf726bffd8cb3e3c333a38
    

    The following message appears at the end of the output, and it will be very useful later:

    $ kubeadm join 192.168.59.133:6443 --token 102952.1a7dd4cc8d1f4cc5 --discovery-token-ca-cert-hash sha256:6f1a864c6f530351f9ec7a42f74404497fcbe91ad7bf726bffd8cb3e3c333a38
    

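    Note: the option --kubernetes-version=v1.10.0 is required; otherwise the command fails because the Google site it contacts is blocked. v1.10.0 is used here and, as mentioned above, the downloaded image versions must match the K8s version or a timeout will occur.

    The command above takes about one minute. While it runs, you can follow the log with tail -f /var/log/messages to track the configuration process and its progress. Save the last part of the output; it is needed later when adding worker nodes.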
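
One way to keep the join command is to save the whole init output to a file and extract the line later. A minimal sketch, assuming the output was saved with something like `kubeadm init ... | tee kubeadm-init.log`; the log file name and the function name are hypothetical:

```shell
# Sketch: print the last "kubeadm join ..." line found in a saved
# kubeadm init log, with leading whitespace removed.
extract_join_command() {
    log="$1"
    grep -E '^[[:space:]]*kubeadm join ' "$log" | tail -n 1 | sed 's/^[[:space:]]*//'
}
```

Running `extract_join_command kubeadm-init.log` then prints the command to run on each worker node.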

  7. Configure kubectl credentials

    # For non-root users

    $ mkdir -p $HOME/.kube
    $ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    $ sudo chown $(id -u):$(id -g) $HOME/.kube/config
    

    # For the root user

    $ export KUBECONFIG=/etc/kubernetes/admin.conf
    

    Alternatively, add it to ~/.bash_profile:

    $ echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
    
  8. Install the flannel network

    $ mkdir -p /etc/cni/net.d/
    $ cat <<EOF > /etc/cni/net.d/10-flannel.conf
    {"name": "cbr0",
    "type": "flannel",
    "delegate": {"isDefaultGateway": true}
    }
    EOF
    $ mkdir -p /usr/share/oci-umount/oci-umount.d
    $ mkdir -p /run/flannel/
    $ cat <<EOF > /run/flannel/subnet.env
    FLANNEL_NETWORK=10.244.0.0/16
    FLANNEL_SUBNET=10.244.1.0/24
    FLANNEL_MTU=1450
    FLANNEL_IPMASQ=true
    EOF
    $ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
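
JSON pasted from a web page sometimes carries typographic quotes (“ ”) instead of straight ones, which makes a CNI config file invalid. A minimal sketch of a check for this, offered as an illustration only; the function name is hypothetical:

```shell
# Sketch: succeed (exit status 0) if the file contains typographic
# double quotes, which are not valid JSON string delimiters.
has_typographic_quotes() {
    grep -q -e '“' -e '”' "$1"
}
```

For example, `has_typographic_quotes /etc/cni/net.d/10-flannel.conf && echo "fix the quotes"`.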
    
  9. Join node1 to the cluster

    Run the kubeadm join command on node1 to join the cluster:

    $ kubeadm join 192.168.59.133:6443 --token 102952.1a7dd4cc8d1f4cc5 --discovery-token-ca-cert-hash sha256:6f1a864c6f530351f9ec7a42f74404497fcbe91ad7bf726bffd8cb3e3c333a38
    [preflight] Running pre-flight checks.
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
        [WARNING FileExisting-crictl]: crictl not found in system path
    Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
    [discovery] Trying to connect to API Server "10.0.100.202:6443"
    [discovery] Created cluster-info discovery client, requesting info from "https://10.0.100.202:6443"
    [discovery] Requesting info from "https://10.0.100.202:6443" again to validate TLS against the pinned public key
    [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.100.202:6443"
    [discovery] Successfully established connection with API Server "10.0.100.202:6443"
    This node has joined the cluster:
    * Certificate signing request was sent to master and a response
      was received.
    * The Kubelet was informed of the new secure connection details.
    Run 'kubectl get nodes' on the master to see this node join the cluster.
    

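    Note: attentive readers will have noticed that this is exactly the command you were asked to save after the K8s Master was installed successfully.

    By default the Master node does not run workloads. If you want an all-in-one K8s environment, run the following command to remove the master taint so that the Master also acts as a Node: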

    $ kubectl taint nodes --all node-role.kubernetes.io/master-
    
  10. Verify that the K8s Master was set up successfully

    # Check node status
    kubectl get nodes
    # Check pod status
    kubectl get pods --all-namespaces
    # Check the health of the cluster components
    kubectl get cs
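
For scripted checks, the node list can be filtered down to the nodes that are not yet Ready. A minimal sketch, assuming the default `kubectl get nodes` table with NAME in the first column and STATUS in the second; the function name is hypothetical, and a status such as Ready,SchedulingDisabled would also be reported:

```shell
# Sketch: read `kubectl get nodes` output on stdin and print the
# names of nodes whose STATUS column is anything other than "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

Typical use would be `kubectl get nodes | not_ready_nodes`; empty output means every node reports Ready.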
    
  11. Install the dashboard

    For details, see https://github.com/kubernetes/dashboard

    $ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
    


Common errors

  1. If a node shows NotReady, there is a network problem and a network add-on must be created. First download a manifest:

  2. Download it from https://github.com/weaveworks/weave/releases

      $ wget https://github.com/weaveworks/weave/releases/download/v2.3.0/weave-daemonset-k8s-1.7.yaml
    

    Then run:

      $ kubectl create -f weave-daemonset-k8s-1.7.yaml
    

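    The node should now become Ready.

    • Alternatively, use the flannel network.
  3. The most common installation failure is a timeout. The K8s images are hosted abroad, which is why they were downloaded in advance as described earlier. You can also use a machine outside China to set up a private Harbor registry and pull all the images into it.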

    [root@k8smaster ~]# kubeadm init
    [init] Using Kubernetes version: v1.10.0
    [init] Using Authorization modes: [Node RBAC]
    [preflight] Running pre-flight checks.
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
        [WARNING FileExisting-crictl]: crictl not found in system path
    Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
    [preflight] Starting the kubelet service
    [certificates] Generated ca certificate and key.
    [certificates] Generated apiserver certificate and key.
    [certificates] apiserver serving cert is signed for DNS names [k8smaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.100.202]
    [certificates] Generated apiserver-kubelet-client certificate and key.
    [certificates] Generated etcd/ca certificate and key.
    [certificates] Generated etcd/server certificate and key.
    [certificates] etcd/server serving cert is signed for DNS names [localhost] and IPs [127.0.0.1]
    [certificates] Generated etcd/peer certificate and key.
    [certificates] etcd/peer serving cert is signed for DNS names [k8smaster] and IPs [10.0.100.202]
    [certificates] Generated etcd/healthcheck-client certificate and key.
    [certificates] Generated apiserver-etcd-client certificate and key.
    [certificates] Generated sa key and public key.
    [certificates] Generated front-proxy-ca certificate and key.
    [certificates] Generated front-proxy-client certificate and key.
    [certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
    [controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
    [controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
    [controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
    [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
    [init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
    [init] This might take a minute or longer if the control plane images have to be pulled.
    Unfortunately, an error has occurred:
        timed out waiting for the condition
    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
        - Either there is no internet connection, or imagePullPolicy is set to "Never",
          so the kubelet cannot pull or find the following control plane images:
            - k8s.gcr.io/kube-apiserver-amd64:v1.10.0
            - k8s.gcr.io/kube-controller-manager-amd64:v1.10.0
            - k8s.gcr.io/kube-scheduler-amd64:v1.10.0
            - k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)
    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
    couldn't initialize a Kubernetes cluster
    

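    In most cases this problem is caused by a mismatch between the installed K8s version and the versions of the K8s images it depends on. To troubleshoot, check /var/log/messages; as mentioned during the installation, read the logs often. If the installation failed and you want to clean up the environment and start over, one command does it: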

    kubeadm reset
    
  4. After installing with kubeadm, you may see the following error:

    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    

    The fix is as follows:

    $ sudo cp /etc/kubernetes/admin.conf $HOME/
    $ sudo chown $(id -u):$(id -g) $HOME/admin.conf
    $ export KUBECONFIG=$HOME/admin.conf
    
  5. Accessing the apiserver or the dashboard returns "<h3>Unauthorized</h3>"

    Check whether a kubectl proxy process is currently running:

    $ ps -aux | grep 'kubectl proxy'
    

    If there is one, kill it and run the following command:

    $ kubectl proxy --address 0.0.0.0 --accept-hosts '.*'
    
  6. The following errors may appear when initializing the cluster

    [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
        [ERROR Swap]: running with swap on is not supported. Please disable swap
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    

    Fix:

    $ echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables
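
Writing to /proc only lasts until the next reboot, so it is worth checking the value again before re-running kubeadm init. A minimal sketch of such a check; the function name and the optional path argument are illustrative assumptions, and a missing file may mean the bridge netfilter module is not loaded:

```shell
# Sketch: succeed only if the bridge-nf-call-iptables switch exists
# and is set to 1. An alternative path can be passed for testing.
bridge_nf_enabled() {
    path="${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}"
    [ -r "$path" ] && [ "$(cat "$path")" = "1" ]
}
```

For example, `bridge_nf_enabled || echo "bridge-nf-call-iptables is not enabled"`.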
    
  7. A swap-not-supported error may appear

    [ERROR Swap]: running with swap on is not supported. Please disable swap
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    

    Turn swap off:

    $ swapoff -a
    
  8. The following error may appear when initializing the cluster:

    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
    

    The kubelet is clearly unhealthy. The fix on CentOS is as follows:

    On CentOS I can add these options in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

    # egrep KUBELET_CGROUP_ARGS= /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
    

    Then restart the kubelet service:

    $ systemctl daemon-reload
    $ systemctl restart kubelet
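
The kubelet's --cgroup-driver has to match the cgroup driver Docker is using, so it can help to compare the two before restarting. A minimal sketch, assuming `docker info` prints a "Cgroup Driver:" line and the kubelet drop-in contains a --cgroup-driver= flag; both function names are hypothetical:

```shell
# Sketch: read `docker info` output on stdin and print the cgroup driver.
docker_cgroup_driver() {
    awk -F': *' '/Cgroup Driver/ { print $2; exit }'
}

# Sketch: print the value of the first --cgroup-driver= flag in a file.
kubelet_cgroup_driver() {
    grep -o -e '--cgroup-driver=[A-Za-z]*' "$1" | head -n 1 | cut -d= -f2
}
```

For example, compare `docker info 2>/dev/null | docker_cgroup_driver` with `kubelet_cgroup_driver /etc/systemd/system/kubelet.service.d/10-kubeadm.conf`; if they differ, adjust the kubelet flag to match.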