K8s in Practice

1. Installation

1.1 Installing Docker

apt install docker-engine

1.2 Installing Kubernetes

# the apt-key file can also be downloaded manually
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt install -y kubelet kubeadm kubectl

# Pull the control-plane images via the Aliyun mirror and retag them
kube_containers=(
  k8s.gcr.io/kube-apiserver:v1.20.2
  k8s.gcr.io/kube-controller-manager:v1.20.2
  k8s.gcr.io/kube-scheduler:v1.20.2
  k8s.gcr.io/kube-proxy:v1.20.2
  k8s.gcr.io/pause:3.2
  k8s.gcr.io/etcd:3.4.13-0
  k8s.gcr.io/coredns:1.7.0
)

for rec_container in "${kube_containers[@]}"; do
  echo "$rec_container"
  # Rewrite the registry prefix to the Aliyun mirror, pull, then retag to the k8s.gcr.io name kubeadm expects
  ali_img="${rec_container//k8s.gcr.io/registry.cn-hangzhou.aliyuncs.com\/google_containers}"
  docker pull "$ali_img"
  docker tag "$ali_img" "$rec_container"
  docker rmi "$ali_img"
  echo -e "\033[41;37m$rec_container pull complete.\033[0m"
done
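
To confirm that every image is now available locally under its k8s.gcr.io name (a quick check, not part of the original script):

docker images | grep k8s.gcr.io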

Problem 1:

kubectl apply -f .
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Answer: no K8s cluster has been set up yet, so kubectl has nothing to connect to.
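
kubectl falls back to localhost:8080 when it has no kubeconfig, so an empty config is a quick way to confirm this cause (a minimal check):

# Empty output here means kubectl has no cluster configured
kubectl config view
ls $HOME/.kube/config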

1.3 Setting Up the K8s Cluster

Have kubeadm pull its images from the Aliyun mirror registry:

kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=v1.21.0 --apiserver-advertise-address=192.168.0.3 --v=5 --image-repository=registry.aliyuncs.com/google_containers

Problem 2:

[preflight] Some fatal errors occurred:
	[ERROR IsDockerSystemdCheck]: cannot execute 'docker info -f {{.CgroupDriver}}': exit status 125
	[ERROR SystemVerification]: failed executing "docker info --format '{{json .}}'"
output: flag provided but not defined: --format
See 'docker info --help'.
error: exit status 125

Answer: the Docker version is too old. The version installed via apt is 1.11, while the actual minimum requirement is 1.13. Upgrade Docker (steps: https://blog.csdn.net/BigData_Mining/article/details/103781306), then re-run the command above.
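
A minimal sketch of the upgrade on Ubuntu, assuming Docker's official apt repository (the linked post walks through the same steps in more detail):

# Remove the old docker-engine package first
sudo apt remove -y docker docker-engine docker.io
# Add Docker's official repository and install a current docker-ce
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update && sudo apt install -y docker-ce
docker version  # should now report a version well above 1.13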

Problem 3:

29352 checks.go:850] pulling registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.0
[preflight] Some fatal errors occurred:
	[ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.0: output: Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/coredns/coredns, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight

Answer: manually pull the coredns/coredns:1.8.0 image from Docker Hub, then retag it to the name kubeadm expects:

docker pull coredns/coredns:1.8.0
# 296a is the local image ID of the pulled coredns image
docker tag 296a registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.0

To start using the cluster, run the following as a regular user (as printed by kubeadm init):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf
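
Either way, a quick sanity check that kubectl can now reach the API server:

kubectl get nodes  # the control-plane node appears, NotReady until a network add-on is installed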

Join the cluster from the second machine:

kubeadm join 192.168.0.3:6443 --token wsh1oq.hcuy3eh1qqf8j32s --discovery-token-ca-cert-hash sha256:90881c6bc988d79c6ab854a566eaaa942e36cf29f00145a8c28f4b3a65534df3
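
If the token printed by kubeadm init has expired (tokens last 24 hours by default), a fresh join command can be generated on the control-plane node:

kubeadm token create --print-join-command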

Problem 4: coredns stays in the Pending state?

Answer: the official documentation says:

coredns being stuck in the Pending state is expected and part of the design. kubeadm is network-provider agnostic, so the administrator should install the pod network add-on of choice. You have to install a Pod network before CoreDNS can be fully deployed; until the network is set up, the DNS component stays Pending.

Install the flannel add-on:

kubectl create -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
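
To watch the flannel and coredns pods come up (coredns should move from Pending to Running once the network is ready):

kubectl get pods -n kube-system -w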

Problem 5: the cni0 interface's subnet conflicts with the eth0 subnet, leaving the machine's IP unreachable.

First, cd into /etc/cni/net.d/ and create two files:

// 10-mynet.conf
{
	"name": "mynet",
	"type": "bridge",
	"bridge": "cni0",
	"isGateway": true,
	"ipMasq": true,
	"ipam": {
		"type": "host-local",
		"subnet": "10.22.0.0/16",
		"routes": [
			{ "dst": "0.0.0.0/0" }
		]
	}
}

// 99-loopback.conf
{
	"type": "lookback"
}
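
The host-local IPAM plugin hands out pod IPs from the subnet configured above, so the point of 10-mynet.conf is simply to move cni0 onto a range (10.22.0.0/16 here) that no longer overlaps eth0's network.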

Then delete the existing cni0 interface:

ifconfig cni0 down
ip link delete cni0
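
Because kubeadm init already ran on this node, it will refuse to run a second time; resetting first clears the old state (a sketch, assuming default paths):

sudo kubeadm reset
rm -f $HOME/.kube/config  # drop the stale kubeconfig from the previous init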

Then re-initialize the cluster:

kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=v1.21.0 --apiserver-advertise-address=192.168.0.3 --v=5 --image-repository=registry.aliyuncs.com/google_containers

2. Installing GrimoireLab

After entering the kubernetes directory under the project root, run:

kubectl apply -f .

This starts the containers.

Use port forwarding to expose Kibiter:

kubectl port-forward service/kibiter 5601 -n grimoire --address=0.0.0.0
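
With --address=0.0.0.0 the forward listens on all interfaces, so Kibiter should then be reachable at http://<host-ip>:5601 from outside the machine.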

Problem 6: the two persistent pods, esnode and mariadb, fail to start; kubectl describe shows a warning:

  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  39s (x42 over 40m)  default-scheduler  0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims.

Answer: these two containers are special in that they need persistent storage, so they must be allocated dedicated disks. PersistentVolumes and PersistentVolumeClaims represent persistent storage volumes and requests for them, respectively. Create two PVs, and the two pending pods will then start automatically.

# pv-demo.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-es
  labels:
    name: pv001
spec:
  nfs:
    path: /data/volumes/v1
    server: nfs
  accessModes: ["ReadWriteOnce"]
  capacity:
    storage: 20Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv002
  labels:
    name: pv002
spec:
  nfs:
    path: /data/volumes/v2
    server: nfs
  accessModes: ["ReadWriteOnce"]
  capacity:
    storage: 10Gi

kubectl apply -f pv-demo.yml
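
The two PVs should bind to the pending claims shortly afterwards, which can be confirmed with:

kubectl get pv
kubectl get pvc -n grimoire  # both claims should show STATUS Bound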

Enter a container:

kubectl -n grimoire exec -it kibiter-598f5c5bdc-nxtvn -- bash