Getting Started with Swarm, Part 3: Multi-Node Operations

Giving docker-swarm a try, hello-world style

Recap of the previous post

Following on from the previous post: three nodes are running in virtual machines, all of them managers, and we went through some simple create, delete, and list operations on services.

docker node ls
# ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
#xgjnkzvysal5fv7ugm1t5d7zr * swarmmanger Ready Active Leader 20.10.21
#e3ye8fep0dr3xkxrzdxz8wzjw swarmworker1 Ready Active Reachable 20.10.21
#i5vz5saiixnx82j9hpy4d5wmc swarmworker2 Ready Active Reachable 20.10.21

On these three nodes, this post starts working through the multi-node side of things.

Task

What is a task?

Roughly speaking, tasks are the executors of a service: a service is carried out by one or more tasks.

A swarm task is to a service what:

an object is to a class in object-oriented programming;

a container is to an image in Docker.

A service can have multiple tasks, and an internal scheduling algorithm places them on one or more nodes.

It is these tasks that actually handle the workload.

The parameter that sets the number of tasks is replicas, i.e. copies or duplicates of the task…
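
A quick sketch of what that looks like (not run as part of this post; hello_demo is just an example name, and both commands are standard Docker CLI):

docker service create --replicas 3 --name hello_demo alpine ping docker.com
# adjust the number of tasks later without recreating the service
docker service scale hello_demo=5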

So let's try it out.

Creating a multi-task service and pinning it to specific nodes

Now, with three nodes in total, we will create services with replicas set to 2, specifying the --constraint parameter under each of the conditions below in turn.

Since I ran into some issues, let's first take another look at the node list.

docker node ls
# ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
# xgjnkzvysal5fv7ugm1t5d7zr * swarmmanger Ready Active Leader 20.10.21
# e3ye8fep0dr3xkxrzdxz8wzjw swarmworker1 Ready Active Reachable 20.10.21
# 61yvp61wb1ojwytecc7d8vtcf swarmworker2 Down Active 20.10.21
# hmekztyy9ux1soi3mg7xde7f5 swarmworker2 Ready Active 20.10.21
# i5vz5saiixnx82j9hpy4d5wmc swarmworker2 Down Active 20.10.21

swarmworker2 ran into problems and executed leave twice along the way, so its ID changed; judging by the STATUS column, only three entries are Ready.
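
As an aside, the stale Down entries could be cleaned up from a manager with docker node rm; a sketch I did not run here, using the two Down IDs from the listing above:

docker node rm 61yvp61wb1ojwytecc7d8vtcf i5vz5saiixnx82j9hpy4d5wmc
# add --force if Docker refuses because it still considers the node a swarm member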

No constraint

docker service create --replicas 2 --name hello_without_constraint alpine ping docker.com
# rbzl783j0ngiu5apkc3ivuz8m
# overall progress: 2 out of 2 tasks
# 1/2: running [==================================================>]
# 2/2: running [==================================================>]
# verify: Service converged


docker service ps hello_without_constraint
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
# 9abhtllls5yu hello_without_constraint.1 alpine:latest swarmworker2 Running Running 23 seconds ago
# ts72jny9a066 hello_without_constraint.2 alpine:latest swarmworker1 Running Running 25 seconds ago

After repeating this a few times, the tasks appear to be distributed by some scheduling algorithm that spreads them as evenly as possible across all nodes. From experience, the placement behaviour should be configurable, for example with weights or preferences; see the sketch below.
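
For instance, the CLI offers placement preferences and a per-node replica cap; a hedged sketch (the label name zone is made up for the example, and --replicas-max-per-node needs a reasonably recent engine):

# spread tasks evenly across the values of a node label
docker service create --replicas 4 --placement-pref 'spread=node.labels.zone' --name hello_spread alpine ping docker.com
# allow at most one task of this service per node
docker service create --replicas 4 --replicas-max-per-node 1 --name hello_capped alpine ping docker.com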

Constraining by hostname

docker service create --replicas 2 --constraint node.hostname==swarmworker1 --name hello alpine ping docker.com
#0fz44crhpw73te5t6hva5gsaq
#overall progress: 2 out of 2 tasks
#1/2: running [==================================================>]
#2/2: running [==================================================>]
#verify: Service converged


# Running the following command on both machines gives the same result
docker service ls
#ID NAME MODE REPLICAS IMAGE PORTS
#0fz44crhpw73 hello replicated 2/2 alpine:latest


# Running the following command on both machines also gives the same result. Note that both tasks are on the same node, swarmworker1
docker service ps hello
#ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
#kh1um6dqg31d hello.1 alpine:latest swarmworker1 Running Running about a minute ago
#5jttfmry1nm9 hello.2 alpine:latest swarmworker1 Running Running about a minute ago

As expected, both tasks were assigned to the node with the specified hostname.

In theory several nodes could share the same hostname; I'll look into that some other time (a sketch using a unique node ID follows below).
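
If hostnames did overlap, the constraint could target the node ID instead, which is always unique; a sketch using the swarmworker1 ID from the listing above:

docker service create --replicas 2 --constraint node.id==e3ye8fep0dr3xkxrzdxz8wzjw --name hello_by_id alpine ping docker.com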

Constraining by node role

There are only two node roles: worker and manager.

So this time, we first demote swarmworker2 to a worker, which leaves two manager nodes and one worker node.

docker node demote swarmworker2
#Manager swarmworker2 demoted in the swarm.


docker node ls
#ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
#xgjnkzvysal5fv7ugm1t5d7zr * swarmmanger Ready Active Leader 20.10.21
#e3ye8fep0dr3xkxrzdxz8wzjw swarmworker1 Ready Active Reachable 20.10.21
#i5vz5saiixnx82j9hpy4d5wmc swarmworker2 Ready Active 20.10.21

Run the following command to place the service's tasks on manager nodes only.

docker service create --replicas 4 --constraint node.role==manager --name hello_by_role alpine ping docker.com
#af7xj2jrqgfjsy35klqrajltv
#overall progress: 4 out of 4 tasks
#1/4: running [==================================================>]
#2/4: running [==================================================>]
#3/4: running [==================================================>]
#4/4: running [==================================================>]
#verify: Service converged

docker service ps hello_by_role
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
# py8yy13tqcvg hello_by_role.1 alpine:latest swarmmanger Running Running 49 seconds ago
# 4xfukh7i7tue hello_by_role.2 alpine:latest swarmworker1 Running Running 50 seconds ago
# fmd8xxkexrxl hello_by_role.3 alpine:latest swarmmanger Running Running 49 seconds ago
# wqlnuvy34a7j hello_by_role.4 alpine:latest swarmworker1 Running Running 50 seconds ago
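
To restore the original topology afterwards, swarmworker2 could simply be promoted back to a manager (not run here):

docker node promote swarmworker2
# prints a confirmation that the node was promoted to a manager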

Constraining by label

I won't walk through this one, since it works much the same way; see the official docs (and the brief sketch below):

https://docs.docker.com/engine/swarm/services/#placement-constraints
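
For completeness, a minimal sketch of how it would look; the label region=east is made up for the example:

# attach a label to a node (run on a manager)
docker node update --label-add region=east swarmworker1
# constrain the service to nodes carrying that label
docker service create --replicas 2 --constraint node.labels.region==east --name hello_by_label alpine ping docker.com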

About ports

In the common case we expose stateless services over HTTP, which raises the question of how ports are published and how traffic is distributed across nodes.

## There are two nodes at the moment. Below we create an nginx service with replicas=1, publishing port 8080 and mapping it to port 80 inside the container, i.e. 8080 is forwarded to 80
docker service create --name my_web --replicas 1 --publish published=8080,target=80 nginx
#k7w0xq3n9in3ttdfggg6q0wc5
#overall progress: 1 out of 1 tasks
#1/1: running [==================================================>]
#verify: Service converged

## Check where nginx is running; it turns out to be on the manager
docker service ps my_web
#ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
#35pb8xxumb8p my_web.1 nginx:latest swarmmanger Running Running 5 minutes ago

## Now curl each of the two nodes' ip:port
curl http://192.0.0.30:8080/
#<!DOCTYPE html>
#<html>
#<head>
#<title>Welcome to nginx!</title>
# ...

## Then curl the other node; the result is the same
curl http://192.0.0.31:8080/

In other words, there is an internal traffic-forwarding mechanism (Docker's routing mesh).

Quoting the official docs:

https://docs.docker.com/engine/swarm/services/#publish-ports

You don’t need to know which nodes are running the tasks; connecting to port 8080 on any of the 10 nodes connects you to one of the three nginx tasks. You can test this using curl. The following example assumes that localhost is one of the swarm nodes. If this is not the case, or localhost does not resolve to an IP address on your host, substitute the host’s IP address or resolvable host name.

The gist: connecting to that port on any node forwards you to a node running one of the service's tasks (the official example has 10 nodes with nginx tasks on 3 of them, i.e. --replicas 3).
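
To see the routing mesh at work on this small setup, the service could be scaled up and each node curled in turn; a sketch, not run here, reusing the two IPs from above:

docker service scale my_web=2
# every request returns 200 no matter which node actually hosts an nginx task
for ip in 192.0.0.30 192.0.0.31; do curl -s -o /dev/null -w "%{http_code}\n" http://$ip:8080/; done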

A problem discovered by chance

Along the way I also ran into a problem.

The third node, swarmworker2, had no proxy configured and had not pulled the image in advance.

So when I ran docker service create ..... --replicas=2 ......, one task happened to be assigned to swarmworker2.

docker service create --replicas 2 --name hello_without_constraint  alpine ping docker.com
# b32kykwlr8gcpinfmpfugzkpy
# overall progress: 1 out of 2 tasks
# 1/2: preparing [=================================> ]
# 2/2: running [==================================================>]

# It hung here for a long time. At first I thought the VM had frozen (my machine doesn't have much RAM) and couldn't work out the cause.

# Then I pressed ctrl+c to quit, and the following was printed (note the ^C on the first line: the ctrl+c itself got echoed)

# ^COperation continuing in background.
# Use `docker service ps b32kykwlr8gcpinfmpfugzkpy` to check progress.

# Following the hint, I ran the command
docker service ps b32kykwlr8gcpinfmpfugzkpy
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
# xnuy658s5b7u hello_without_constraint.1 alpine:latest swarmworker1 Running Running about a minute ago
# cqft2hmpdbdu hello_without_constraint.2 alpine:latest swarmworker2 Running Preparing about a minute ago

# I tried a few more times and found that whenever a task landed on swarmworker2, it would not start
# After ruling things out, the cause was the missing image; presumably it was still pulling in the background, thanks to the terrible network and the damned GFW.

I looked into this a bit: a node only uses its own local images and never borrows them from other nodes. Why not design it so nodes can share images, or at least provide a command to copy them between nodes manually?

This matters especially for private images used only in local testing, which otherwise have to be published to a registry first.

How to work around it? Either manually export/import the image,

or set up your own intermediate registry.

I'll dig into the details some other time; a sketch of the first option follows below.
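
A minimal sketch of the manual export/import route (hostnames and paths are just examples):

# on a node that already has the image
docker save -o alpine.tar alpine:latest
scp alpine.tar swarmworker2:/tmp/
# on swarmworker2
docker load -i /tmp/alpine.tar

The registry route would instead mean running something like the registry:2 image (or any private registry), pushing the image there, and letting every node pull from it.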

Since the problem had already come up, I took the chance to test what happens when the node is kicked out of the swarm entirely.

# Before removing the node
docker service ps hello_without_constraint
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
# xnuy658s5b7u hello_without_constraint.1 alpine:latest swarmworker1 Running Running 8 minutes ago
# cqft2hmpdbdu hello_without_constraint.2 alpine:latest swarmworker2 Running Preparing 8 minutes ago

# Run leave on swarmworker2. Note that only a worker node can leave directly; a manager has to be demoted to a worker first (or forced out with leave --force)
docker swarm leave
# Node left the swarm.

# Then check the service again from a manager node
docker service ps hello_without_constraint
# ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
# xnuy658s5b7u hello_without_constraint.1 alpine:latest swarmworker1 Running Running 3 hours ago
# exf4xl00dttl hello_without_constraint.2 alpine:latest swarmmanger Running Running 13 seconds ago
# cqft2hmpdbdu \_ hello_without_constraint.2 alpine:latest swarmworker2 Shutdown Preparing 3 hours ago

As you can see, the task on swarmworker2 was shut down and a replacement task was started on swarmmanger.
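
If swarmworker2 is meant to come back later, the join command can be printed again on a manager; the token and address in the output below are placeholders:

docker swarm join-token worker
# To add a worker to this swarm, run the following command:
#     docker swarm join --token <worker-token> <manager-ip>:2377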

Summary

Across multiple nodes:

Running a specified number of tasks becomes simple.

For tasks that serve traffic on a published port, there is no need to care which nodes they actually run on; the service can be reached through any node.

If some tasks die for whatever reason, the system automatically starts new ones, so the total count within the node pool stays the same.

And for particular requirements, it is just as easy to run tasks only on nodes that meet the given constraints.
