Cannot start Mesos slave on a different VM - slave is constantly deactivated

q9rjltbz asked on 2021-06-26 in Mesos

I am trying to set up a simple Mesos cluster on two VMs. The IPs are:
10.10.0.102 (1 master + 1 slave) - FQDN mesos1.mydomain
10.10.0.103 (1 slave) - FQDN mesos2.mydomain
I am using Mesos 0.27.1 (RPMs downloaded from Mesosphere) on CentOS Linux release 7.1.1503 (Core).
I successfully deployed the single-node cluster (10.10.0.102): the master and slave work, and I can deploy and scale some simple applications through Marathon.
The problem appears when I try to start the second slave on 10.10.0.103. Every time I start that slave, it ends up in a deactivated state.
Logs from 10.10.0.103:

I0226 13:49:58.428019 14937 slave.cpp:463] Slave resources: cpus(*):1; mem(*):2768; disk(*):3409; ports(*):[31000-32000]
I0226 13:49:58.428019 14937 slave.cpp:471] Slave attributes: [  ]
I0226 13:49:58.428019 14937 slave.cpp:476] Slave hostname: mesos2
I0226 13:49:58.430469 14946 state.cpp:58] Recovering state from '/tmp/mesos/meta'
I0226 13:49:58.430922 14947 status_update_manager.cpp:200] Recovering status update manager
I0226 13:49:58.430954 14947 containerizer.cpp:390] Recovering containerizer
I0226 13:49:58.432219 14947 provisioner.cpp:245] Provisioner recovery complete
I0226 13:49:58.432273 14947 slave.cpp:4495] Finished recovery
I0226 13:49:58.448940 14948 group.cpp:349] Group process (group(1)@10.10.0.103:5051) connected to ZooKeeper
I0226 13:49:58.449050 14948 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0226 13:49:58.449064 14948 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0226 13:49:58.451846 14948 detector.cpp:154] Detected a new leader: (id='3')
I0226 13:49:58.451937 14948 group.cpp:700] Trying to get '/mesos/json.info_0000000003' in ZooKeeper
I0226 13:49:58.453397 14948 detector.cpp:479] A new leading master (UPID=master@10.10.0.102:5050) is detected
I0226 13:49:58.453459 14948 slave.cpp:795] New master detected at master@10.10.0.102:5050
I0226 13:49:58.453698 14948 slave.cpp:820] No credentials provided. Attempting to register without authentication
I0226 13:49:58.453724 14948 slave.cpp:831] Detecting new master
I0226 13:49:58.453743 14948 status_update_manager.cpp:174] Pausing sending status updates
I0226 13:50:58.445101 14948 slave.cpp:4304] Current disk usage 22.11%. Max allowed age: 4.752451232032847days
I0226 13:51:58.460233 14948 slave.cpp:4304] Current disk usage 22.11%. Max allowed age: 4.752451232032847days

Master logs from 10.10.0.102:

I0226 22:55:14.240464  2021 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 682
I0226 22:55:14.240542  2021 hierarchical.cpp:473] Added slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 (mesos2) with cpus(*):1; mem(*):2768; disk(*):3409; ports(*):[31000-32000] (allocated: )
I0226 22:55:14.240671  2021 master.cpp:5350] Sending 1 offers to framework c5a5818d-16fa-42bf-8e73-697a2d12fe97-0001 (marathon) at scheduler-91034353-1820-4020-aad1-10e11d567136@10.10.0.102:45698
I0226 22:55:14.240767  2021 replica.cpp:537] Replica received write request for position 682 from (1259)@10.10.0.102:5050
E0226 22:55:14.241082  2027 process.cpp:1966] Failed to shutdown socket with fd 32: Transport endpoint is not connected
I0226 22:55:14.241143  2019 master.cpp:1172] Slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 at slave(1)@10.10.0.103:5051 (mesos2) disconnected
I0226 22:55:14.241153  2019 master.cpp:2633] Disconnecting slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 at slave(1)@10.10.0.103:5051 (mesos2)
I0226 22:55:14.241161  2019 master.cpp:2652] Deactivating slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 at slave(1)@10.10.0.103:5051 (mesos2)
I0226 22:55:14.241230  2019 hierarchical.cpp:560] Slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 deactivated
I0226 22:55:14.245923  2019 master.cpp:3673] Processing DECLINE call for offers: [ a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-O1251 ] for framework c5a5818d-16fa-42bf-8e73-697a2d12fe97-0001 (marathon) at scheduler-91034353-1820-4020-aad1-10e11d567136@10.10.0.102:45698
W0226 22:55:14.245923  2019 master.cpp:3720] Ignoring decline of offer a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-O1251 since it is no longer valid
I0226 22:55:14.249065  2021 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 8.264893ms
I0226 22:55:14.249107  2021 replica.cpp:712] Persisted action at 682
I0226 22:55:14.249220  2021 replica.cpp:691] Replica received learned notice for position 682 from @0.0.0.0:0

I have tried starting the slave (on 10.10.0.103) in two ways:
sudo service mesos-slave start
mesos-slave --master=10.10.0.102:5050 --ip=10.10.0.103
Both give the same result.
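For reference, a minimal sketch of a more explicit invocation, with the advertised hostname pinned down as well (flag names as documented for Mesos 0.27; the --hostname value and the one-file-per-flag layout under /etc/mesos-slave are assumptions based on the stock Mesosphere packaging, not taken from the post):

# Explicit flags for the mesos-slave binary; --hostname is optional but
# makes the name the slave advertises to the master unambiguous.
sudo mesos-slave \
  --master=10.10.0.102:5050 \
  --ip=10.10.0.103 \
  --hostname=mesos2.mydomain

# With the Mesosphere RPM packaging, the same settings can usually be
# dropped into one-file-per-flag form before `service mesos-slave start`:
echo 10.10.0.103     | sudo tee /etc/mesos-slave/ip
echo mesos2.mydomain | sudo tee /etc/mesos-slave/hostname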
In addition, in mesos-slave.warning I can also see:

Running on machine: mesos2.mydomain
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0226 13:49:58.415089 14937 systemd.cpp:244] Required functionality `Delegate` was introduced in Version `218`. Your system may not function properly; however since some distributions have patched systemd packages, your system may still be functional. This is why we keep running. See MESOS-3352 for more information

Based on similar topics I found, this may be related to the network configuration, so here is some relevant information.
hosts file on 10.10.0.102:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.0.103 mesos2 mesos2.mydomain
10.10.0.102 mesos1 mesos1.mydomain

hosts file on 10.10.0.103:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.0.102 mesos1 mesos1.mydomain
10.10.0.103 mesos2 mesos2.mydomain
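As a quick sanity check (standard glibc/hostname tooling, not from the original post), both boxes should resolve these names consistently:

# Run on both VMs; the output should match the /etc/hosts entries above.
getent hosts mesos1.mydomain mesos2.mydomain
hostname -f   # mesos1.mydomain on the master VM, mesos2.mydomain on the slave VM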

Both VMs have two network interfaces (not counting loopback). The following is from 10.10.0.103; 10.10.0.102 looks similar:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:49:76:48 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 75232sec preferred_lft 75232sec
    inet6 fe80::a00:27ff:fe49:7648/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:d9:24:2a brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.103/24 brd 10.10.0.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fed9:242a/64 scope link
       valid_lft forever preferred_lft forever

Both VMs have network connectivity (a TCP-level check is also sketched after the ping output below).
From 10.10.0.102 to 10.10.0.103:

[root@mesos1 ~]# ping mesos2.mydomain
PING mesos2 (10.10.0.103) 56(84) bytes of data.
64 bytes from mesos2 (10.10.0.103): icmp_seq=1 ttl=64 time=0.578 ms
64 bytes from mesos2 (10.10.0.103): icmp_seq=2 ttl=64 time=0.616 ms

From 10.10.0.103 to 10.10.0.102:

[root@mesos2 ~]# ping mesos1.mydomain
PING mesos1 (10.10.0.102) 56(84) bytes of data.
64 bytes from mesos1 (10.10.0.102): icmp_seq=1 ttl=64 time=0.441 ms
64 bytes from mesos1 (10.10.0.102): icmp_seq=2 ttl=64 time=0.972 ms
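Note that ICMP working does not prove that the Mesos TCP ports are reachable. A minimal TCP check using only bash's built-in /dev/tcp (ports assumed from the default Mesos setup: 5050 for the master, 5051 for the slave) would be:

# From the master (10.10.0.102): is the slave's port 5051 reachable?
timeout 3 bash -c ': </dev/tcp/10.10.0.103/5051' && echo open || echo blocked
# From the slave (10.10.0.103): is the master's port 5050 reachable?
timeout 3 bash -c ': </dev/tcp/10.10.0.102/5050' && echo open || echo blocked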

Any help would be greatly appreciated. Regards,
Andrei

Answer 1 (uqjltbpv):

The simplest answer is always the best. It turned out I had iptables (managed by firewalld) running on the slave node. Disabling it solved my problem:

systemctl disable firewalld
systemctl stop firewalld

Thanks everyone for your help!
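If you would rather keep firewalld running, an alternative sketch (ports taken from the default Mesos setup and the slave resource log above; adjust for your firewalld zone) is to open just the ports Mesos needs:

# Open the master, slave and task port ranges instead of disabling the firewall.
firewall-cmd --permanent --add-port=5050/tcp          # Mesos master
firewall-cmd --permanent --add-port=5051/tcp          # Mesos slave/agent
firewall-cmd --permanent --add-port=31000-32000/tcp   # task ports (see slave resources)
firewall-cmd --reload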
