我正在尝试在两个虚拟机上设置简单的mesos集群。IP是:
10.10.0.102(1主1从)-fqdn mesos1.mydomain
10.10.0.103(带1个从属)-fqdn mesos2.mydomain
我使用的是mesos0.27.1(rpm从mesosphere下载)和centoslinux版本7.1.1503(core)。
我成功地部署了单节点集群(10.10.0.102):主节点和从节点工作,我可以通过marathon部署和扩展一些简单的应用程序。
当我尝试在10.10.0.103上启动第二个从属时,问题出现了。总是,当我启动那个从机时,它的状态是无效的。
从10.10.0.103上的日志:
I0226 13:49:58.428019 14937 slave.cpp:463] Slave resources: cpus(*):1; mem(*):2768; disk(*):3409; ports(*):[31000-32000]
I0226 13:49:58.428019 14937 slave.cpp:471] Slave attributes: [ ]
I0226 13:49:58.428019 14937 slave.cpp:476] Slave hostname: mesos2
I0226 13:49:58.430469 14946 state.cpp:58] Recovering state from '/tmp/mesos/meta'
I0226 13:49:58.430922 14947 status_update_manager.cpp:200] Recovering status update manager
I0226 13:49:58.430954 14947 containerizer.cpp:390] Recovering containerizer
I0226 13:49:58.432219 14947 provisioner.cpp:245] Provisioner recovery complete
I0226 13:49:58.432273 14947 slave.cpp:4495] Finished recovery
I0226 13:49:58.448940 14948 group.cpp:349] Group process (group(1)@10.10.0.103:5051) connected to ZooKeeper
I0226 13:49:58.449050 14948 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0226 13:49:58.449064 14948 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0226 13:49:58.451846 14948 detector.cpp:154] Detected a new leader: (id='3')
I0226 13:49:58.451937 14948 group.cpp:700] Trying to get '/mesos/json.info_0000000003' in ZooKeeper
I0226 13:49:58.453397 14948 detector.cpp:479] A new leading master (UPID=master@10.10.0.102:5050) is detected
I0226 13:49:58.453459 14948 slave.cpp:795] New master detected at master@10.10.0.102:5050
I0226 13:49:58.453698 14948 slave.cpp:820] No credentials provided. Attempting to register without authentication
I0226 13:49:58.453724 14948 slave.cpp:831] Detecting new master
I0226 13:49:58.453743 14948 status_update_manager.cpp:174] Pausing sending status updates
I0226 13:50:58.445101 14948 slave.cpp:4304] Current disk usage 22.11%. Max allowed age: 4.752451232032847days
I0226 13:51:58.460233 14948 slave.cpp:4304] Current disk usage 22.11%. Max allowed age: 4.752451232032847days
10.10.0.102主机日志
I0226 22:55:14.240464 2021 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 682
I0226 22:55:14.240542 2021 hierarchical.cpp:473] Added slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 (mesos2) with cpus(*):1; mem(*):2768; disk(*):3409; ports(*):[31000-32000] (allocated: )
I0226 22:55:14.240671 2021 master.cpp:5350] Sending 1 offers to framework c5a5818d-16fa-42bf-8e73-697a2d12fe97-0001 (marathon) at scheduler-91034353-1820-4020-aad1-10e11d567136@10.10.0.102:45698
I0226 22:55:14.240767 2021 replica.cpp:537] Replica received write request for position 682 from (1259)@10.10.0.102:5050
E0226 22:55:14.241082 2027 process.cpp:1966] Failed to shutdown socket with fd 32: Transport endpoint is not connected
I0226 22:55:14.241143 2019 master.cpp:1172] Slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 at slave(1)@10.10.0.103:5051 (mesos2) disconnected
I0226 22:55:14.241153 2019 master.cpp:2633] Disconnecting slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 at slave(1)@10.10.0.103:5051 (mesos2)
I0226 22:55:14.241161 2019 master.cpp:2652] Deactivating slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 at slave(1)@10.10.0.103:5051 (mesos2)
I0226 22:55:14.241230 2019 hierarchical.cpp:560] Slave a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-S167 deactivated
I0226 22:55:14.245923 2019 master.cpp:3673] Processing DECLINE call for offers: [ a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-O1251 ] for framework c5a5818d-16fa-42bf-8e73-697a2d12fe97-0001 (marathon) at scheduler-91034353-1820-4020-aad1-10e11d567136@10.10.0.102:45698
W0226 22:55:14.245923 2019 master.cpp:3720] Ignoring decline of offer a61e9d9f-f85b-4c72-9780-166a7ffc0ac3-O1251 since it is no longer valid
I0226 22:55:14.249065 2021 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 8.264893ms
I0226 22:55:14.249107 2021 replica.cpp:712] Persisted action at 682
I0226 22:55:14.249220 2021 replica.cpp:691] Replica received learned notice for position 682 from @0.0.0.0:0
我尝试使用两种方法启动slave(在10.10.0.103上):
sudo服务mesos从机启动
mesos从站--主站=10.10.0.102:5050--ip=10.10.0.103
两个结果都一样。
此外,在mesos-slave.warning中,我还看到:
Running on machine: mesos2.mydomain
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0226 13:49:58.415089 14937 systemd.cpp:244] Required functionality `Delegate` was introduced in Version `218`. Your system may not function properly; however since some distributions have patched systemd packages, your system may still be functional. This is why we keep running. See MESOS-3352 for more information
基于类似的主题,我发现这可能与网络配置有关,因此下面是一些有关的信息:
主机文件10.10.0.102
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.0.103 mesos2 mesos2.mydomain
10.10.0.102 mesos1 mesos1.mydomain
主机文件10.10.0.103
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.0.102 mesos1 mesos1.mydomain
10.10.0.103 mesos2 mesos2.mydomain
两个虚拟机都有2个网络接口(无环回)。以下来自10.10.0.103-10.10.0.102上类似:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:49:76:48 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
valid_lft 75232sec preferred_lft 75232sec
inet6 fe80::a00:27ff:fe49:7648/64 scope link
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:d9:24:2a brd ff:ff:ff:ff:ff:ff
inet 10.10.0.103/24 brd 10.10.0.255 scope global enp0s8
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fed9:242a/64 scope link
valid_lft forever preferred_lft forever
两个虚拟机都有网络连接。
从10.10.0.102到10.10.0.103
[root@mesos1 ~]# ping mesos2.mydomain
PING mesos2 (10.10.0.103) 56(84) bytes of data.
64 bytes from mesos2 (10.10.0.103): icmp_seq=1 ttl=64 time=0.578 ms
64 bytes from mesos2 (10.10.0.103): icmp_seq=2 ttl=64 time=0.616 ms
从10.10.0.103到10.10.0.102
[root@mesos2 ~]# ping mesos1.mydomain
PING mesos1 (10.10.0.102) 56(84) bytes of data.
64 bytes from mesos1 (10.10.0.102): icmp_seq=1 ttl=64 time=0.441 ms
64 bytes from mesos1 (10.10.0.102): icmp_seq=2 ttl=64 time=0.972 ms
所有的帮助将不胜感激。当做
安德烈
1条答案
按热度按时间uqjltbpv1#
最简单的答案总是最好的。原来我在从节点上运行iptables。禁用此选项可以解决我的问题:
systemctl disable firewalld systemctl stop firewalld
谢谢大家的帮助!