Unable to start a Mesos/Marathon cluster

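Physical machine: 192.168.10.1 (Mesos, ZooKeeper, Marathon)
Virtual machine: 192.168.122.10 (Mesos, ZooKeeper)
Virtual machine: 192.168.122.46 (Mesos, ZooKeeper)
All three machines run Fedora 23 Server.
The two networks are already routed to each other by default, because the virtual machines live on the physical machine.
No firewall is configured.
Mesos election log: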

Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address.

I can set it manually, but I cannot set it dynamically: the --ip_discovery_command flag is not recognized.
What I want to do is hook the script below up to that flag.

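# Print the IPv4 address of enp8s0 if that interface exists, otherwise of eth0.
# Matching "inet " (with the trailing space) skips the inet6 lines.
if [[ $(ip addr) == *enp8s0* ]]; then
    ip addr show enp8s0 | awk -F'/| ' '/inet / { print $6 }'
else
    ip addr show eth0 | awk -F'/| ' '/inet / { print $6 }'
fi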

When I set it manually (which is not what I want to do), the Mesos page at IP:5050 comes up, but the Mesos master then fails after 1 minute because of this:

F0427 17:03:27.975260  6914 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

***Check failure stack trace:***

    @     0x7f8360fa9edd  (unknown)
    @     0x7f8360fabc50  (unknown)
    @     0x7f8360fa9ad3  (unknown)
    @     0x7f8360fac61e  (unknown)
    @     0x7f83619a85dd  (unknown)
    @     0x7f83619e7c30  (unknown)
    @     0x55a885ee3b2e  (unknown)
    @     0x7f8361a11c0e  (unknown)
    @     0x7f8361a5d75e  (unknown)
    @     0x7f8361a7077a  (unknown)
    @     0x7f83618f4aae  (unknown)
    @     0x7f8361a70768  (unknown)
    @     0x7f8361a548d0  (unknown)
    @     0x7f8361fc832c  (unknown)
    @     0x7f8361fd42a5  (unknown)
    @     0x7f8361fd472f  (unknown)
    @     0x7f8360a5e60a  start_thread
    @     0x7f835fefda4d  __clone
Aborted (core dumped)

The ZooKeeper settings are as follows:


# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
# maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
# autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
# autopurge.purgeInterval=1

server.1:192.168.10.1:2888:3888
server.2:192.168.122.46:2888:3888
server.3:192.168.122.10:2888:3888

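I do not know how to verify whether it is working properly.
Honestly, I am at the end of my rope. I have been fighting with this for the past week because of poor documentation and the lack of a proper explanation of the architecture (mainly Marathon), poorly organized logs (Mesos), systemd being unable to properly parse bash and use the output as a variable, and a general lack of comprehensive instructions.
Am I doing something wrong? I appreciate any help I can get. Let me know if you need anything I have not provided and I will post it right away.
Edit:
I fixed the Marathon problem by adding two additional Marathon servers in the VMs so that they can form a quorum.
Edit 2:
I now have a problem where the Mesos servers keep re-electing a leader in rapid succession, but depending on the outcome I will look into that later.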

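If you follow the installation documentation closely, I think you should be able to get it working.
For example, the "Master bound to loopback" problem is caused by incorrect or incomplete settings. See:
Hostname (optional)
If you are unable to resolve the hostname of the machine directly (e.g. if it is on a different network or you are using a VPN), set /etc/mesos-master/hostname to a value that can be resolved, for example an externally accessible IP address or DNS hostname. This will ensure all links from the Mesos console work correctly.
You will also want to set this property in /etc/marathon/conf/hostname.
In addition, I would recommend setting the IP address in the /etc/mesos-master/ip file. Always make sure that the hostname resolves to a non-local IP address, i.e. by adding it to the /etc/hosts file on each host.
Basically, the /etc/hosts file should look similar to this (replace the host names with the actual hostnames):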

127.0.0.1 localhost

192.168.10.1 host1
192.168.122.10 host2
192.168.122.46 host3
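Since the question asks for a way to fill in the address dynamically, here is a minimal sketch of one possible approach: a helper that reads `ip -o -4 addr show` output on stdin and prints the address of the first interface that matches a preference list. The function name `pick_ip` is hypothetical, and the interface names are the ones from the question's script.

```shell
#!/usr/bin/env bash
# pick_ip (hypothetical helper): read `ip -o -4 addr show` output on stdin and
# print the IPv4 address of the first interface named in the arguments.
pick_ip() {
    local addrs iface addr
    addrs=$(cat)
    for iface in "$@"; do
        # One-line format: "<index>: <iface> inet <address>/<prefix> ..."
        addr=$(awk -v want="$iface" \
            '$2 == want && $3 == "inet" { sub(/\/.*/, "", $4); print $4; exit }' \
            <<<"$addrs")
        if [[ -n $addr ]]; then
            printf '%s\n' "$addr"
            return 0
        fi
    done
    return 1   # none of the requested interfaces has an IPv4 address
}
```

It could then be used, for example, as `ip -o -4 addr show | pick_ip enp8s0 eth0 > /etc/mesos-master/ip` before the master service starts. Whether that fits depends on how the services are launched on your machines.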

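If you just want to try out a Mesos cluster, you can also use a preconfigured Vagrant solution such as tobilg/coreos-mesos-cluster.
Regarding the ZooKeeper setup, make sure you have created /var/lib/zookeeper/myid on each node, containing the actual numeric ID assigned to that node. For example, on 192.168.10.1 the only content of the file needs to be 1.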
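To keep the myid value consistent with the server entries, the ID can be looked up from the ZooKeeper configuration instead of being typed by hand. The following is a rough sketch; the function name `myid_for` and the config path in the usage note are assumptions. Note also that ZooKeeper reads myid from its dataDir, which in the configuration shown in the question is /var/lib/zookeeper/data, so the target path may need adjusting.

```shell
#!/usr/bin/env bash
# myid_for (hypothetical helper): print the numeric id that a zoo.cfg-style
# file assigns to the given address, based on its "server.N" entries.
# Usage: myid_for <config-file> <address>
myid_for() {
    local cfg=$1 addr=$2
    awk -v addr="$addr" '
        /^server\.[0-9]+[=:]/ {
            # Extract N from "server.N"
            id = $0; sub(/^server\./, "", id); sub(/[=:].*/, "", id)
            # Extract the host part that follows the separator
            host = $0; sub(/^server\.[0-9]+[=:]/, "", host); sub(/:.*/, "", host)
            if (host == addr) { print id; found = 1; exit }
        }
        END { exit !found }   # non-zero exit status when no entry matches
    ' "$cfg"
}
```

For example, `myid_for /etc/zookeeper/zoo.cfg 192.168.10.1 > /var/lib/zookeeper/data/myid` would write 1 on the physical machine, assuming the config lives at that (hypothetical) path.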
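Before debugging the masters, check that the ZooKeeper cluster is working properly and that a leader has been elected. Make sure /etc/mesos/zk contains the correct ZooKeeper connection string on each host, e.g.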

zk://192.168.10.1:2181,192.168.122.10:2181,192.168.122.46:2181/mesos
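One way to check whether the ensemble has elected a leader is to send ZooKeeper's `srvr` four-letter command to each node and read the "Mode:" line of the reply. This is only a sketch: it assumes `nc` is installed, that the client port is 2181 as in the question, and that four-letter commands are enabled (newer ZooKeeper releases restrict them by default). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# zk_mode (hypothetical helper): read a `srvr` reply on stdin and print the
# value of its "Mode:" line (leader, follower or standalone).
zk_mode() {
    awk -F': ' '$1 == "Mode" { print $2; found = 1; exit } END { exit !found }'
}

# check_ensemble (hypothetical helper): query each host and print "host<TAB>mode".
# A node that does not answer with a Mode line is reported as "unreachable".
check_ensemble() {
    local host mode
    for host in "$@"; do
        mode=$(echo srvr | nc -w 2 "$host" 2181 | zk_mode) || mode="unreachable"
        printf '%s\t%s\n' "$host" "$mode"
    done
}
```

Running `check_ensemble 192.168.10.1 192.168.122.10 192.168.122.46` on a healthy three-node ensemble should report one leader and two followers.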

If ZooKeeper is working, restart the services and check the master logs. Do the same for the slaves.
References:
https://open.mesosphere.com/reference/mesos-master/
https://open.mesosphere.com/reference/mesos-slave/
https://mesosphere.github.io/marathon/docs/
