更换mesos主控，导致马拉松停摆？

w3nuxt5m 于 2021-06-26 发布在 Mesos

关注(0)|答案(2)|浏览(416)

环境：

Zookeeper on computer A,
  Mesos master on computer B as Leader,
  Mesos master on computer C,
  Marathon on computer B singleton.

行动：

Kill Mesos master task on computer B, attempt to change mesos cluster leader

结果：

Mesos cluster leader change to mesos master on computer C, 
  But Marathon task on computer auto shutdown with following logs.

问题：
有人能帮我为什么马拉松跑下来？以及如何修复它！
日志：

I1109 12:19:10.010197 11287 detector.cpp:152] Detected a new leader: (id='9')
I1109 12:19:10.010646 11291 group.cpp:699] Trying to get '/mesos/json.info_0000000009' in ZooKeeper
I1109 12:19:10.013425 11292 zookeeper.cpp:262] A new leading master (UPID=master@10.4.23.55:5050) is detected
[2017-11-09 12:19:10,015] WARN  Disconnected (mesosphere.marathon.MarathonScheduler:Thread-23)
I1109 12:19:10.018977 11292 sched.cpp:2021] Asked to stop the driver
I1109 12:19:10.019161 11292 sched.cpp:336] New master detected at master@10.4.23.55:5050
I1109 12:19:10.019892 11292 sched.cpp:1203] Stopping framework d52cbd8c-1015-4d94-8328-e418876ca5b2-0000
[2017-11-09 12:19:10,020] INFO  Driver future completed with result=Success(()). (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,022] INFO  Abdicating leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,022] INFO  Stopping the election service (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,029] INFO  backgroundOperationsLoop exiting (org.apache.curator.framework.imps.CuratorFrameworkImpl:Curator-Framework-0)
[2017-11-09 12:19:10,061] INFO  Session: 0x15f710ffb010058 closed (org.apache.zookeeper.ZooKeeper:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,061] INFO  EventThread shut down for session: 0x15f710ffb010058 (org.apache.zookeeper.ClientCnxn:pool-3-thread-1-EventThread)
[2017-11-09 12:19:10,063] INFO  Stopping MarathonSchedulerService [RUNNING]'s leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,063] INFO  Lost leadership (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,066] INFO  All actors suspended:

* Actor[akka://marathon/user/offerMatcherStatistics#-1904211014]
* Actor[akka://marathon/user/reviveOffersWhenWanted#-238627718]
* Actor[akka://marathon/user/expungeOverdueLostTasks#608979053]
* Actor[akka://marathon/user/launchQueue#803590575]
* Actor[akka://marathon/user/offersWantedForReconciliation#598482724]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#813230776]
* Actor[akka://marathon/user/offerMatcherManager#1205401692]
* Actor[akka://marathon/user/instanceTracker#1055980147]
* Actor[akka://marathon/user/killOverdueStagedTasks#-40058350]
* Actor[akka://marathon/user/taskKillServiceActor#-602552505]
* Actor[akka://marathon/user/rateLimiter#-911383474]
* Actor[akka://marathon/user/deploymentManager#2013376325] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-10)

I1109 12:19:10.069551 11272 sched.cpp:2021] Asked to stop the driver
[2017-11-09 12:19:10,068] INFO  Stopping driver (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,069] INFO  Stopped MarathonSchedulerService [RUNNING]'s leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,070] INFO  Terminating due to leadership abdication or failure (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,071] INFO  Call postDriverRuns callbacks on  (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,074] INFO  Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-12)
[2017-11-09 12:19:10,074] INFO  Suspending scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-2)
[2017-11-09 12:19:10,083] INFO  Finished postDriverRuns callbacks (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,084] INFO  ExpungeOverdueLostTasksActor has stopped (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-9)
[1]+  Exit 137

mesos marathon leader

来源：https://stackoverflow.com/questions/47194014/change-mesos-master-leader-cause-marathon-shutdown

2条答案

按热度按时间

mrwjdhj31#

你把masters参考设置为marathon conf了吗？你能做到吗

cat /etc/marathon/conf/master

赞(0）回复(0）举报 2021-06-26

8wtpewkr2#

我认为zookeeper集群中存在错误配置。使用3个zookeeper集群和2个mesos master n多个slave。裁判：https://www.google.co.in/amp/s/beingasysadmin.wordpress.com/2014/08/16/managing-ha-docker-cluster-using-multiple-mesos-masters/amp/

赞(0）回复(0）举报 2021-06-26

我来回答

更换mesos主控，导致马拉松停摆？

2条答案

相关问题

热门标签

最新问答