Kafka: no broker in ISR for partition

col17t5w · posted 2021-06-07 · in Kafka

We have a Kafka cluster of 6 nodes; ZooKeeper runs on 5 of the 6 nodes.
A Spark Streaming job reads data from a streaming server, does some processing, and sends the results to Kafka.
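For context, the write path looks roughly like the sketch below (a simplified sketch only: the host name, port, topic, and producer settings are placeholders, not our actual job):

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToKafka {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("stream-to-kafka"), Seconds(10))

    // Placeholder source: the real job reads from our streaming server.
    val lines = ssc.socketTextStream("stream-host", 9999)
    val processed = lines.map(_.trim).filter(_.nonEmpty)

    processed.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One producer per partition/task keeps the sketch simple; the broker list is a placeholder.
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092,broker2:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        records.foreach(r => producer.send(new ProducerRecord[String, String]("gnip_live_stream", r)))
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}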
This job sometimes gets stuck: no data is sent to Kafka and the job keeps restarting.
It stays stuck and keeps restarting until we manually restart the Kafka cluster; after restarting Kafka, everything runs fine again.
Checking the Kafka logs, we found the following exception thrown several times:

2017-03-10 05:12:14,177 ERROR state.change.logger: Controller 133 epoch 616 initiated state change for partition [live_stream_2,52] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [gnip_live_stream_2,52] is alive. Live brokers are: [Set(133, 137, 134, 135, 143)], ISR brokers are: [142] 
    at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
    at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
    at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
    at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:164)
    at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:259)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
    at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:823)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

The exception above was thrown for an unused topic (live_stream_2), but a slightly different one was also thrown for a topic that is in use.
Here is the exception for the in-use topic:

2017-03-10 12:05:18,535 ERROR state.change.logger: Controller 133 epoch 620 initiated state change for partition [gnip_live_stream,3] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [live_stream,3] is alive. Live brokers are: [Set(133, 134, 135, 137)], ISR brokers are: [136] 
    at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
    at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
    at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
    at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:164)
    at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:259)
    at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
    at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:823)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

The first exception says the ISR for partition 52 contains only the broker with id 142, which is strange because the cluster has no broker with that id.
The second exception says the ISR for partition 3 contains only broker 136, and that broker is missing from the live broker list.
I suspect stale data in ZooKeeper caused the first exception, and that broker 136 happened to be down at that time, causing the second one.
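One way to check whether ZooKeeper is holding stale state is to read the partition's leader/ISR znode and the registered broker ids directly from ZooKeeper. A minimal sketch (the connect string is a placeholder; the paths are Kafka's standard znodes):

import java.nio.charset.StandardCharsets

import scala.collection.JavaConverters._

import org.apache.zookeeper.ZooKeeper

object CheckPartitionState {
  def main(args: Array[String]): Unit = {
    // Placeholder connect string for the ZooKeeper ensemble.
    val zk = new ZooKeeper("zk1:2181", 30000, null)

    // Leader/ISR as the controller sees it, e.g. {"leader":136,"isr":[136],...}
    val state = zk.getData("/brokers/topics/gnip_live_stream/partitions/3/state", false, null)
    println(new String(state, StandardCharsets.UTF_8))

    // Broker ids currently registered (ephemeral znodes, i.e. brokers ZooKeeper considers alive)
    zk.getChildren("/brokers/ids", false).asScala.foreach(println)

    zk.close()
  }
}

If the state znode still lists a broker id that no longer appears under /brokers/ids (like 142 here), that would support the stale-data theory.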
My questions:
1. Could these exceptions be the reason Kafka (and consequently the Spark job) gets stuck?
2. How can this be fixed?

No answers yet.

