Kafka partition leader election fails after an uncontrolled broker shutdown

Posted by bfnvny8b on 2021-06-07 in Kafka

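We have 3 Kafka brokers and a topic with 40 partitions and a replication factor of 1. After an uncontrolled shutdown of one Kafka broker, a new leader could not be elected for some partitions (see the logs below), and eventually we could no longer read from this topic. Is it possible to survive this kind of crash without raising the replication factor above 1?
We want the target database (built from the events in the Kafka topic) to stay consistent, so we also set unclean.leader.election.enable to false.
Partition information after the crash: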

extenr-topic:1:882091242
extenr-topic:19:882091615
extenr-topic:28:882092273
Error: partition 18 does not have a leader. Skip getting offsets
Error: partition 27 does not have a leader. Skip getting offsets
Error: partition 36 does not have a leader. Skip getting offsets

Exception on the Kafka broker:

2017-10-09 05:56:50,302 ERROR state.change.logger: Controller 236 epoch 267 initiated state change for partition [extenr-topic,15] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [extenr-topic,15] is alive. Live brokers are: [Set(236, 237)], ISR brokers are: [235]
at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:342)
at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:203)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:118)
at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:115)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

The log also contains the following error:

2017-10-09 04:11:25,509 ERROR state.change.logger: Broker 235 received LeaderAndIsrRequest with correlation id 1 from controller 236 epoch 267 for partition [extenr-topic,36] but cannot become follower since the new leader -1 is unavailable.
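The stack trace shows the controller refusing to bring partition 15 online because the only ISR member (broker 235) is down while only brokers 236 and 237 are alive. The Python sketch below is a simplified model of that decision, assuming the selection rules implied by the log: first a live in-sync replica, and, only if unclean election is enabled, any live assigned replica. The function and exception names are hypothetical and this is not Kafka's own implementation. It illustrates why setting unclean.leader.election.enable to true would not help with a replication factor of 1: there is no other replica to promote at all.

```python
from typing import Sequence, Set


class NoReplicaOnlineError(Exception):
    """Hypothetical stand-in for kafka.common.NoReplicaOnlineException."""


def select_leader(assigned_replicas: Sequence[int], isr: Set[int],
                  live_brokers: Set[int], unclean_enabled: bool) -> int:
    """Simplified model of offline-partition leader selection (not Kafka's code)."""
    # Clean election: take, in assignment order, a live broker that is in the ISR.
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker
    # Unclean election: any live assigned replica, at the risk of losing
    # messages that only the failed leader had.
    if unclean_enabled:
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker
    raise NoReplicaOnlineError(
        f"No eligible replica is alive. Live brokers: {sorted(live_brokers)}, "
        f"ISR: {sorted(isr)}"
    )


# Partition 15 from the log: replication factor 1, its only replica is on broker 235.
try:
    select_leader([235], {235}, {236, 237}, unclean_enabled=True)
except NoReplicaOnlineError as exc:
    print("offline:", exc)

# The same failure with replication factor 3: broker 236 takes over cleanly.
print(select_leader([235, 236, 237], {235, 236, 237}, {236, 237}, unclean_enabled=False))
```

With a single replica the partition stays offline whether or not unclean election is allowed; with three in-sync replicas a clean election succeeds without risking data loss.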
Answer

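With replication.factor set to 1, a partition goes offline as soon as its leader crashes or shuts down, because there is no other replica that could take over.
If availability matters to you, I suggest increasing the replication factor. The recommended configuration for high availability [1] is replication.factor set to 3 and min.insync.replicas set to 2.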
[1] http://kafka.apache.org/documentation/#brokerconfigs
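The arithmetic behind this recommendation can be sketched as a small Python function. It is a simplified model that assumes every replica is in the ISR before the failures, that each failed broker holds one replica of the partition, and that producers use acks=all (min.insync.replicas is only enforced for such writes). The function name is hypothetical.

```python
from typing import Tuple


def tolerated_failures(replication_factor: int, min_insync_replicas: int) -> Tuple[int, int]:
    """Return (reads, acks_all_writes): broker failures a partition can absorb."""
    # A clean leader election needs at least one surviving in-sync replica,
    # so the partition stays readable until all but one replica are lost.
    reads = replication_factor - 1
    # acks=all writes are rejected once the ISR shrinks below min.insync.replicas.
    writes = max(replication_factor - min_insync_replicas, 0)
    return reads, writes


print(tolerated_failures(1, 1))  # the setup in the question
print(tolerated_failures(3, 2))  # the recommended configuration
```

Under these assumptions a replication factor of 1 tolerates no broker failure at all, while replication.factor=3 with min.insync.replicas=2 keeps the partition readable through two failures and writable with acks=all through one, without needing unclean leader election.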
