So I have a Confluent Kafka cluster set up on Azure Kubernetes Service.
There are 3 brokers, and 1 topic with 3 partitions, a replication factor of 2, and min.insync.replicas=1.
The producer is configured with acks="all".
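For completeness, the relevant producer settings look roughly like this (a sketch, not my exact config; the bootstrap addresses are placeholders):

```properties
# Producer settings (sketch; broker addresses are placeholders)
bootstrap.servers=broker-0:9092,broker-1:9092,broker-2:9092
acks=all
# retries / delivery.timeout.ms are left at the client defaults
```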
```
Topic:test_topic PartitionCount:3 ReplicationFactor:2 Configs:retention.ms=345600000
Topic: test_topic Partition: 0 Leader: 0 Replicas: 0,2 Isr: 0,2
Topic: test_topic Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: test_topic Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
```
Everything ran fine for a long while and Kafka behaved very well, but after 6-7 hours one of the in-sync replicas for partition 1 fell out of sync. It did recover within about 10 minutes, but because of acks="all", partition 1 could not accept traffic during that window and all the data produced at that time was lost.
Can someone help me understand why one of the replicas went out of sync? I see the following logs:
```
kafka.common.StateChangeFailedException: Failed to elect leader for partition test_topic-2 under strategy OfflinePartitionLeaderElectionStrategy
    at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
    at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326)
    at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254)
    at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175)
    at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:106)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicasBecomeOffline(KafkaController.scala:442)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$onBrokerFailure(KafkaController.scala:410)
    at kafka.controller.KafkaController$BrokerChange$.process(KafkaController.scala:1252)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
    at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
```
This has happened several times now. The logs are the same except for the leader election strategy involved:
```
[2020-05-14 06:41:09,583] TRACE [Controller id=1] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
[2020-05-14 06:41:09,585] DEBUG [Controller id=1] Preferred replicas by broker Map(1 -> Map(__consumer_offsets-22 -> Vector(1, 0), __consumer_offsets-30 -> Vector(1, 0), __consumer_offsets-8 -> Vector(1, 0), __consumer_offsets-4 -> Vector(1, 0), __consumer_offsets-46 -> Vector(1, 0), __consumer_offsets-16 -> Vector(1, 0), __consumer_offsets-28 -> Vector(1, 0), __consumer_offsets-36 -> Vector(1, 0), __consumer_offsets-42 -> Vector(1, 0), __consumer_offsets-18 -> Vector(1, 0), jeepusermap-1 -> Vector(1, 0), __consumer_offsets-24 -> Vector(1, 0), __consumer_offsets-38 -> Vector(1, 0), __consumer_offsets-48 -> Vector(1, 0), __consumer_offsets-2 -> Vector(1, 0), __consumer_offsets-6 -> Vector(1, 0), __consumer_offsets-14 -> Vector(1, 0), __consumer_offsets-20 -> Vector(1, 0), __consumer_offsets-0 -> Vector(1, 0), __consumer_offsets-44 -> Vector(1, 0), jeepusermap-3 -> Vector(1, 0), __consumer_offsets-12 -> Vector(1, 0), __consumer_offsets-26 -> Vector(1, 0), __consumer_offsets-34 -> Vector(1, 0), __consumer_offsets-10 -> Vector(1, 0), __consumer_offsets-32 -> Vector(1, 0), __consumer_offsets-40 -> Vector(1, 0)), 0 -> Map(__consumer_offsets-21 -> Vector(0, 1), __consumer_offsets-27 -> Vector(0, 1), __consumer_offsets-7 -> Vector(0, 1), __consumer_offsets-9 -> Vector(0, 1), __consumer_offsets-25 -> Vector(0, 1), __consumer_offsets-35 -> Vector(0, 1), __consumer_offsets-41 -> Vector(0, 1), __consumer_offsets-33 -> Vector(0, 1), __consumer_offsets-23 -> Vector(0, 1), __consumer_offsets-49 -> Vector(0, 1), __consumer_offsets-47 -> Vector(0, 1), jeepusermap-0 -> Vector(0, 1), __consumer_offsets-31 -> Vector(0, 1), __consumer_offsets-3 -> Vector(0, 1), __consumer_offsets-37 -> Vector(0, 1), jeepusermap-4 -> Vector(0, 1), __consumer_offsets-15 -> Vector(0, 1), jeepusermap-2 -> Vector(0, 1), __consumer_offsets-17 -> Vector(0, 1), __consumer_offsets-19 -> Vector(0, 1), __consumer_offsets-11 -> Vector(0, 1), __consumer_offsets-13 -> Vector(0, 1), __consumer_offsets-43 -> Vector(0, 1), __consumer_offsets-39 -> Vector(0, 1), __consumer_offsets-45 -> Vector(0, 1), __consumer_offsets-1 -> Vector(0, 1), __consumer_offsets-5 -> Vector(0, 1), __consumer_offsets-29 -> Vector(0, 1))) (kafka.controller.KafkaController)
[2020-05-14 06:41:09,586] DEBUG [Controller id=1] Topics not in preferred replica for broker 1 Map() (kafka.controller.KafkaController)
[2020-05-14 06:41:09,587] TRACE [Controller id=1] Leader imbalance ratio for broker 1 is 0.0 (kafka.controller.KafkaController)
[2020-05-14 06:41:09,587] DEBUG [Controller id=1] Topics not in preferred replica for broker 0 Map(__consumer_offsets-21 -> Vector(0, 1), __consumer_offsets-27 -> Vector(0, 1), __consumer_offsets-7 -> Vector(0, 1), __consumer_offsets-9 -> Vector(0, 1), __consumer_offsets-25 -> Vector(0, 1), __consumer_offsets-35 -> Vector(0, 1), __consumer_offsets-41 -> Vector(0, 1), __consumer_offsets-33 -> Vector(0, 1), __consumer_offsets-23 -> Vector(0, 1), __consumer_offsets-49 -> Vector(0, 1), __consumer_offsets-47 -> Vector(0, 1), jeepusermap-0 -> Vector(0, 1), __consumer_offsets-31 -> Vector(0, 1), __consumer_offsets-3 -> Vector(0, 1), __consumer_offsets-37 -> Vector(0, 1), jeepusermap-4 -> Vector(0, 1), __consumer_offsets-15 -> Vector(0, 1), jeepusermap-2 -> Vector(0, 1), __consumer_offsets-17 -> Vector(0, 1), __consumer_offsets-19 -> Vector(0, 1), __consumer_offsets-11 -> Vector(0, 1), __consumer_offsets-13 -> Vector(0, 1), __consumer_offsets-43 -> Vector(0, 1), __consumer_offsets-39 -> Vector(0, 1), __consumer_offsets-45 -> Vector(0, 1), __consumer_offsets-1 -> Vector(0, 1), __consumer_offsets-5 -> Vector(0, 1), __consumer_offsets-29 -> Vector(0, 1)) (kafka.controller.KafkaController)
[2020-05-14 06:41:09,587] TRACE [Controller id=1] Leader imbalance ratio for broker 0 is 1.0 (kafka.controller.KafkaController)
[2020-05-14 06:41:09,588] INFO [Controller id=1] Starting preferred replica leader election for partitions __consumer_offsets-21,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,jeepusermap-0,__consumer_offsets-31,__consumer_offsets-3,__consumer_offsets-37,jeepusermap-4,__consumer_offsets-15,jeepusermap-2,__consumer_offsets-17,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-43,__consumer_offsets-39,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-29 (kafka.controller.KafkaController)
[2020-05-14 06:41:10,023] TRACE [Broker id=1] Stopped fetchers as part of become-leader request from controller 1 epoch 3 with correlation id 3 for partition __consumer_offsets-25 (last update controller epoch 3) (state.change.logger)
[2020-05-14 06:41:10,024] INFO [Partition __consumer_offsets-41 broker=1] __consumer_offsets-41 starts at Leader Epoch 2 from offset 0. Previous Leader Epoch was: 1 (kafka.cluster.Partition)
ERROR [Controller id=1 epoch=3] Controller 1 epoch 3 failed to change state for partition __consumer_offsets-21 from OnlinePartition to OnlinePartition (state.change.logger)
kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-27 under strategy PreferredReplicaPartitionLeaderElectionStrategy
    at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:328)
    at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:326)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:326)
    at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:254)
    at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:175)
    at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:116)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$onPreferredReplicaElection(KafkaController.scala:614)
    at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1000)
    at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:981)
    at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance(KafkaController.scala:981)
    at kafka.controller.KafkaController$AutoPreferredReplicaLeaderElection$.process(KafkaController.scala:1012)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
    at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
```
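In the meantime, to catch the ISR shrink as it happens, I have been polling for under-replicated partitions (a sketch, assuming the standard Kafka CLI tools; the bootstrap address is a placeholder, not my actual broker endpoint):

```shell
# List only the partitions whose ISR is currently smaller than the replica set
kafka-topics --bootstrap-server broker-0:9092 \
  --describe --under-replicated-partitions
```

On a healthy cluster this prints nothing; during the 10-minute window it should show partition 1 with a shrunken Isr list.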