我有一个Kafka集群,有3个经纪人。我最近开始遇到一些问题,经纪人离开了集群,生产商/消费者抛出了leader not available错误。
在检查日志时,我看到以下事件序列:
//大量副本获取程序线程正在启动/停止
[2017-10-09 14:48:50,600] INFO [ReplicaFetcherManager on broker 6] Removed fetcher for partitions
[2017-10-09 14:48:50,608] INFO [ReplicaFetcherThread-0-7], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-10-09 14:48:50,918] INFO [ReplicaFetcherThread-0-7], Stopped (kafka.server.ReplicaFetcherThread)
[2017-10-09 14:48:50,918] INFO [ReplicaFetcherThread-0-7], Shutdown completed (kafka.server.ReplicaFetcherThread)
//持续扩大/缩小isr
[2017-10-09 14:48:51,037] INFO Partition [__consumer_offsets,8] on broker 6: Expanding ISR for partition __consumer_offsets-8 from 6,8 to 6,8,7 (kafka.cluster.Partition)
[2017-10-09 14:48:51,038] INFO Partition [__consumer_offsets,35] on broker 6: Expanding ISR for partition __consumer_offsets-35 from 6,8 to 6,8,7 (kafka.cluster.Partition)
[2017-10-09 14:49:01,702] INFO Partition [t1,1] on broker 6: Shrinking ISR for partition [t1,1] from 6,7 to 6 (kafka.cluster.Partition)
[2017-10-09 14:49:01,702] INFO Partition [__consumer_offsets,41] on broker 6: Shrinking ISR for partition [__consumer_offsets,41] from 6,8,7 to 6,8 (kafka.cluster.Partition)
//经纪人和领导人改选的重新登记
[2017-10-09 14:51:54,380] INFO re-registering broker info in ZK for broker 6
[2017-10-09 14:51:54,405] INFO New leader is 7 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
//控制器移动异常错误
[2017-10-09 14:56:39,746] ERROR [KafkaApi-6] Error when handling request.. org.apache.kafka.common.errors.ControllerMovedException: Broker 6 received update metadata request with correlation id 59 from an old controlle
r 7 with epoch 301. Latest known controller epoch is 302
[2017-10-09 14:57:59,210] INFO re-registering broker info in ZK for broker 6 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,210] INFO Creating /brokers/ids/6 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-10-09 14:57:59,213] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-10-09 14:57:59,213] INFO Registered broker 6 at path /brokers/ids/6 with addresses: EndPoint(kafka03,9092,ListenerName(PLAIN
TEXT),PLAINTEXT) (kafka.utils.ZkUtils)
[2017-10-09 14:57:59,213] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,213] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener
)
[2017-10-09 14:57:59,224] INFO New leader is 7 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-10-09 14:58:11,697] INFO Partition [testing1,2] on broker 6: Shrinking ISR for partition [testing1,2] from 6,8 to 6 (kafka.cluster.Partit
ion)
[2017-10-09 14:58:11,700] INFO Partition [testing1,2] on broker 6: Cached zkVersion [199] not equal to that in zookeeper, skip updating ISR (ka
fka.cluster.Partition)
然后这些错误在循环中发生,集群无法恢复
[2017-10-09 16:17:26,769] INFO Partition [__consumer_offsets,14] on broker 6: Shrinking ISR for partition [__consumer_offsets,14] from 7,6,8 to 7,6 (kafka.cluster.Partition)
[2017-10-09 16:17:26,771] INFO Partition [__consumer_offsets,14] on broker 6: Cached zkVersion [306] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
在客户机上,我收到了“引线不可用”错误。
不清楚集群为什么会进入这种无效状态。。有什么想法吗?
2条答案
按热度按时间pw9qyyiw1#
杀死当前控制器,则会选择另一个控制器。
重新启动在步骤1中杀死的上一个控制器
partiton状态将被修复
z9smfwbn2#
这个问题在Kafka-2729中被知道和跟踪,但直到现在还没有解决。据我所知,这种情况发生在网络上,由于最大流量或一些短时间内的网络中断而导致大延迟。唯一的解决方案(afaik)是重新启动所有代理。