Unexpected consumer failures / rebalancing

lbsnaicq  posted on 2021-06-07  in  Kafka

Using Apache Kafka 2.1.0 and Spring Kafka 2.1.7, we get error messages like the following on the Spring Kafka client:

2019-01-13 23:01:34.019 consumer-1-C-1 LogContext$KafkaLogger.error SEVERE: [Consumer clientId=consumer-2, groupId=kafka-consumer-group-x] Offset commit failed on partition topic-x-16 at offset 57882: The coordinator is not aware of this member.

A few seconds before this error, we can see the following log messages on one of the Kafka brokers:

[2019-01-13 23:01:17,329] INFO [GroupCoordinator 2]: Member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Preparing to rebalance group kafka-consumer-group-x in state PreparingRebalance with old generation 1370 (__consumer_offsets-40) (reason: removing member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Member consumer-20-ba370e86-e1cc-4261-a73c-78cea1b00479 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-32-be8807df-b88f-4cc9-bddf-bed772d1244f in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-17-3e34f026-894e-40dc-916b-d169a43da135 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-31-4dd9cb6e-09e9-47db-9610-37e0ab5633e0 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-18-90175650-1224-4f22-9350-246e17e75367 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,332] INFO [GroupCoordinator 2]: Member consumer-19-663239af-9702-4e59-ad3d-f8202e9d579d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-22-c54fb4c0-1fa1-4d9f-91fc-1da6df41b227 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-25-3bfd915c-8bd1-454b-85e3-60212b4c568e in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-27-cbb97ebf-b5cd-4cfa-991a-5302462ddab9 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,615] INFO [GroupCoordinator 2]: Member consumer-24-37fbcc73-e8c6-4820-ad56-580fd88f5a10 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,618] INFO [GroupCoordinator 2]: Member consumer-21-eea1b841-202e-4ebe-bdde-007775d001dd in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,636] INFO [GroupCoordinator 2]: Member consumer-28-881da47e-87c9-4675-9f88-e3b33748cff1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,708] INFO [GroupCoordinator 2]: Member consumer-26-375880ee-b2a9-4ece-8eee-987d282956d8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,708] INFO [GroupCoordinator 2]: Member consumer-23-492417e9-f3cb-4bec-bbac-130895356907 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,731] INFO [GroupCoordinator 2]: Member consumer-29-64732e9a-2c2b-44fb-a8a5-f606462a4201 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,947] INFO [GroupCoordinator 2]: Member consumer-10-fdd0ca92-3604-46de-9e2b-97ca41d36150 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,228] INFO [GroupCoordinator 2]: Member consumer-3-feb6986d-79af-4c64-a8f8-2dbb3bdb73c3 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-2-0345e5d5-86fc-4df0-bd39-c35b75514cea in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-1-c301f59f-8a56-4bdb-a5ef-dc163232d378 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-13-56aea64a-ecca-45e7-9474-b8f1163d01c8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,266] INFO [GroupCoordinator 2]: Member consumer-9-3ee76e0e-86f1-4c0c-85cc-d07721bf36b1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,273] INFO [GroupCoordinator 2]: Member consumer-4-9fa81414-870d-444d-b5d1-c38ce5c157a8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,296] INFO [GroupCoordinator 2]: Member consumer-14-8236578f-b60d-4199-b621-913d025149d1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,656] INFO [GroupCoordinator 2]: Member consumer-12-2921b7de-1721-460f-adbf-4fb6951cca22 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,665] INFO [GroupCoordinator 2]: Member consumer-11-09d7015c-cc33-464e-93ac-fb270f209b3f in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,667] INFO [GroupCoordinator 2]: Member consumer-5-b3fe06ff-8ef4-4d60-8571-68b7cfee12bc in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,722] INFO [GroupCoordinator 2]: Member consumer-15-5af82ca6-0ebf-463e-b9c5-4bbde513453d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,754] INFO [GroupCoordinator 2]: Member consumer-7-c1e2bf89-c7c5-4363-b099-191956ed1c89 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-6-9b3be0e4-c1be-4d6a-98b1-caa9d095c403 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-16-0f48ad44-402a-4706-9d78-9d0d5077a56d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-8-0496aa54-79f7-41b8-8f31-7823ed72f16a in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Group kafka-consumer-group-x with generation 1371 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:35,226] INFO [GroupCoordinator 2]: Preparing to rebalance group kafka-consumer-group-x in state PreparingRebalance with old generation 1371 (__consumer_offsets-40) (reason: Adding new member consumer-1-7787a334-acf2-4534-bc19-78af35371bfb) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:38,227] INFO [GroupCoordinator 2]: Stabilized group kafka-consumer-group-x generation 1372 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:38,239] INFO [GroupCoordinator 2]: Assignment received from leader for group kafka-consumer-group-x for generation 1372 (kafka.coordinator.group.GroupCoordinator)

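Since we see no errors while processing messages, and no sign that processing takes a long time, we cannot explain these sudden rebalances.
Does anybody know what is going on here?
Our consumer configuration is mostly default, apart from enable.auto.commit=false and AckMode.RECORD.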

ahy6op9u

ahy6op9u1#

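Reasons for a Kafka rebalance:
1. A new consumer joins the group
2. A consumer leaves the group (clean shutdown)
3. New partitions are added
4. Kafka considers a consumer dead
Reasons for 4:
4.1 The consumer did not call poll() within max.poll.interval.ms (long-running processing)
4.2 The consumer did not send a heartbeat to Kafka within session.timeout.ms

Note: the heartbeat thread normally sends a heartbeat every heartbeat.interval.ms (default: 3 seconds).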
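
The two "considered dead" conditions can be illustrated with a small toy model. This is only a sketch of the reasoning, not Kafka's actual implementation: the class LivenessModel and its Status enum are hypothetical names, and in the real client the poll-interval check is made by the consumer itself, which then leaves the group. The timeouts used in main are the defaults mentioned in this answer (10 seconds) and the default max.poll.interval.ms (5 minutes).

```java
import java.time.Duration;

/** Toy model of the two liveness checks; hypothetical, not Kafka's real code. */
final class LivenessModel {
    enum Status { ALIVE, SESSION_TIMED_OUT, POLL_INTERVAL_EXCEEDED }

    private final Duration sessionTimeout;   // corresponds to session.timeout.ms
    private final Duration maxPollInterval;  // corresponds to max.poll.interval.ms

    LivenessModel(Duration sessionTimeout, Duration maxPollInterval) {
        this.sessionTimeout = sessionTimeout;
        this.maxPollInterval = maxPollInterval;
    }

    /**
     * @param sinceLastHeartbeat time since the coordinator last received a heartbeat
     * @param sinceLastPoll      time since the application last called poll()
     */
    Status check(Duration sinceLastHeartbeat, Duration sinceLastPoll) {
        if (sinceLastHeartbeat.compareTo(sessionTimeout) > 0) {
            return Status.SESSION_TIMED_OUT;       // no heartbeat within session.timeout.ms
        }
        if (sinceLastPoll.compareTo(maxPollInterval) > 0) {
            return Status.POLL_INTERVAL_EXCEEDED;  // no poll() within max.poll.interval.ms
        }
        return Status.ALIVE;
    }

    public static void main(String[] args) {
        LivenessModel model = new LivenessModel(Duration.ofSeconds(10), Duration.ofMinutes(5));
        // Heartbeats every 3 s and regular polling: the member stays in the group.
        System.out.println(model.check(Duration.ofSeconds(3), Duration.ofSeconds(30)));
        // Heartbeat thread stalled for 12 s: the session expires.
        System.out.println(model.check(Duration.ofSeconds(12), Duration.ofSeconds(30)));
        // Processing blocked poll() for 6 minutes: the poll interval is exceeded.
        System.out.println(model.check(Duration.ofSeconds(3), Duration.ofMinutes(6)));
    }
}
```

The broker log line "removing member ... on heartbeat expiration" in the question corresponds to the SESSION_TIMED_OUT branch of this model.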

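Your case looks like 4.2.
There can be various causes for this. To work around the problem, you can increase session.timeout.ms (the default is 10 seconds).
Another option is to tune the system so that the heartbeat thread runs as expected (avoid high iowait, high load, and so on).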
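
A minimal sketch of raising the session timeout, using only java.util.Properties with the standard consumer configuration keys. The class name SessionTimeoutConfig and the 30-second value are assumptions for illustration; the group name is taken from the logs in the question. Kafka's documentation recommends keeping heartbeat.interval.ms at no more than one third of session.timeout.ms, and the broker only accepts session timeouts between group.min.session.timeout.ms and group.max.session.timeout.ms.

```java
import java.util.Properties;

/** Hypothetical helper that builds consumer properties with a longer session timeout. */
final class SessionTimeoutConfig {
    static Properties consumerProps(int sessionTimeoutMs) {
        Properties props = new Properties();
        props.put("group.id", "kafka-consumer-group-x");  // group name from the question's logs
        props.put("enable.auto.commit", "false");         // as in the question
        props.put("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Keep the heartbeat interval at one third of the session timeout.
        props.put("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        return props;
    }

    public static void main(String[] args) {
        // 30 s is an illustrative value, not a recommendation.
        consumerProps(30_000).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting entries could then be merged into whatever map or properties object the consumer factory is built from.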

bq3bfh9z

bq3bfh9z2#

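I'm pretty sure you are hitting KAFKA-7196. You should upgrade your servers to 2.0.1 or later.
As a workaround, you can try configuring a random client.id each time you start, but this may have some unwanted side effects.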
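
One way the random client.id workaround might look, as a sketch only: the class RandomClientId and the "consumer" prefix are hypothetical, and how the value reaches the consumer depends on how your application builds its configuration. A likely side effect is that anything keyed on client.id, such as client metric names or quotas, changes on every restart.

```java
import java.util.Properties;
import java.util.UUID;

/** Hypothetical helper that generates a fresh client.id for each application start. */
final class RandomClientId {
    static String newClientId(String prefix) {
        // A random UUID makes the id different on every call.
        return prefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("client.id", newClientId("consumer"));  // "consumer" is an arbitrary prefix
        System.out.println(props.getProperty("client.id"));
    }
}
```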
