Kafka: Why is retention.ms for Kafka Streams repartition topics set to -1 by default? Doesn't that mean messages in the repartition topics are retained forever?

jogvjijk posted on 2022-12-11 in Apache

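I think this is related to the link below, but I don't understand it.

You can supply topic configs (such as "retention.ms" and "cleanup.policy") for Kafka Streams internal topics (such as the *-changelog topics) so that unneeded logs are deleted.
However, for internal topics such as the *-repartition topics, it is not possible to supply topic config values, even though the default "retention.ms" of a repartition topic is "-1", which means infinite retention. How can I delete or manage the repartition topics? Otherwise they will grow too large and could cause disk-failure problems.
How should repartition topics be managed? What is purgeData? I couldn't find any relevant explanation in the documentation.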

w8f9ii69

Fact

  • retention.ms for the repartition topics is -1 by default, and there's no way to override this value in Kafka Streams client code.

What I misunderstood

  • The size of the repartition topics would grow indefinitely because retention.ms for the repartition topics is -1.

Correcting the misunderstanding

  • There's a method called maybeCommit in the StreamThread class.
  • maybeCommit is called repeatedly inside the loop that processes stream records.
  • Inside maybeCommit (version 2.7.1), there's the following comment:

try to purge the committed records for repartition topics if possible

  • https://github.com/apache/kafka/blob/2.7.1/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L923-L926
  • Based on this, my understanding is that once records in the repartition topics have been processed and their offsets committed, those committed records are purged periodically.
  • Therefore, there's no need to clear the repartition topics or manage their retention.ms manually.

Reference

  • https://issues.apache.org/jira/browse/KAFKA-6150
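To make the "purge the committed records" idea concrete, here is a minimal stdlib-only simulation. It is a sketch under stated assumptions, not Kafka Streams code: a partition log is modelled as just a start and end offset, the topic names (app-agg-repartition-0/1) and the class and method names are hypothetical, and "purge" is assumed to mean deleting every record before the committed offset. In the real library this step is sent to the broker as a delete-records request through the Admin client.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, stdlib-only model of "purge the committed records for repartition topics".
// Real Kafka Streams asks the broker to delete records; here a log is just two offsets.
class RepartitionPurgeSketch {

    // One repartition partition: records in [logStartOffset, logEndOffset) are still on disk.
    static final class PartitionLog {
        long logStartOffset;
        final long logEndOffset;

        PartitionLog(long logStartOffset, long logEndOffset) {
            this.logStartOffset = logStartOffset;
            this.logEndOffset = logEndOffset;
        }

        long retainedRecords() {
            return logEndOffset - logStartOffset;
        }
    }

    // For each partition, the offset before which records may be deleted.
    // Only committed offsets are used, so records the application has not processed are kept.
    static Map<String, Long> purgeableOffsets(Map<String, Long> committedOffsets,
                                              Map<String, PartitionLog> logs) {
        Map<String, Long> deleteBefore = new TreeMap<>();
        for (Map.Entry<String, Long> entry : committedOffsets.entrySet()) {
            PartitionLog log = logs.get(entry.getKey());
            if (log == null) {
                continue; // not a repartition partition in this model
            }
            long target = Math.min(entry.getValue(), log.logEndOffset);
            if (target > log.logStartOffset) { // the log start only ever moves forward
                deleteBefore.put(entry.getKey(), target);
            }
        }
        return deleteBefore;
    }

    // Applies the purge by advancing each log start offset ("delete records before offset").
    static void purge(Map<String, Long> deleteBefore, Map<String, PartitionLog> logs) {
        deleteBefore.forEach((partition, offset) -> logs.get(partition).logStartOffset = offset);
    }

    public static void main(String[] args) {
        Map<String, PartitionLog> logs = new TreeMap<>();
        logs.put("app-agg-repartition-0", new PartitionLog(0, 1_000));
        logs.put("app-agg-repartition-1", new PartitionLog(0, 800));

        Map<String, Long> committed = new TreeMap<>();
        committed.put("app-agg-repartition-0", 950L); // 950 records processed and committed
        committed.put("app-agg-repartition-1", 0L);   // nothing committed yet

        purge(purgeableOffsets(committed, logs), logs);
        logs.forEach((partition, log) ->
                System.out.println(partition + " retained=" + log.retainedRecords()));
        // prints:
        // app-agg-repartition-0 retained=50
        // app-agg-repartition-1 retained=800
    }
}
```

The point of the model is that disk usage is bounded by the uncommitted backlog rather than by retention.ms: a partition that the application keeps up with stays small even with retention.ms=-1, while a partition with nothing committed keeps everything.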

Please leave a comment or correct this if I'm wrong.

1tuwyuhd

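I ran into the same problem with ksqlDB. The internal topics grew by terabytes of data within a few days, and by default their retention is infinite. We modified them to set retention.ms to a finite value instead of infinite (-1), but after that everything broke.

Today I ran the following command: set topic.retention.ms=3600000. After that I created a table, and all of its internal topics were created with retention.ms=1h instead of infinite. Next week I'll try this in the prd environment to see whether ksqlDB (0.28.2) evicts segments and everything keeps working.

https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#internal-topic-parameters

Hope this helps.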
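For a plain Kafka Streams application, the linked "internal topic parameters" page describes passing internal-topic settings with the "topic." prefix (StreamsConfig.TOPIC_PREFIX; StreamsConfig.topicPrefix("retention.ms") yields "topic.retention.ms"). The sketch below builds such a configuration with java.util.Properties only, so it runs without the Kafka jars. Assumptions: the application id and bootstrap address are placeholders, the helper name is hypothetical, and, consistent with this answer's observation, the setting is assumed to affect only internal topics created after it is in place. Check the resulting topic configs on the broker before relying on it, and keep in mind that time-based retention deletes segments whether or not the application has consumed them yet.

```java
import java.time.Duration;
import java.util.Properties;

// Sketch: Kafka Streams properties requesting a finite retention.ms on internal topics.
// The "topic." prefix mirrors StreamsConfig.TOPIC_PREFIX; no Kafka dependency is needed here.
class InternalTopicRetentionSketch {
    static final String TOPIC_PREFIX = "topic."; // same value as StreamsConfig.TOPIC_PREFIX

    // Hypothetical helper: builds the properties you would pass to new KafkaStreams(...).
    static Properties streamsProps(String appId, String bootstrap, Duration internalRetention) {
        Properties props = new Properties();
        props.put("application.id", appId);        // StreamsConfig.APPLICATION_ID_CONFIG
        props.put("bootstrap.servers", bootstrap); // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
        // Assumption: used when internal topics are created; existing topics are not altered.
        props.put(TOPIC_PREFIX + "retention.ms", Long.toString(internalRetention.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        Properties p = streamsProps("my-app", "localhost:9092", Duration.ofHours(1));
        System.out.println(p.getProperty("topic.retention.ms")); // prints 3600000 (1 h in ms)
    }
}
```

Using Duration.ofHours(1).toMillis() rather than a literal makes the 3600000 value from the command above self-explanatory.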
