Savepoint when a Flink job finishes

epggiuax  posted on 2022-12-09  in  Apache

I have a use case where I need to seed a Flink application (both RocksDB state and broadcast state) from bounded S3 sources, and then read other unbounded/bounded S3 sources once the seeding is complete.
I was trying to achieve this in two steps:

  1. Seeding: Run a Flink job that reads only the bounded seeding source, and take a savepoint after the job finishes.
  2. Regular processing: Restore from the seeded savepoint on a new Flink graph to process the other unbounded/bounded S3 sources.

Questions:
  1. For Step 1: does Flink support taking a savepoint automatically after the job finishes in streaming mode?
  2. If only a manually triggered savepoint is supported, what can be used as a done signal that all the seeding data has been processed and all tasks have finished?

Any other approaches to the seeding use case are appreciated as well. Note: approaches that buffer the regular data until the seeding data has been processed are not feasible for my use case.
Thanks
2wnc66cl  #1

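1. With an unbounded source, you can use externalized checkpoints, which let you start/restore the job from a checkpoint. When this is enabled, you must have a process that cleans up the checkpoints after the job is cancelled, because Flink will not delete them itself.
2. You can do this with the new feature available in Flink 1.15 (checkpoints with finished tasks).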
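
As a rough sketch of what the two options above might look like in the Flink configuration file: the keys below are the ones I believe Flink 1.15 uses, while the interval and the S3 path are placeholders I made up, not values from the question.

```yaml
# Assumption: Flink 1.15-era configuration keys; check them against your version.

# Take checkpoints periodically (the interval is an arbitrary placeholder).
execution.checkpointing.interval: 60s

# Option 1: externalized checkpoints that are kept when the job is cancelled.
# With this setting, cleaning up old checkpoints is your responsibility.
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION

# Placeholder location; replace with your own bucket and prefix.
state.checkpoints.dir: s3://<your-bucket>/checkpoints

# Option 2: keep checkpointing after some tasks have finished, which matters
# for bounded sources. As far as I know this defaults to true from 1.15 on.
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```

It is worth verifying on your version whether a retained checkpoint survives a job that ends in the FINISHED state, since the retention setting is described in terms of cancellation.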
