I am running Apache Flink 1.12.7 and have configured the stream execution environment with the number of task slots per task manager = 3 (just experimenting), but I cannot see any output from the file the environment reads. Instead, as the logs show, the execution graph is stuck in the SCHEDULED state and never transitions to RUNNING.
Note that if no configuration is passed via the properties file, everything works fine: the execution graph switches to RUNNING, the environment reads the file, and the output is visible.
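For reference, a minimal src/main/resources/application.properties producing this setup might look like the line below (an assumption on my part; the actual file contents are not shown in the post):

taskmanager.numberOfTaskSlots=3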
The code is as follows:
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils; // package path assumed for Flink 1.12
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Load the custom configuration from the properties file
ParameterTool parameters = ParameterTool.fromPropertiesFile("src/main/resources/application.properties");
Configuration config = Configuration.fromMap(parameters.toMap());
TaskExecutorResourceUtils.adjustForLocalExecution(config);

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment(config);
System.out.println("Config Params : " + config.toMap());

DataStream<String> inputStream =
        env.readTextFile(FILEPATH);

// Keep only the rows whose fourth CSV column is >= 75.0
DataStream<String> filteredData = inputStream.filter((String value) -> {
    String[] tokens = value.split(",");
    return Double.parseDouble(tokens[3]) >= 75.0;
});

filteredData.print(); // no output is seen if the configuration object is set; otherwise everything works as expected
env.execute("Filter Country Details");
I need help understanding this behavior, and knowing what changes should be made so that the output is printed while still applying some custom configuration. Thanks.
1 Answer
Okay, so I found the answer to the above puzzle by referring to the links mentioned below.
Solution: I set the parallelism (env.setParallelism) in the above code just after configuring the stream execution environment, and the file was then read and the output generated as expected, as shown in the sketch below.
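A minimal sketch of the fix (the value 3 is my assumption, chosen to match the 3 configured task slots; in local execution the default parallelism is typically the number of CPU cores, which can exceed the configured slots and leave the job waiting in SCHEDULED):

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment(config);
// Cap the job parallelism so it does not exceed the configured task slots
env.setParallelism(3);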
After that, I experimented with a few things:
As per this link on Flink's architecture:

"A Flink cluster needs exactly as many task slots as the highest parallelism used in the job."

So it is best to set the number of task slots for a task manager equal to the parallelism configured for the job (in local execution there is a single task manager, so its slot count alone must cover the job's highest parallelism).