flink 1.9:独立群集:无法从taskexecutor id:`asktimeoutexception传输文件`

polhcujo  于 2021-06-24  发布在  Flink
关注(0)|答案(1)|浏览(889)

背景
正在尝试创建apache flink独立集群。
环境:aws
工作经理:1人
任务管理器:2
配置:

  1. FLINK_PLUGINS_DIR : /usr/local/flink-1.9.1/plugins
  2. io.tmp.dirs : /tmp/flink
  3. jobmanager.execution.failover-strategy : region
  4. jobmanager.heap.size : 1024m
  5. jobmanager.rpc.address : job manager ip
  6. jobmanager.rpc.port : 6123
  7. jobstore.cache-size : 52428800
  8. jobstore.expiration-time : 3600
  9. parallelism.default : 4
  10. slot.idle.timeout : 50000
  11. slot.request.timeout : 300000
  12. task.cancellation.interval : 30000
  13. task.cancellation.timeout : 180000
  14. task.cancellation.timers.timeout : 7500
  15. taskmanager.exit-on-fatal-akka-error : false
  16. taskmanager.heap.size : 1024m
  17. taskmanager.network.bind-policy : "ip"
  18. taskmanager.numberOfTaskSlots : 2
  19. taskmanager.registration.initial-backoff: 500ms
  20. taskmanager.registration.timeout : 5min
  21. taskmanager.rpc.port : 50100-50200
  22. web.tmpdir : /tmp/flink-web-74cce811-17c0-411e-9d11-6d91edd2e9b0

示例类型:t2介质(2个CPU 4 gb内存)
打开的安全组端口:6123、8081、50100-50200
操作系统:centos linux 7.6.1810版(核心)
java 语:

  1. openjdk version "1.8.0_191"
  2. OpenJDK Runtime Environment (build 1.8.0_191-b12)
  3. OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

群集已启动并正常运行

  1. org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://ip:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
  2. org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http:/ip:8081.
  3. org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
  4. org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
  5. org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@ip:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
  6. org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager.
  7. org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@ip:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
  8. org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
  9. org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering TaskManager with ResourceID f2c7f664378b40ce44463713ae98e1c4 (akka.tcp://flink@TaskManager1Ip:38566/user/taskmanager_0) at ResourceManager
  10. org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering TaskManager with ResourceID 354a785f637751fb3b034618a47480ed (akka.tcp://flink@TaskManager2Ip:34400/user/taskmanager_0) at ResourceManager

ui显示所有集群详细信息


问题
任务提交不起作用

  1. java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn'
  2. t send a reply.
  3. at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
  4. at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
  5. at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
  6. at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
  7. at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
  8. at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
  9. at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:871)
  10. at akka.dispatch.OnComplete.internal(Future.scala:263)
  11. at akka.dispatch.OnComplete.internal(Future.scala:261)
  12. at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
  13. at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
  14. at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
  15. at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
  16. at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
  17. at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
  18. at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:644)
  19. at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
  20. at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
  21. at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
  22. at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
  23. at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
  24. at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
  25. at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
  26. at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
  27. at java.lang.Thread.run(Thread.java:748)
  28. Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
  29. at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
  30. at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
  31. at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
  32. ... 9 more
  33. 2020-02-04 23:25:16,125 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.
  34. akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
  35. at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
  36. at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
  37. at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
  38. at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
  39. at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
  40. at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
  41. at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
  42. at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
  43. at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
  44. at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
  45. at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
  46. at java.lang.Thread.run(Thread.java:748)

有人能解释一下吗?是端口/防火墙问题还是某些设置出错?

kcrjzv8t

kcrjzv8t1#

安全组端口权限出现问题。打开时整个范围从 0 - 65565 ,一切开始运作。不过,这对于生产系统来说还不够好,因此最终对于美国的在职工人来说 flink-conf.yaml 配置文件,密钥 taskmanager.data.port 被分配了一个特定的端口,这就成功了。这样,可以将任务管理器配置为侦听范围内的特定端口。

相关问题