PySpark: spark-sql/spark-submit with Delta Lake throws NullPointerException (in org.apache.spark.storage.BlockManagerMasterEndpoint)

pvcm50d1 · posted 2023-01-01 in Spark

I am using Delta Lake with PySpark, launched with the following command:

  spark-sql --packages io.delta:delta-core_2.12:0.8.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
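
For comparison, the same Delta settings can also be supplied programmatically instead of as spark-sql flags. A minimal sketch, assuming the same delta-core_2.12:0.8.0 package as in the command above (the app name is arbitrary):

  # Minimal sketch: the spark-sql flags above expressed as SparkSession config
  # (assumes io.delta:delta-core_2.12:0.8.0, as in the command above).
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("delta-npe-repro") \
      .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
      .getOrCreate()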

System specs:

  • Spark 3.0.3
  • Scala 2.12.10
  • Java 1.8.0
  • Hadoop 2.7

I was following these reference guides: https://docs.delta.io/latest/quick-start.html and https://www.confessionsofadataguy.com/introduction-to-delta-lake-on-apache-spark-for-data-engineers/
Spark works fine when I run it without the Delta configuration.
Error (truncated stack trace):

  22/12/29 20:38:10 INFO Executor: Starting executor ID driver on host 192.168.0.100
  22/12/29 20:38:10 INFO Executor: Fetching spark://192.168.0.100:50920/jars/org.antlr_antlr4-runtime-4.7.jar with timestamp 1672326488255
  22/12/29 20:38:23 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
  22/12/29 20:38:23 INFO Executor: Told to re-register on heartbeat
  22/12/29 20:38:23 INFO BlockManager: BlockManager null re-registering with master
  22/12/29 20:38:23 INFO BlockManagerMaster: Registering BlockManager null
  22/12/29 20:38:23 ERROR Inbox: Ignoring error
  java.lang.NullPointerException
      at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404)
      at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97)
      at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
      at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
      at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
      at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
      at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
  22/12/29 20:38:23 WARN Executor: Issue communicating with driver in heartbeater
  org.apache.spark.SparkException: Exception thrown in awaitResult:
      at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
      at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
      at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:66)
      at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:567)
      at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:934)
      at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:200)
      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
      at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.lang.Thread.run(Thread.java:745)
  Caused by: java.lang.NullPointerException
      at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404)
      at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97)
      at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
      at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      ... 3 more
  22/12/29 20:38:31 ERROR Utils: Aborting task
  java.io.IOException: Failed to connect to /192.168.0.100:50920
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
      at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392)
      at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:49)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:321)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
  Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /192.168.0.100:50920
  Caused by: java.net.ConnectException: Connection timed out: no further information
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
      at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
      at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
      at java.lang.Thread.run(Thread.java:745)
  22/12/29 20:38:31 ERROR SparkContext: Error initializing SparkContext.
  java.io.IOException: Failed to connect to /192.168.0.100:50920
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
      at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392)
      at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
      at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.lang.Thread.run(Thread.java:745)
  22/12/29 20:38:31 INFO SparkUI: Stopped Spark web UI at http://192.168.0.100:4041
  22/12/29 20:38:31 ERROR Utils: Uncaught exception in thread main
  java.lang.NullPointerException
      at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:168)
      at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
      at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:734)
      at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2171)
  java.lang.NullPointerException
      at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:304)
      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:221)
      at org.apache.spark.executor.Executor.stop(Executor.scala:304)
      at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:74)
      at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
      at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
      at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
      at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
  22/12/29 20:38:31 INFO ShutdownHookManager: Shutdown hook called

What am I missing?

velaa5lx1#

Try it like this:

  # install findspark and point Python at the local Spark installation
  !pip3 install findspark --user
  import findspark
  findspark.init()
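
findspark just locates an existing Spark installation (via SPARK_HOME when it is set) and adds its pyspark to sys.path. A quick sanity check, sketched here, to confirm which installation it resolved:

  # Sanity check (sketch): print the Spark home findspark resolved,
  # so the session below is built against the expected Spark install.
  import findspark
  findspark.init()
  print(findspark.find())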
  # pass extra submit args (Delta package, driver memory) to the pyspark shell
  import os
  os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages io.delta:delta-core_2.12:2.2.0 --driver-memory 2g pyspark-shell'
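
Note that PYSPARK_SUBMIT_ARGS is only read when the JVM gateway is launched, so it has to be set before the first SparkSession (or SparkContext) is created in the process; setting it afterwards has no effect.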
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("your application") \
      .config("spark.jars.packages", "io.delta:delta-core_2.12:2.2.0") \
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
      .getOrCreate()
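
Once the session comes up, the write/read round trip from the Delta quick-start is a quick way to verify the setup (the /tmp/delta-table path is just an example):

  # Smoke test from the Delta quick-start: write a small Delta table
  # and read it back ("/tmp/delta-table" is an example path).
  data = spark.range(0, 5)
  data.write.format("delta").mode("overwrite").save("/tmp/delta-table")
  spark.read.format("delta").load("/tmp/delta-table").show()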
