Does a pyspark SparkSession need a SparkContext to work?

Asked by 6bc51xsx on 2022-11-01 in Spark

I built the jupyter/all-spark-notebook Docker image, installed geomesa_pyspark, and tried to run the following example from the official guide.

    import geomesa_pyspark
    import pyspark
    from pyspark.sql import SparkSession

    conf = geomesa_pyspark.configure(
        jars=['/usr/local/spark/jars/geomesa-accumulo-spark-runtime_2.11-2.0.0.jar'],
        packages=['geomesa_pyspark', 'pytz'],
        spark_home='/usr/local/spark/').\
        setAppName('MyTestApp')

    # sc = pyspark.SparkContext()

    spark = ( SparkSession
        .builder
        .config(conf=conf)
        .enableHiveSupport()
        .getOrCreate()
    )

Unless I uncomment the SparkContext creation statement, the code fails with the following error:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-1-22f9613a0be5> in <module>
         31     .builder
         32     .master('spark://spark-master:7077')
    ---> 33     .config(conf=conf)
         34     .enableHiveSupport()
         35     .getOrCreate()

    /usr/local/spark/python/pyspark/sql/session.py in getOrCreate(self)
        171             for key, value in self._options.items():
        172                 sparkConf.set(key, value)
    --> 173             sc = SparkContext.getOrCreate(sparkConf)
        174             # This SparkContext may be an existing one.
        175             for key, value in self._options.items():

    /usr/local/spark/python/pyspark/context.py in getOrCreate(cls, conf)
        365         with SparkContext._lock:
        366             if SparkContext._active_spark_context is None:
    --> 367                 SparkContext(conf=conf or SparkConf())
        368             return SparkContext._active_spark_context
        369

    /usr/local/spark/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
        134         try:
        135             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
    --> 136                           conf, jsc, profiler_cls)
        137         except:
        138             # If an error occurs, clean up in order to allow future SparkContext creation:

    /usr/local/spark/python/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
        196
        197         # Create the Java SparkContext through Py4J
    --> 198         self._jsc = jsc or self._initialize_context(self._conf._jconf)
        199         # Reset the SparkConf to the one actually used by the SparkContext in JVM.
        200         self._conf = SparkConf(_jconf=self._jsc.sc().conf())

    /usr/local/spark/python/pyspark/context.py in _initialize_context(self, jconf)
        304         Initialize SparkContext in function to allow subclass specific initialization
        305         """
    --> 306         return self._jvm.JavaSparkContext(jconf)
        307
        308     @classmethod

    /usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1523         answer = self._gateway_client.send_command(command)
       1524         return_value = get_return_value(
    -> 1525             answer, self._gateway_client, None, self._fqn)
       1526
       1527         for temp_arg in temp_args:

    /usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        326                 raise Py4JJavaError(
        327                     "An error occurred while calling {0}{1}{2}.\n".
    --> 328                     format(target_id, ".", name), value)
        329             else:
        330                 raise Py4JError(

    Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.AbstractMethodError: io.netty.util.concurrent.MultithreadEventExecutorGroup.newChild(Ljava/util/concurrent/ThreadFactory;[Ljava/lang/Object;)Lio/netty/util/concurrent/EventExecutor;
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
        at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
        at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:78)
        at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:73)
        at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:60)
        at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:50)
        at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:102)
        at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
        at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
        at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
        at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

I am using the following versions:

  • Spark 2.4.5
  • Hadoop 2.7
  • java-1.8.0-openjdk-amd64

Why does it need a SparkContext? Shouldn't one already be created as part of the SparkSession?
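
For reference, my understanding is that a plain session built without geomesa_pyspark already carries its own context. A minimal sketch of what I would expect to work (the app name is just a placeholder):

    from pyspark.sql import SparkSession

    # A session with no extra jars; getOrCreate() should create the
    # underlying SparkContext on its own.
    spark = SparkSession.builder.appName('MyTestApp').getOrCreate()

    # The context the session created internally, without an explicit
    # pyspark.SparkContext() call.
    sc = spark.sparkContext
    print(sc.version)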


svmlkihl 1#

An AbstractMethodError indicates a classpath problem (see, for example, this post). Since the error occurs inside netty, you should check the classpath for conflicting versions of the netty jars.
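
A rough way to check from Python, assuming the paths shown in the question (adjust them for your image); this only lists where netty classes come from, it does not prove a conflict by itself:

    import glob
    import zipfile

    # Netty jars shipped with the Spark distribution itself.
    for jar in glob.glob('/usr/local/spark/jars/*netty*.jar'):
        print('Spark jar:', jar)

    # Check whether netty classes are also shaded into the GeoMesa runtime
    # jar referenced in the question.
    runtime_jar = '/usr/local/spark/jars/geomesa-accumulo-spark-runtime_2.11-2.0.0.jar'
    with zipfile.ZipFile(runtime_jar) as zf:
        bundled = [name for name in zf.namelist() if name.startswith('io/netty/')]
        print('netty classes bundled in the GeoMesa jar:', len(bundled))

If the same io.netty packages show up in more than one of these places with different versions, that mismatch is the kind of conflict that produces an AbstractMethodError.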
