PySpark: error when defining SparkContext in a Jupyter notebook (Python 3 (ipykernel))

rqcrx0a6 · posted 2022-11-01 in Spark

The error occurs at sc = SparkContext(conf=conf).
I'm using a Jupyter notebook from Anaconda, and this is my first time using PySpark. What I'm trying to do is set up the Spark environment so that I can read a CSV file from my local disk.
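For context, once the SparkContext/SparkSession starts successfully, reading a local CSV usually looks like the minimal sketch below (the file path and options are placeholders, not values from this post):

  from pyspark.sql import SparkSession

  # Build (or reuse) a local SparkSession
  spark = SparkSession.builder \
      .master("local[*]") \
      .appName("PrdectiveModel") \
      .getOrCreate()

  # Placeholder path: point this at your own CSV file
  df = spark.read.csv("C:/path/to/data.csv", header=True, inferSchema=True)
  df.show(5)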
EDIT: I pasted the output as text because user2314737 asked for text rather than images; Stack Overflow then wanted more detail since the body is mostly code, so feel free to skip this paragraph.
Input:

  from pyspark import SparkContext, SparkConf
  conf = SparkConf().setAppName("PrdectiveModel")
  sc = SparkContext(conf=conf)   # ----> the error is in this line

Output:

  Py4JJavaError Traceback (most recent call last)
  Input In [13], in <cell line: 3>()
  1 from pyspark import SparkContext, SparkConf
  2 conf = SparkConf().setAppName("PrdectiveModel")
  ----> 3 sc = SparkContext(conf=conf)
  File E:\anaconda3\lib\site-packages\pyspark\context.py:146, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
  144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  145 try:
  --> 146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  147 conf, jsc, profiler_cls)
  148 except:
  149 # If an error occurs, clean up in order to allow future SparkContext creation:
  150 self.stop()
  File E:\anaconda3\lib\site-packages\pyspark\context.py:209, in SparkContext._do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
  206 self.environment["PYTHONHASHSEED"] = os.environ.get("PYTHONHASHSEED", "0")
  208 # Create the Java SparkContext through Py4J
  --> 209 self._jsc = jsc or self._initialize_context(self._conf._jconf)
  210 # Reset the SparkConf to the one actually used by the SparkContext in JVM.
  211 self._conf = SparkConf(_jconf=self._jsc.sc().conf())
  File E:\anaconda3\lib\site-packages\pyspark\context.py:329, in SparkContext._initialize_context(self, jconf)
  325 def _initialize_context(self, jconf):
  326 """
  327 Initialize SparkContext in function to allow subclass specific initialization
  328 """
  --> 329 return self._jvm.JavaSparkContext(jconf)
  File E:\anaconda3\lib\site-packages\py4j\java_gateway.py:1585, in JavaClass.__call__(self, *args)
  1579 command = proto.CONSTRUCTOR_COMMAND_NAME +\
  1580 self._command_header +\
  1581 args_command +\
  1582 proto.END_COMMAND_PART
  1584 answer = self._gateway_client.send_command(command)
  -> 1585 return_value = get_return_value(
  1586 answer, self._gateway_client, None, self._fqn)
  1588 for temp_arg in temp_args:
  1589 temp_arg._detach()
  File E:\anaconda3\lib\site-packages\py4j\protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
  324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
  325 if answer[1] == REFERENCE_TYPE:
  --> 326 raise Py4JJavaError(
  327 "An error occurred while calling {0}{1}{2}.\n".
  328 format(target_id, ".", name), value)
  329 else:
  330 raise Py4JError(
  331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
  332 format(target_id, ".", name, value))
  Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
  : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
  at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
  at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
  at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
  at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
  at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
  at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:238)
  at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
  at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
  at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
  at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
  at java.base/java.lang.Thread.run(Thread.java:833)

So I did what you suggested and checked the environment and so on. I even used findspark.
Input:

  import findspark
  findspark.init()
  findspark.find()

Output:

  'C:\\spark-3.2.1-bin-hadoop3.2'

So Spark is installed correctly and can be imported without any problem.
However, my error is still there:

  sc = SparkContext(conf=conf)

Why does this line cause an error?

So I tried a different piece of code:

  import pyspark
  from pyspark.sql import SparkSession
  spark = SparkSession.builder.getOrCreate()
  df = spark.sql("select 'spark' as hello ")
  df.show()

Now I get a different error:

  RuntimeError Traceback (most recent call last)
  Input In [6], in <cell line: 5>()
  1 import pyspark
  3 from pyspark.sql import SparkSession
  ----> 5 spark = SparkSession.builder.getOrCreate()
  7 df = spark.sql("select 'spark' as hello ")
  9 df.show()
  File C:\spark-3.2.1-bin-hadoop3.2\python\pyspark\sql\session.py:228, in SparkSession.Builder.getOrCreate(self)
  226 sparkConf.set(key, value)
  227 # This SparkContext may be an existing one.
  --> 228 sc = SparkContext.getOrCreate(sparkConf)
  229 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
  230 # by all sessions.
  231 session = SparkSession(sc)
  File C:\spark-3.2.1-bin-hadoop3.2\python\pyspark\context.py:392, in SparkContext.getOrCreate(cls, conf)
  390 with SparkContext._lock:
  391 if SparkContext._active_spark_context is None:
  --> 392 SparkContext(conf=conf or SparkConf())
  393 return SparkContext._active_spark_context
  File C:\spark-3.2.1-bin-hadoop3.2\python\pyspark\context.py:144, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
  139 if gateway is not None and gateway.gateway_parameters.auth_token is None:
  140 raise ValueError(
  141 "You are trying to pass an insecure Py4j gateway to Spark. This"
  142 " is not allowed as it is a security risk.")
  --> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  145 try:
  146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  147 conf, jsc, profiler_cls)
  File C:\spark-3.2.1-bin-hadoop3.2\python\pyspark\context.py:339, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
  337 with SparkContext._lock:
  338 if not SparkContext._gateway:
  --> 339 SparkContext._gateway = gateway or launch_gateway(conf)
  340 SparkContext._jvm = SparkContext._gateway.jvm
  342 if instance:
  File C:\spark-3.2.1-bin-hadoop3.2\python\pyspark\java_gateway.py:108, in launch_gateway(conf, popen_kwargs)
  105 time.sleep(0.1)
  107 if not os.path.isfile(conn_info_file):
  --> 108 raise RuntimeError("Java gateway process exited before sending its port number")
  110 with open(conn_info_file, "rb") as info:
  111 gateway_port = read_int(info)
  RuntimeError: Java gateway process exited before sending its port number

I suspect these two errors are related.
Thanks for taking the time to read this; I hope it gets resolved.


Answer 1 (irtuqstp)

A common cause of this is the wrong Java version.
Check that Java 8 (or later) is installed and that the JAVA_HOME environment variable points to the installation directory.
If that doesn't help, you may also need to set the SPARK_HOME and PYTHONPATH variables (see https://spark.apache.org/docs/latest/api/python/getting_started/install.html).
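A minimal sketch of how you might check this from inside the notebook before creating the SparkContext (the JDK path below is a placeholder for your own installation, not a value from this thread):

  import os
  import subprocess

  # Show which Java the Py4J gateway will launch; Spark 3.2.x is documented
  # for Java 8/11, and a newer JDK can trigger the StorageUtils error above
  subprocess.run(["java", "-version"])

  # Placeholders: point these at your own installations before starting Spark
  os.environ["JAVA_HOME"] = r"C:\path\to\jdk"
  os.environ["SPARK_HOME"] = r"C:\spark-3.2.1-bin-hadoop3.2"

  from pyspark import SparkContext, SparkConf
  conf = SparkConf().setAppName("PrdectiveModel")
  sc = SparkContext(conf=conf)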


Answer 2 (falq053o)

I got the same error as well, and it was resolved with the following two steps:
1. Install OpenJDK.
2. Run the code below.

  import pyspark
  from pyspark.sql import SparkSession

  # Create a SparkSession
  spark = SparkSession.builder \
      .master("local[1]") \
      .appName("myapp.com") \
      .getOrCreate()
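If the session comes up, a quick sanity check is to run the same query from the question:

  # Verify that the SparkSession and the JVM gateway are working
  df = spark.sql("select 'spark' as hello")
  df.show()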
