Importing and reading a CSV from S3 with PySpark in SageMaker Studio: JAVA_HOME is not set PySparkRuntimeError

esbemjvw · published 2024-01-06 in Spark

When I try to import and read a CSV from S3 using PySpark in SageMaker Studio, I get a Java runtime error. Can anyone help?
https://github.com/rog-SARTHAK/AWS-Sagemaker-Studio/blob/main/Crime.ipynb

```python
from pyspark import SparkContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CrimeInsights").getOrCreate()
```

It fails with a "JAVA_HOME is not set" error:

```
JAVA_HOME is not set
---------------------------------------------------------------------------
PySparkRuntimeError                       Traceback (most recent call last)
Cell In[49], line 1
----> 1 spark = SparkSession.builder.appName("CrimeInsights").getOrCreate()

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/session.py:497, in SparkSession.Builder.getOrCreate(self)
    495     sparkConf.set(key, value)
    496 # This SparkContext may be an existing one.
--> 497 sc = SparkContext.getOrCreate(sparkConf)
    498 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    499 # by all sessions.
    500 session = SparkSession(sc, options=self._options)

File /opt/conda/lib/python3.10/site-packages/pyspark/context.py:515, in SparkContext.getOrCreate(cls, conf)
    513 with SparkContext._lock:
    514     if SparkContext._active_spark_context is None:
--> 515         SparkContext(conf=conf or SparkConf())
    516     assert SparkContext._active_spark_context is not None
    517     return SparkContext._active_spark_context

File /opt/conda/lib/python3.10/site-packages/pyspark/context.py:201, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls, udf_profiler_cls, memory_profiler_cls)
    195 if gateway is not None and gateway.gateway_parameters.auth_token is None:
    196     raise ValueError(
    197         "You are trying to pass an insecure Py4j gateway to Spark. This"
    198         " is not allowed as it is a security risk."
    199     )
--> 201 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    202 try:
    203     self._do_init(
    204         master,
    205         appName,
   (...)
    215         memory_profiler_cls,
    216     )

File /opt/conda/lib/python3.10/site-packages/pyspark/context.py:436, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
    434 with SparkContext._lock:
    435     if not SparkContext._gateway:
--> 436         SparkContext._gateway = gateway or launch_gateway(conf)
    437         SparkContext._jvm = SparkContext._gateway.jvm
    439     if instance:

File /opt/conda/lib/python3.10/site-packages/pyspark/java_gateway.py:107, in launch_gateway(conf, popen_kwargs)
    104     time.sleep(0.1)
    106 if not os.path.isfile(conn_info_file):
--> 107     raise PySparkRuntimeError(
    108         error_class="JAVA_GATEWAY_EXITED",
    109         message_parameters={},
    110     )
    112 with open(conn_info_file, "rb") as info:
    113     gateway_port = read_int(info)

PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
```

Answer #1, from qlckcl4x:

I think the underlying problem is the setup and compatibility between PySpark, Java, and the SageMaker environment. I would start by setting the JAVA_HOME environment variable correctly; a sketch of one way to do that follows.
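The [JAVA_GATEWAY_EXITED] error is raised before Spark itself starts: PySpark could not find a Java runtime to launch its JVM gateway. Below is a minimal sketch, assuming a conda-based Studio image with access to conda-forge; the openjdk version and the JAVA_HOME path are assumptions, so confirm where the JDK actually lands on your image.

```python
# Sketch: install a JDK into the active conda environment and point
# JAVA_HOME at it *before* the first SparkSession is created.
import os
import subprocess

# conda-forge ships an "openjdk" package; pinning to 11 is an assumption.
subprocess.run(
    ["conda", "install", "-y", "-c", "conda-forge", "openjdk=11"],
    check=True,
)

# conda-forge's openjdk installs under the environment prefix, so JAVA_HOME
# can point at $CONDA_PREFIX (assumed layout; run `which java` to confirm).
os.environ["JAVA_HOME"] = os.environ.get("CONDA_PREFIX", "/opt/conda")

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CrimeInsights").getOrCreate()
```

If the session still fails after this, restart the notebook kernel so no half-initialized SparkContext lingers, then rerun the cell.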

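Once the session comes up, the original goal of reading the CSV from S3 should work. A hedged sketch follows: the bucket and key are placeholders, and the hadoop-aws version is an assumption that must match the Hadoop build bundled with your PySpark install.

```python
# Sketch: read a CSV from S3 via the s3a:// connector. hadoop-aws 3.3.4
# matches recent PySpark 3.x builds; adjust for your install (assumption).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("CrimeInsights")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .getOrCreate()
)

# Hypothetical bucket/key; replace with the real S3 location of the file.
df = spark.read.csv("s3a://my-bucket/crime.csv", header=True, inferSchema=True)
df.printSchema()
```

Credentials typically come from the notebook's execution role; if not, they can be supplied through the standard fs.s3a.* configuration keys.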