When I try to import and read a CSV from S3 using PySpark in SageMaker Studio, I get a Java runtime error. Can anyone help?
https://github.com/rog-SARTHAK/AWS-Sagemaker-Studio/blob/main/Crime.ipynb
from pyspark import SparkContext
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CrimeInsights").getOrCreate()
It gives me a "JAVA_HOME is not set" error:
JAVA_HOME is not set
---------------------------------------------------------------------------
PySparkRuntimeError Traceback (most recent call last)
Cell In[49], line 1
----> 1 spark = SparkSession.builder.appName("CrimeInsights").getOrCreate()
File /opt/conda/lib/python3.10/site-packages/pyspark/sql/session.py:497, in SparkSession.Builder.getOrCreate(self)
495 sparkConf.set(key, value)
496 # This SparkContext may be an existing one.
--> 497 sc = SparkContext.getOrCreate(sparkConf)
498 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
499 # by all sessions.
500 session = SparkSession(sc, options=self._options)
File /opt/conda/lib/python3.10/site-packages/pyspark/context.py:515, in SparkContext.getOrCreate(cls, conf)
513 with SparkContext._lock:
514 if SparkContext._active_spark_context is None:
--> 515 SparkContext(conf=conf or SparkConf())
516 assert SparkContext._active_spark_context is not None
517 return SparkContext._active_spark_context
File /opt/conda/lib/python3.10/site-packages/pyspark/context.py:201, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls, udf_profiler_cls, memory_profiler_cls)
195 if gateway is not None and gateway.gateway_parameters.auth_token is None:
196 raise ValueError(
197 "You are trying to pass an insecure Py4j gateway to Spark. This"
198 " is not allowed as it is a security risk."
199 )
--> 201 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
202 try:
203 self._do_init(
204 master,
205 appName,
(...)
215 memory_profiler_cls,
216 )
File /opt/conda/lib/python3.10/site-packages/pyspark/context.py:436, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
434 with SparkContext._lock:
435 if not SparkContext._gateway:
--> 436 SparkContext._gateway = gateway or launch_gateway(conf)
437 SparkContext._jvm = SparkContext._gateway.jvm
439 if instance:
File /opt/conda/lib/python3.10/site-packages/pyspark/java_gateway.py:107, in launch_gateway(conf, popen_kwargs)
104 time.sleep(0.1)
106 if not os.path.isfile(conn_info_file):
--> 107 raise PySparkRuntimeError(
108 error_class="JAVA_GATEWAY_EXITED",
109 message_parameters={},
110 )
112 with open(conn_info_file, "rb") as info:
113 gateway_port = read_int(info)
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
1 Answer
I think the underlying problem is with the setup and compatibility between PySpark, Java, and the SageMaker environment. I would start by setting the JAVA_HOME environment variable correctly.
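A minimal sketch of that fix, assuming the SageMaker Studio image uses the conda environment at /opt/conda shown in the traceback, and that OpenJDK has been (or can be) installed into it, e.g. with `conda install -y openjdk=11`:

```python
import os

# Point JAVA_HOME at the conda prefix where OpenJDK would be installed.
# /opt/conda matches the interpreter path in the traceback; CONDA_PREFIX
# is the usual conda environment variable, falling back to that path.
java_home = os.environ.get("CONDA_PREFIX", "/opt/conda")
os.environ["JAVA_HOME"] = java_home

print(os.environ["JAVA_HOME"])
```

After JAVA_HOME is set (and Java is actually installed at that location), re-running `SparkSession.builder.appName("CrimeInsights").getOrCreate()` in a fresh kernel should let the Py4J gateway launch instead of exiting before sending its port number.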