I have a Docker container, run from VS Code, that uses PySpark to connect to a Postgres database on my local machine:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .getOrCreate()
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://host.docker.internal:5432/postgres") \
    .option("dbtable", "chicago_crime") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .load()
type(df)
Output: **pyspark.sql.dataframe.DataFrame**
Example code that works:
df.printSchema()
df.select('ogc_fid').show() #(Raises a Py4JJavaError sometimes)
Example code that does not work:
df.show(1) # Py4JJavaError and ConnectionRefusedError: [Errno 111] Connection refused
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
[... skipping hidden 1 frame]
Cell In[2], line 1
----> 1 df.show(1)
File /usr/local/lib/python3.9/site-packages/pyspark/sql/dataframe.py:606, in DataFrame.show(self, n, truncate, vertical)
605 if isinstance(truncate, bool) and truncate:
--> 606 print(self._jdf.showString(n, 20, vertical))
607 else:
File /usr/local/lib/python3.9/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1324 for temp_arg in temp_args:
File /usr/local/lib/python3.9/site-packages/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
189 try:
--> 190 return f(*a, **kw)
191 except Py4JJavaError as e:
File /usr/local/lib/python3.9/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
325 if answer[1] == REFERENCE_TYPE:
...
--> 438 self.socket.connect((self.java_address, self.java_port))
439 self.stream = self.socket.makefile("rb")
440 self.is_connected = True
ConnectionRefusedError: [Errno 111] Connection refused
Does anyone know what this Py4JJavaError is, and how to overcome it?
1 Answer
oxalkeyp #1
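PySpark is just a wrapper around the actual Spark implementation, which is written in Scala. Py4J is what lets your Python code communicate with the JVM process.
This means Py4JJavaError is just an abstraction: it tells you that the JVM process threw an exception.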
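To see what the JVM actually complained about, it can help to unwrap the Java exception that the Py4JJavaError carries. The following is a minimal sketch, not part of the original question: the helper name `java_cause_chain` is hypothetical, and it assumes the error object exposes the Java throwable as `java_exception` (as `py4j.protocol.Py4JJavaError` does) and that the JVM is still alive to answer `toString()` and `getCause()`.

```python
def java_cause_chain(error, max_depth=20):
    """Return the Java exception and its causes as a list of strings.

    `error` is expected to be a Py4JJavaError; any object without a
    `java_exception` attribute yields an empty list.
    """
    # Py4JJavaError stores the Java-side Throwable in `java_exception`.
    throwable = getattr(error, "java_exception", None)
    chain = []
    # Walk Throwable.getCause() until it returns null (None in Python).
    # max_depth guards against an unexpectedly long or cyclic chain.
    while throwable is not None and len(chain) < max_depth:
        chain.append(throwable.toString())
        throwable = throwable.getCause()
    return chain
```

Called from an `except Py4JJavaError as e:` handler around `df.show(1)` (with `Py4JJavaError` imported from `py4j.protocol`), printing each entry of `java_cause_chain(e)` shows the outermost Java exception first and the root cause last. If the JVM has already exited, these calls will themselves fail, which is a hint in its own right.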
The real error is
ConnectionRefusedError: [Errno 111] Connection refused
I assume the error is raised when connecting to the Postgres instance.
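One way to test that assumption is to check, from inside the container, whether the Postgres host and port accept TCP connections at all. The sketch below uses only the Python standard library; the helper name `can_connect` is hypothetical. Note that the last traceback frame in the question is Py4J's own `self.socket.connect((self.java_address, self.java_port))`, so the refusal may also come from the Java gateway being gone (for example if the JVM crashed) rather than from Postgres. This check covers only the Postgres side.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Running `can_connect("host.docker.internal", 5432)` in the same container as the notebook returns `False` if the database is unreachable from there. If it returns `True`, the network path to Postgres is fine and the JVM side deserves a closer look.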