Keep getting a Py4JError when using a PySpark DataFrame

x6h2sr28 posted on 2023-01-01 in Spark

I have a Docker container, run from VS Code, that connects with PySpark to a Postgres database on my local machine:

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
        .getOrCreate()

    df = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://host.docker.internal:5432/postgres") \
        .option("dbtable", "chicago_crime") \
        .option("user", "postgres") \
        .option("password", "postgres") \
        .option("driver", "org.postgresql.Driver") \
        .load()

    type(df)

Output: pyspark.sql.dataframe.DataFrame
Example code that works:

    df.printSchema()
    df.select('ogc_fid').show()  # (Raises a Py4JJavaError sometimes)

Example code that does not work:

    df.show(1)  # Py4JJavaError and ConnectionRefusedError: [Errno 111] Connection refused

    Output exceeds the size limit. Open the full output data in a text editor
    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    [... skipping hidden 1 frame]
    Cell In[2], line 1
    ----> 1 df.show(1)

    File /usr/local/lib/python3.9/site-packages/pyspark/sql/dataframe.py:606, in DataFrame.show(self, n, truncate, vertical)
        605 if isinstance(truncate, bool) and truncate:
    --> 606     print(self._jdf.showString(n, 20, vertical))
        607 else:

    File /usr/local/lib/python3.9/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
       1320 answer = self.gateway_client.send_command(command)
    -> 1321 return_value = get_return_value(
       1322     answer, self.gateway_client, self.target_id, self.name)
       1324 for temp_arg in temp_args:

    File /usr/local/lib/python3.9/site-packages/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
        189 try:
    --> 190     return f(*a, **kw)
        191 except Py4JJavaError as e:

    File /usr/local/lib/python3.9/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
        325 if answer[1] == REFERENCE_TYPE:
    ...
    --> 438 self.socket.connect((self.java_address, self.java_port))
        439 self.stream = self.socket.makefile("rb")
        440 self.is_connected = True

    ConnectionRefusedError: [Errno 111] Connection refused

Does anyone know what this Py4JJavaError is, and how to get around it?

oxalkeyp 1#

PySpark is just a wrapper around the actual Spark implementation, which is written in Scala; Py4J is what lets your Python code talk to the JVM process.
This means a Py4JJavaError is only an abstraction telling you that the JVM process threw an exception.
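When a Py4JJavaError does make it back to Python intact (as with the select().show() call above), the JVM-side exception it wraps can be inspected directly. A minimal sketch, assuming the df from the question is still available:

    from py4j.protocol import Py4JJavaError

    try:
        df.show(1)
    except Py4JJavaError as e:
        # e.java_exception is the wrapped JVM Throwable; its message and stack
        # trace usually name the real cause (e.g. a JDBC/driver problem)
        # rather than Py4J itself.
        print(e.java_exception)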
In this case, though, the real error is ConnectionRefusedError: [Errno 111] Connection refused. I assume it is raised while connecting to the Postgres instance.
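One way to test that assumption is to check, from inside the container, whether the database is reachable at all. A rough sketch using only the standard library; the host and port values are taken from the JDBC URL in the question:

    import socket

    host, port = "host.docker.internal", 5432
    try:
        # If this connect fails, the container cannot reach Postgres at all,
        # and the Spark JDBC read will fail regardless of Spark settings.
        with socket.create_connection((host, port), timeout=5):
            print(f"{host}:{port} is reachable from inside the container")
    except OSError as exc:
        print(f"cannot reach {host}:{port}: {exc}")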
