Unable to run PySpark in Google Colab

oknwwptz · Posted 2021-05-29 in Spark
Follow (0) | Answers (2) | Views (794)

Hi, I'm trying to run PySpark on Google Colab with the following code:

!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
!tar xf spark-2.4.5-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.5-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()

I get the following error:

/content/spark-2.4.5-bin-hadoop2.7/python/pyspark/java_gateway.py in _launch_gateway(conf, insecure)
    106 
    107             if not os.path.isfile(conn_info_file):
--> 108                 raise Exception("Java gateway process exited before sending its port number")
    109 
    110             with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

Note: I ran this code just this afternoon and it worked; the error suddenly appeared in the evening.


rekjcdws1#

Google Colab already comes with Java preinstalled. So if you run

!pip install pyspark

and then use Spark, it will work. There is no need for findspark or other unnecessary libraries.


isr3a4wc2#

Please check whether the wget is actually working. If it isn't, upload the latest Apache Spark release to Google Drive, unpack it in Google Colaboratory, and point `SPARK_HOME` at that path. Your code fails because it cannot find the Spark folder; the wget download is not working.
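This diagnosis also explains the "worked this afternoon, broke this evening" symptom: Apache mirrors such as apache.osuosl.org only keep recent releases, and older versions are moved to archive.apache.org, so a hard-coded mirror URL for an old version can stop resolving overnight. A quick way to check both URLs from Python (the archive URL is an assumption about where the 2.4.5 tarball now lives, and the check needs network access):

```python
import urllib.request

def url_ok(url):
    """Return True if the server answers a HEAD request with HTTP 200."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except Exception:
        # DNS failure, 404, timeout, etc. all count as "not working".
        return False

# The mirror URL from the question (old releases get pruned from mirrors):
mirror = "http://apache.osuosl.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz"
# Assumption: old releases are kept on the Apache archive instead:
archive = "https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz"

print("mirror:", url_ok(mirror))
print("archive:", url_ok(archive))
```

If the mirror check prints `False` while the archive check prints `True`, switching the `!wget` line to the archive URL should make the original setup download Spark again.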
