Error from python worker: /usr/bin/python: No module named pyspark

quhf5bfb · posted 2021-05-29 · in Hadoop

I am trying to run pyspark on YARN, but I get the error below whenever I type any command at the console.
I can run the Scala shell in Spark in both local and YARN mode. pyspark works fine in local mode, but does not work in YARN mode.
OS: RHEL 6.x
Hadoop distribution: IBM BigInsights 4.0
Spark version: 1.2.1
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, work): org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /mnt/sdj1/hadoop/yarn/local/filecache/13/spark-assembly.jar
(my comment: this path does not exist on the Linux filesystem, but on the logical data node)
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:102)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
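For background on why the error message points PYTHONPATH at a jar: a jar is a zip archive, and Python can import packages directly from a zip on sys.path, so PYTHONPATH=.../spark-assembly.jar is valid as long as the jar actually bundles the pyspark package. A minimal demonstration of that mechanism (the archive and module names below are made up for illustration, not from Spark):

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip containing a tiny module, then import it
# straight from the archive -- the same zipimport mechanism Spark
# relies on when PYTHONPATH points at spark-assembly.jar.
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

sys.path.insert(0, archive)
import mymod  # resolved from inside demo.zip

print(mymod.VALUE)
```

If this import fails against the real assembly jar on a worker node, the jar was likely built without the Python files, which matches the error above.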
I have already set SPARK_HOME and PYTHONPATH via export commands, as follows:

export SPARK_HOME=/path/to/spark
export PYTHONPATH=/path/to/spark/python/:/path/to/spark/lib/spark-assembly.jar
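Note that these exports only affect the driver-side process; in YARN mode the executors resolve PYTHONPATH from what is shipped to their local filecache. One avenue worth trying (a sketch only, not a confirmed fix for BigInsights: the script name is hypothetical, and the paths are the ones from the exports above, which may differ per node) is to pass the environment through to the executors explicitly with Spark's `spark.executorEnv.*` properties:

```shell
# Sketch: forward PYTHONPATH to the YARN executors so the python
# workers on the data nodes can locate the pyspark package.
spark-submit \
  --master yarn-client \
  --conf spark.executorEnv.PYTHONPATH=/path/to/spark/python:/path/to/spark/lib/spark-assembly.jar \
  your_script.py
```

This only helps if those paths actually exist on every worker node (or the assembly jar there contains the pyspark files); otherwise the executors still fail with the same import error.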

Can anyone help me solve this problem?

Answer:

After some digging, I found that pyspark does have issues out of the box on BigInsights 4.0, and it was suggested that we upgrade to BI 4.1.

