FileNotFoundException on saveAsNewHadoopFile

Posted by nxowjjhe on 2021-05-29 in Hadoop


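I am using Spark to bulk load data into HBase. My Python script does the job perfectly, but I need to be able to submit it with spark-submit so that it runs on the cluster.
When I run the script locally with the following command: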


#!/bin/bash

sudo /usr/hdp/current/spark-client/bin/spark-submit \
  --master local[*] \
  --deploy-mode client \
  --verbose \
  --num-executors 3 \
  --executor-cores 1 \
  --executor-memory 512m \
  --driver-memory 512m \
  --conf spark.logConf=true \
  /test/BulkLoader.py

it works fine: it loads the data, writes the HFiles, and bulk loads them. However, when I run the code on YARN, like this:


#!/bin/bash

sudo /usr/hdp/current/spark-client/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --verbose \
  --num-executors 3 \
  --executor-cores 1 \
  --executor-memory 512m \
  --driver-memory 512m \
  --conf spark.logConf=true \
  --conf spark.speculation=false \
  /test/BulkLoader.py

things go wrong quickly. As soon as the script tries to write the HFiles, I get the following error:

An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsNewAPIHadoopFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 26 times, most recent failure: 
Lost task 0.25 in stage 15.0 (TID 67, sandbox.hortonworks.com): java.io.FileNotFoundException: File file:/tmp/hfiles-06-46-57/_temporary/0/_temporary/attempt_201602150647_0019_r_000000_25/f1 does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
...

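The failure occurs while the HFiles are being written, inside the _temporary directory. I have looked around and found that many others have hit this error (here and here), but none of the suggestions worked for me. I set the number of executors to 1 and set speculation to false, since those were mentioned as possible causes, but the problem persists. If anyone can suggest other options, I would appreciate it.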
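One detail in the trace may be worth checking: the failing path begins with the `file:` scheme and the frames are in `RawLocalFileSystem`, which suggests the output is being written to a node-local `/tmp` rather than to HDFS. Whether that is the cause is not established here. The sketch below is plain Python with a hypothetical helper, `filesystem_scheme`, that only reports which scheme a path string carries. The `hdfs:///` path is an illustrative assumption, not a path from the original script.

```python
from urllib.parse import urlparse


def filesystem_scheme(path):
    """Return the URI scheme of a Hadoop-style path, or None if it has none.

    Hypothetical helper for illustration; it only inspects the string and
    does not contact any filesystem.
    """
    scheme = urlparse(path).scheme
    return scheme or None


# Path copied from the FileNotFoundException in the trace above.
failing_path = (
    "file:/tmp/hfiles-06-46-57/_temporary/0/_temporary/"
    "attempt_201602150647_0019_r_000000_25/f1"
)

# Assumed alternative: the same directory, explicitly qualified for HDFS.
qualified_path = "hdfs:///tmp/hfiles-06-46-57"

# A path with no scheme at all, as it might appear in a script.
unqualified_path = "/tmp/hfiles-06-46-57"

for p in (failing_path, qualified_path, unqualified_path):
    print(p, "->", filesystem_scheme(p))
```

A path without a scheme is resolved by Hadoop against the configured default filesystem (`fs.defaultFS`), so printing the output path that the script passes to `saveAsNewAPIHadoopFile` would show whether it is being qualified as expected in both the local and the YARN run.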

hadoop yarn apache-spark

Source: https://stackoverflow.com/questions/35403449/filenotfoundexception-on-saveasnewhadoopfile
