Job aborted due to stage failure in Spark: path not found

Posted by 9nvpjoqh on 2021-07-13 in Spark


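I am trying to delete or modify data in an HDFS directory based on specific conditions. The first time we delete or modify the data, the job processes it successfully. On the second run, however, it tries to read files that no longer exist in the directory. It looks like a problem with refreshing by path or by table. I have tried refreshing, but with no luck.
Code: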

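import org.apache.hadoop.fs.{FileSystem, Path}

// sc is the SparkContext; convertedRunIdTime is assumed to be defined earlier
// as a cutoff timestamp in epoch milliseconds.
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.listStatus(new Path("hdfs://path")).foreach { x =>
  if (x.isFile && x.getPath.toString.contains("parquet")) {
    // HDFS exposes no creation time, so the modification time stands in for it
    val createdTime = x.getModificationTime
    println(s"converted Time: $convertedRunIdTime   - createdTime : $createdTime")
    if (createdTime < convertedRunIdTime) {
      // The s prefix is required for ${...} to be interpolated
      println(s"Deleting ${x.getPath.toString}")
      fs.delete(x.getPath, true)
    }
  }
}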

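Error:
Job aborted due to stage failure: Task 20 in stage 8.0 failed 4 times, most recent failure: Lost task 20.3 in stage 8.0 (TID 462, datanode03, executor 2): java.io.FileNotFoundException: File does not exist: /stack/store/activity/run_id=20210224060208924/part-00103.c000.snappy.parquet
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.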
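
The error message itself suggests two remedies: invalidating Spark's cache, or recreating the DataFrame. The following is a minimal sketch of both, not a confirmed fix for this question. The object name StaleFileCleanup and the helpers isStaleParquet and deleteStaleAndReload are hypothetical names introduced for illustration. "hdfs://path" and "tableName" are the placeholders from the question and the error text. It assumes the data is read with spark.read.parquet from the same directory the files are deleted from.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the names are illustrative and not from the question.
object StaleFileCleanup {

  // Same condition as in the question: a parquet file last modified before the cutoff.
  def isStaleParquet(path: String, modifiedMillis: Long, cutoffMillis: Long): Boolean =
    path.contains("parquet") && modifiedMillis < cutoffMillis

  def deleteStaleAndReload(spark: SparkSession, dataPath: String, cutoffMillis: Long): DataFrame = {
    val dir = new Path(dataPath)
    // Resolve the file system from the path itself rather than from the default FS
    val fs = dir.getFileSystem(spark.sparkContext.hadoopConfiguration)

    fs.listStatus(dir)
      .filter(s => s.isFile && isStaleParquet(s.getPath.toString, s.getModificationTime, cutoffMillis))
      .foreach(s => fs.delete(s.getPath, false))

    // Invalidate cached data and metadata for any DataFrame that reads from this path
    spark.catalog.refreshByPath(dataPath)
    // If the data is read through a metastore table instead (placeholder name):
    // spark.catalog.refreshTable("tableName")

    // Recreate the DataFrame so its plan is built from the current file listing
    spark.read.parquet(dataPath)
  }
}
```

A call might look like val df = StaleFileCleanup.deleteStaleAndReload(spark, "hdfs://path", convertedRunIdTime). Any DataFrame created before the deletion should be discarded in favor of the returned one, because an existing plan may still reference the deleted files. Note that spark.read.parquet fails with a schema inference error if the directory ends up with no parquet files at all.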

scala apache-spark apache-spark-sql spark-streaming

Source: https://stackoverflow.com/questions/66346719/job-aborted-due-to-stage-failure-in-spark-path-not-found
