Overwriting a Hadoop output directory

vdzxcuhz · published 2021-06-04 in Hadoop
Follow (0) | Answers (1) | Views (370)

I'm running an EMR activity in a data pipeline that analyzes log files, and when the pipeline fails I get the following error:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.211.146.177:9000/home/hadoop/temp-output-s3copy-2013-05-24-00 already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
    at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
I tried to delete that folder by adding the following:
FileSystem fs = FileSystem.get(getConf()); // getConf() assumes this runs inside a Configured/Tool subclass
fs.delete(new Path("path/to/file"), true); // true = recursive, deletes the directory and its contents
But it doesn't work. Is there a way to override Hadoop's FileOutputFormat method in Java? Is there a way to ignore this error in Java?
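(For reference: the check that throws this exception is FileOutputFormat.checkOutputSpecs() in the old org.apache.hadoop.mapred API, which is the API the stack trace shows, so it can indeed be overridden. Below is a minimal, untested sketch that subclasses TextOutputFormat and skips only the existence check; the class name OverwriteTextOutputFormat is hypothetical, and skipping the check means a rerun will mix new output files with leftovers from the previous run.)

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.InvalidJobConfException;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

// Sketch only: an OutputFormat that does not fail when the output dir exists.
public class OverwriteTextOutputFormat<K, V> extends TextOutputFormat<K, V> {

  @Override
  public void checkOutputSpecs(FileSystem ignored, JobConf job)
      throws IOException {
    // Keep the sanity check that an output path is configured at all...
    Path outDir = FileOutputFormat.getOutputPath(job);
    if (outDir == null && job.getNumReduceTasks() != 0) {
      throw new InvalidJobConfException("Output directory not set in JobConf.");
    }
    // ...but deliberately skip the FileAlreadyExistsException that
    // FileOutputFormat.checkOutputSpecs() would otherwise throw.
  }
}

It would be registered on the job with jobConf.setOutputFormat(OverwriteTextOutputFormat.class).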

dgjrabp2 · 1#

Because the output directory is named with a date, the path of the files to delete changes from run to run. There are two ways to delete them:
From the shell, try the following:

hadoop dfs -rmr hdfs://127.0.0.1:9000/home/hadoop/temp-output-s3copy-*
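(Note: on Hadoop 2 and later, hadoop dfs -rmr is deprecated; the modern equivalent is hdfs dfs -rm -r with the same path glob.)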

To do the same thing from Java code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.mortbay.log.Log;

public class FSDeletion {

  public static void main(String[] args) {

    try {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Prefix shared by all the date-stamped output directories,
      // e.g. hdfs://10.211.146.177:9000/home/hadoop/temp-output-s3copy-
      String fsName = conf.get("fs.default.name", "localhost:9000");
      String baseDir = "/home/hadoop/";
      String outputDirPattern = fsName + baseDir + "temp-output-s3copy-";

      // List everything directly under the base directory.
      Path[] paths = new Path[1];
      paths[0] = new Path(baseDir);

      FileStatus[] status = fs.listStatus(paths);
      Path[] listedPaths = FileUtil.stat2Paths(status);
      for (Path p : listedPaths) {
        // Recursively delete every entry whose full path matches the prefix.
        if (p.toString().startsWith(outputDirPattern)) {
          Log.info("Attempting to delete : " + p);
          boolean result = fs.delete(p, true);
          Log.info("Deleted ? : " + result);
        }
      }

      fs.close();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
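An alternative worth noting (my addition, not part of the original answer): FileSystem.globStatus() expands a wildcard pattern itself, so the listing-and-prefix-matching above can be collapsed into a single call. A sketch, reusing the same hypothetical paths:

// Sketch: let globStatus() expand the wildcard instead of filtering listStatus() output.
FileStatus[] matches = fs.globStatus(new Path("/home/hadoop/temp-output-s3copy-*"));
if (matches != null) {
  for (FileStatus m : matches) {
    fs.delete(m.getPath(), true); // recursive delete of each matching directory
  }
}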
