Hadoop adds %2F to the path

8wtpewkr posted on 2021-06-04 in Hadoop

I have a file in Hadoop at /home/hduser/ih/input/imageslocalpaths.txt (I have checked it with hadoop fs -ls ih/input/imageslocalpaths.txt). When I run:

hadoop jar IH.jar IH/input/imageslocalpaths.txt

I get:

Input path does not exist: hdfs://localhost:54310/user/hduser/IH%2Finput%2Fimageslocalpaths.txt

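Can anyone tell me how to stop Hadoop from changing the slashes to %2F, or suggest another workaround?
(I tried the full path, but Hadoop just appended it to /user/hduser, giving /user/hduser/user/hduser..., again with %2F.)
As requested, here is my main method (do you want any other bits?):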

public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Configuration conf2 = new Configuration();

        conf.set("fs.defaultFS", "hdfs://localhost:54310");

        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        Job job1 = new Job(conf, "MergeImages");

        job1.setJarByClass(ImageHandlerMain.class);
        job1.setMapperClass(BinaryFilesToHadoopSequenceFileMapper.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(BytesWritable.class);

        FileInputFormat.addInputPath(job1, new Path(URLEncoder.encode(otherArgs[0],"UTF-8")));
        job1.setInputFormatClass(TextInputFormat.class);     

        FileOutputFormat.setOutputPath(job1, new Path(URLEncoder.encode(otherArgs[1],"UTF-8"))); //put result into intermediate folder
        job1.setInputFormatClass(TextInputFormat.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        ControlledJob cJob1 = new ControlledJob(conf);
        cJob1.setJob(job1);

        Job job2 = new Job(conf2,"FindDuplicates");

        job2.setJarByClass(ImageHandlerMain.class);
        job2.setMapperClass(ImagePHashMapper.class); 
        job2.setReducerClass(ImageDupsReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);        
        FileInputFormat.addInputPath(job2, new Path(URLEncoder.encode(otherArgs[1],"UTF-8") + "/part-r-00000")); //get the part-r-00000 file from the intermediate folder
        FileOutputFormat.setOutputPath(job2, new  Path(otherArgs[2])); //put result into output folder
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        ControlledJob cJob2 = new ControlledJob(conf2);
        cJob2.setJob(job2);
        JobControl jobctrl = new JobControl("jobctrl");
        jobctrl.addJob(cJob1);
        jobctrl.addJob(cJob2);
        cJob2.addDependingJob(cJob1);
        jobctrl.run();

}

irtuqstp1#

The problem is in this line of code:

FileInputFormat.addInputPath(job2, new Path(URLEncoder.encode(otherArgs[1],"UTF-8") + "/part-r-00000")); //get the part-r-00000 file from the intermediate folder

Here, building the path with URLEncoder.encode converts each "/" into %2F.
Possible solution:

FileInputFormat.addInputPath(job2, new Path(URLEncoder.encode(otherArgs[1],"UTF-8").replace("%2F", "/") + "/part-r-00000")); //get the part-r-00000 file from the intermediate folder

After encoding, use the replace method to turn "%2F" back into "/".
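
To see the effect outside of Hadoop, here is a minimal sketch using only java.net.URLEncoder. The class name PathEncodingDemo and its two helper methods are made up for illustration; they are not part of the question's job. It shows the encoding that produces the path in the error message, and the replace workaround described above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PathEncodingDemo {

    // Encodes the argument the same way the job in the question does.
    public static String encode(String arg) {
        try {
            return URLEncoder.encode(arg, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // The workaround from this answer: encode, then restore the slashes.
    public static String encodeKeepingSlashes(String arg) {
        return encode(arg).replace("%2F", "/");
    }

    public static void main(String[] args) {
        String arg = "IH/input/imageslocalpaths.txt";
        System.out.println(encode(arg));               // IH%2Finput%2Fimageslocalpaths.txt
        System.out.println(encodeKeepingSlashes(arg)); // IH/input/imageslocalpaths.txt
        // Only the slashes are restored; a space, for example, still becomes "+".
        System.out.println(encodeKeepingSlashes("IH/my input")); // IH/my+input
    }
}
```

Note that the same URLEncoder.encode call also wraps otherArgs[0] and otherArgs[1] in the first job, and the path in the error message is the first argument, so those lines would presumably need the same treatment. Because the workaround still rewrites other characters, passing the raw argument string to new Path(...) is likely the simpler fix.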


atmip9wb2#

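I'm not sure where the problem comes from, but try checking the following:
After parsing the arguments, check that the URL is well formed:
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
Try creating the path without the URL encoder, like this:
FileInputFormat.setInputPaths(job, new Path(inputLocation)); // where inputLocation is just a String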
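
One way to do the check suggested above is to fail fast if a path argument still contains a percent escape before it goes to new Path(...). This is only a sketch in plain Java: InputPathCheck, looksUrlEncoded and requirePlainPath are hypothetical names, and it assumes your real file names never contain sequences like %2F themselves.

```java
import java.util.regex.Pattern;

public class InputPathCheck {

    // Matches a single percent-encoded byte such as %2F or %20.
    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    // Hypothetical helper: true if the string contains at least one percent escape.
    public static boolean looksUrlEncoded(String location) {
        return PERCENT_ESCAPE.matcher(location).find();
    }

    // Hypothetical helper: returns the location unchanged, or throws if it looks encoded.
    public static String requirePlainPath(String location) {
        if (looksUrlEncoded(location)) {
            throw new IllegalArgumentException("Path looks URL-encoded: " + location);
        }
        return location;
    }

    public static void main(String[] args) {
        // The raw command-line argument passes the check unchanged.
        System.out.println(requirePlainPath("IH/input/imageslocalpaths.txt"));
        // The encoded form from the error message is detected.
        System.out.println(looksUrlEncoded("IH%2Finput%2Fimageslocalpaths.txt")); // true
    }
}
```

In the job itself, the checked string would then be handed straight to Hadoop, for example new Path(InputPathCheck.requirePlainPath(otherArgs[0])), with no URLEncoder involved.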
