Hadoop: accessing a Path variable in the distributed cache

Posted by ca1c2owp on 2021-06-02 in Hadoop


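I'm trying to access a Path variable through the distributed cache: job 1 writes its output to MINMAX, and job 2 should read that output.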

// Job 1: reads args[0], writes its result to MINMAX
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(MINMAX));
// Job 2: reads args[0] again, writes to args[1]
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path(args[1]));

In the driver I call

DistributedCache.addCacheFile(new Path(MINMAX).toUri(), conf);

and in setup():

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(cacheFiles[0])));

But it fails with:

java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException

Am I doing something wrong?
Please advise.
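
A likely reading of the trace, offered as a hedged guess: DistributedCache.getLocalCacheFiles(conf) returns null when the Configuration it is given has no localized cache files recorded, so cacheFiles[0] is where the NullPointerException is raised. One common way to get there is calling addCacheFile on a Configuration after the Job was created from it, because new Job(conf) typically works on its own copy of conf. The plain-Java sketch below (CacheGuard and firstCacheFile are hypothetical names, not Hadoop API) turns that silent null into an explicit error message.

```java
/**
 * Hypothetical helper, not part of Hadoop: fail fast with a readable message
 * when a distributed-cache lookup returned nothing.
 */
public final class CacheGuard {

    private CacheGuard() {}

    /** Returns files[0], or throws IllegalStateException if the array is null or empty. */
    public static <T> T firstCacheFile(T[] files) {
        if (files == null || files.length == 0) {
            throw new IllegalStateException(
                "No distributed-cache files found: was addCacheFile called on the "
                + "Configuration the Job actually uses?");
        }
        return files[0];
    }

    public static void main(String[] args) {
        // Hypothetical localized path, standing in for a real cache entry.
        String[] registered = {"/tmp/local/part-r-00000"};
        System.out.println(firstCacheFile(registered)); // prints /tmp/local/part-r-00000

        try {
            // Simulates getLocalCacheFiles returning null.
            firstCacheFile((String[]) null);
        } catch (IllegalStateException e) {
            System.out.println("guard triggered: " + e.getMessage());
        }
    }
}
```

In setup() this would be used as something like Path cached = CacheGuard.firstCacheFile(DistributedCache.getLocalCacheFiles(conf)); so a missing registration shows up as a clear message instead of a bare NPE.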

hadoop mapreduce distributed-cache Path

Source: https://stackoverflow.com/questions/23987574/access-path-variable-in-distributed-cache

1 answer


Answer #1 by xqkwcwgp

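I found the answer: add the individual part-r-* files from job 1's output directory to the cache, rather than the MINMAX directory itself.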

// Job 1: writes its result to MINMAX
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(MINMAX));
// Job 2: cache every part file that job 1 wrote under MINMAX
// (conf should be the Configuration job1 actually uses, e.g. job1.getConfiguration())
Path prevJob = new Path(new Path(MINMAX), "part-r-[0-9]*");
FileStatus[] list = fs.globStatus(prevJob);
for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path(args[1]));
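
Why glob rather than cache MINMAX itself: a finished reducer job's output directory typically holds one part-r-NNNNN file per reducer plus a _SUCCESS marker, and only the part files carry the data. The sketch below evaluates the same pattern string with Java's own java.nio PathMatcher glob, not Hadoop's globStatus (the two glob dialects differ in some details but should agree on this simple pattern); the class name and the directory listing are illustrative assumptions.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrates which job-output file names the glob "part-r-[0-9]*" selects. */
public final class PartFileGlob {

    // Same pattern string as the answer's globStatus call, evaluated with java.nio's glob matcher.
    private static final PathMatcher PART_FILES =
        FileSystems.getDefault().getPathMatcher("glob:part-r-[0-9]*");

    private PartFileGlob() {}

    /** True if a single file name matches part-r-[0-9]*. */
    public static boolean isPartFile(String fileName) {
        return PART_FILES.matches(Paths.get(fileName));
    }

    /** Keeps only the names that match the pattern, preserving input order. */
    public static List<String> selectPartFiles(List<String> names) {
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (isPartFile(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical listing of a two-reducer job's output directory.
        List<String> listing = Arrays.asList("_SUCCESS", "part-r-00000", "part-r-00001");
        System.out.println(selectPartFiles(listing)); // prints [part-r-00000, part-r-00001]
    }
}
```

Caching the part files individually also means each one gets its own entry in getLocalCacheFiles, so with more than one reducer, setup() may need to read every entry, not just cacheFiles[0].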

Then access the file in the setup() method:

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
BufferedReader bf = new BufferedReader(new InputStreamReader(
        fs.open(cacheFiles[0])));
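
A note on reading the file: getLocalCacheFiles returns paths on the task's local disk, so they can usually be read with plain java.io or java.nio; opening them through fs works when fs is the local FileSystem (as it typically is under LocalJobRunner) but may not find them if fs is HDFS. The plain-Java sketch below assumes, hypothetically, that the cached file is job output written by the default TextOutputFormat, i.e. one key<TAB>value record per line, UTF-8 encoded; the class name and the min/max keys are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: loads "key<TAB>value" lines (TextOutputFormat's default layout) into a map. */
public final class CachedKeyValues {

    private CachedKeyValues() {}

    /** Reads a localized cache file from the task's local disk. */
    public static Map<String, String> load(String localPath) {
        try (Reader in = Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            return parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses records from any Reader; lines without a tab are skipped. The caller closes source. */
    public static Map<String, String> parse(Reader source) {
        Map<String, String> values = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab >= 0) {
                    values.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical contents of job 1's output; real keys depend on the reducer.
        Map<String, String> minMax = parse(new StringReader("min\t3\nmax\t42\n"));
        System.out.println(minMax); // prints {min=3, max=42}
    }
}
```

In the answer's setup() this would look something like CachedKeyValues.load(cacheFiles[0].toUri().getPath()), which sidesteps the question of which FileSystem fs points to.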


Answered 2021-06-03
