Hadoop MapReduce DistributedCache usage

Posted by b1zrtrql on 2021-06-02 in Hadoop


I am trying to reproduce the Bloom filtering example from the book MapReduce Design Patterns.
Below I show only the relevant code:

public static class BloomFilteringMapper extends Mapper<Object, Text, Text, NullWritable>
{
    private BloomFilter filter = new BloomFilter();
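    @Override
    protected void setup( Context context ) throws IOException
    {
        // URIs registered in main() via DistributedCache.addCacheFile
        URI[] files = DistributedCache.getCacheFiles( context.getConfiguration() );
        // getPath() keeps only the path part of the URI (scheme and host are dropped)
        String path = files[0].getPath();
        System.out.println( "Reading Bloom Filter from: " + path );
        // java.io.FileInputStream reads from the task's local filesystem, not from HDFS
        DataInputStream strm = new DataInputStream( new FileInputStream( path ) );
        filter.readFields( strm );
        strm.close();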
    }
    //...
}
public static void main( String[] args ) throws Exception
{
    Job job = new Job( new Configuration(), "description" );
    URI uri = new URI("hdfs://localhost:9000/user/draxent/comment.bloomfilter");
    DistributedCache.addCacheFile( uri, job.getConfiguration() );
    //...
}

When I try to run it, I get the following error:
java.io.FileNotFoundException: /user/draxent/comment.bloomfilter
However, if I run the command:

bin/hadoop fs -ls

I can see the file:

-rw-r--r--   1 draxent supergroup        405 2015-11-25 17:12 /user/draxent/comment.bloomfilter

So I am sure the problem is in this line:

URI uri = new URI("hdfs://localhost:9000/user/draxent/comment.bloomfilter");

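I have tried several different variants, such as:
"hdfs://user/draxent/comment.bloomfilter"
"/user/draxent/comment.bloomfilter"
"comment.bloomfilter"
None of them worked.
I tried looking at cfeduke's implementation, but it did not help me solve the problem.
Replies to comments:
Ravindra: the URI files[0] contains the same string that is passed in main.
Manjunath: Yes, you are right. But since the file exists (you can see it with bin/hadoop fs -ls), the problem must be the string path passed to FileInputStream. Yet I pass the string as usual. I checked, and the path value is comment.bloomfilter... so it must be right.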
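To see why the variants fail, it helps to look at what files[0].getPath() actually returns. This stdlib-only sketch (the class name CacheUriPaths is hypothetical) parses each URI from the question with java.net.URI. getPath() drops the scheme and the authority, and in "hdfs://user/draxent/comment.bloomfilter" the segment user is parsed as the host name, so the path becomes /draxent/comment.bloomfilter.

```java
import java.net.URI;

/** Shows what files[0].getPath() returns for each URI variant tried in the question. */
final class CacheUriPaths {

    // Returns only the path component: scheme ("hdfs") and authority ("localhost:9000") are dropped.
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    // Returns the authority (host[:port]) component, or null when the URI has none.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        String[] variants = {
            "hdfs://localhost:9000/user/draxent/comment.bloomfilter",
            "hdfs://user/draxent/comment.bloomfilter",
            "/user/draxent/comment.bloomfilter",
            "comment.bloomfilter"
        };
        for (String v : variants) {
            System.out.println(v + " -> authority=" + authorityOf(v) + ", path=" + pathOf(v));
        }
    }
}
```

For the original hdfs://localhost:9000/... URI the path is /user/draxent/comment.bloomfilter, which is exactly the path reported in the FileNotFoundException.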
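The missing step in this reasoning is which filesystem the path is resolved against. bin/hadoop fs -ls lists HDFS, while java.io.FileInputStream only looks at the local disk of the machine running the task, so an HDFS path that is valid for hadoop fs can still be missing locally. This minimal stdlib sketch (LocalOpen and tryOpen are hypothetical names) opens a path exactly as setup() does and reports the resulting error. The Hadoop-side alternatives, such as opening the file through org.apache.hadoop.fs.FileSystem or reading the localized copy, need the Hadoop jars and are not shown.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

/** Opens a path with java.io, the same way the question's setup() does. */
final class LocalOpen {

    // Returns null if the path can be opened on the LOCAL filesystem,
    // otherwise the message of the exception that java.io throws.
    static String tryOpen(String path) {
        try (FileInputStream in = new FileInputStream(path)) {
            return null;
        } catch (FileNotFoundException e) {
            // Thrown when the file does not exist on the local disk (HDFS is never consulted)
            return e.getMessage();
        } catch (IOException e) {
            // Only reachable if close() fails
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "/user/draxent/comment.bloomfilter";
        String error = tryOpen(path);
        System.out.println(error == null ? "opened " + path : "FileNotFoundException: " + error);
    }
}
```

On a machine without a local /user/draxent directory this fails even though the same path is listed by bin/hadoop fs -ls.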

Java hadoop mapreduce distributed-caching bloom-filter

Source: https://stackoverflow.com/questions/33922318/hadoop-mapreduce-distributedcache-usage

2 answers


Answer #1 (cgvd09ve)

The DistributedCache API has been deprecated.
You can get the same functionality with the new API. See the documentation here: http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/job.html
In the driver code:

Job job = new Job();
...
job.addCacheFile(new Path(filename).toUri());

In the mapper's setup method:

Path[] localPaths = context.getLocalCacheFiles();
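Another way to reach the localized copy, stated as an assumption to verify against your Hadoop version: as far as we know, on Hadoop 2 (YARN) each cache file is symlinked into the task's working directory, named after the URI fragment (the part after #) or, when there is no fragment, after the file name. The driver could then call job.addCacheFile(new URI("hdfs://localhost:9000/user/draxent/comment.bloomfilter#bloom")) and setup() could open new FileInputStream("bloom"). The stdlib-only sketch below models only that naming rule; CacheLinkName and linkNameFor are hypothetical names, not part of the Hadoop API.

```java
import java.net.URI;

/**
 * Hypothetical helper: predicts the name under which a cache file is assumed to appear
 * in the task's working directory (the URI fragment if present, else the file name).
 */
final class CacheLinkName {

    static String linkNameFor(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;                                // "...#bloom" -> "bloom"
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);   // last path segment
    }

    public static void main(String[] args) {
        URI withFragment = URI.create("hdfs://localhost:9000/user/draxent/comment.bloomfilter#bloom");
        URI plain = URI.create("hdfs://localhost:9000/user/draxent/comment.bloomfilter");
        System.out.println(linkNameFor(withFragment));  // bloom
        System.out.println(linkNameFor(plain));         // comment.bloomfilter
        System.out.println(withFragment.getPath());     // /user/draxent/comment.bloomfilter (fragment excluded)
    }
}
```

The fragment never becomes part of getPath(), so the HDFS location stays the same while the task sees a short, predictable local name.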


Answer #2 (e4yzc0pl)

The following should work: replace the line URI uri = new URI(... and the line after it with:

DistributedCache.addCacheFile(new Path("/user/draxent/comment.bloomfilter").toUri(), job.getConfiguration());
