Reading files using the distributed cache

Posted by brjng4g3 on 2021-06-03 in Hadoop


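I have many files stored in the distributed cache, each corresponding to a user id. I want to attach one specific file to a specific reduce task: the file for a particular user id, which will be the reducer's key. I can't do this because I read the files from the distributed cache in the configure method, which runs before the reduce method in the reduce class. Inside configure I have no access to the key of the reduce method, so I can't read only the file I want. Please help.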

class reduce{

void configure(args)
{

/* I can read a particular file from the Path[] here.
I want to select the file corresponding to the key of the reduce method and pass its
contents to the reduce method. I am not able to do this because I can't access the key
of the reduce method here. */

}

void reduce(args)
{
}

}

hadoop mapreduce distributed-caching

Source: https://stackoverflow.com/questions/12589899/reading-files-using-distributed-cache

1 answer


tzdcorbm1#

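One solution is to assign the Path arrays from the DistributedCache to class variables during the configure step, as described in the DistributedCache javadocs. The reduce method can then choose the file it needs once it knows its key. Of course, substitute your reduce code for the map code.
This uses the old API, which is what your code appears to be using.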

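import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Nested in the job class; K and V stand for the job's key and value types.
public static class MapClass extends MapReduceBase
    implements Mapper<K, V, K, V> {

  private Path[] localArchives;
  private Path[] localFiles;

  @Override
  public void configure(JobConf job) {
    try {
      // Get the cached archives/files and keep them for map() to use
      localArchives = DistributedCache.getLocalCacheArchives(job);
      localFiles = DistributedCache.getLocalCacheFiles(job);
    } catch (IOException e) {
      // configure() cannot throw checked exceptions, so wrap the failure
      throw new RuntimeException("Could not read the distributed cache", e);
    }
  }

  @Override
  public void map(K key, V value,
                  OutputCollector<K, V> output, Reporter reporter)
      throws IOException {
    // Use data from the cached archives/files here
    // ...
    output.collect(key, value);
  }
}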
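
To connect this back to the question, here is a minimal sketch of how the stored paths could be matched to a reduce key. It is not part of the original answer. It assumes each cached file is named after its user id, optionally with an extension (for example 42.txt for user 42), which the question does not state. CacheFileIndex and fileFor are hypothetical names, and java.nio.file.Path stands in for Hadoop's org.apache.hadoop.fs.Path so the sketch runs without Hadoop on the classpath; with Hadoop's Path, getName() would play the role of getFileName().

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: indexes the cached files by user id so that reduce()
// can pick the right file once it knows its key.
final class CacheFileIndex {
    private final Map<String, Path> filesByUserId = new HashMap<>();

    // Build the index once, e.g. from configure(), using the local cache paths.
    // Assumption: each file name is "<userId>" plus an optional extension.
    CacheFileIndex(Path[] localFiles) {
        if (localFiles == null) {
            return; // nothing was cached
        }
        for (Path file : localFiles) {
            String name = file.getFileName().toString();
            int dot = name.lastIndexOf('.');
            String userId = dot > 0 ? name.substring(0, dot) : name;
            filesByUserId.put(userId, file);
        }
    }

    // Look up the file for a key, e.g. from reduce(); null if none matches.
    Path fileFor(String userId) {
        return filesByUserId.get(userId);
    }

    public static void main(String[] args) {
        Path[] localFiles = {
            Paths.get("/cache/alice.txt"),
            Paths.get("/cache/bob.txt")
        };
        CacheFileIndex index = new CacheFileIndex(localFiles);
        System.out.println(index.fileFor("bob"));
    }
}
```

In a reducer, the index would be built in configure() from localFiles, and reduce() would call fileFor(key.toString()) and open only that file. The key is not needed until reduce() runs, which is what gets around the ordering problem described in the question.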

Answered 2021-06-03
