Use two mappers on the same file simultaneously in Hadoop

Posted by cx6n0qe3 on 2021-06-04 in Hadoop

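Suppose there is a file, and two different, independent mappers need to run on that file in parallel. To do that, we currently have to use a copy of the file.
What I want to know is: "Can two mappers use the same file?" That would in turn reduce resource usage and make the system more time-efficient.
Is there any research in this area, or an existing tool in Hadoop, that can help with this?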

hadoop hdfs mapreduce distributed-computing

Source: https://stackoverflow.com/questions/16768360/use-two-mappers-on-same-file-simultaneously-in-hadoop

2 answers


bvk5enib #1

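At a high level, I can imagine two scenarios for the problem at hand.
Case 1:
If you are trying to write the same implementation in two mapper classes to process the same input file, purely to make better use of resources, that is probably not the right approach. When a file is stored in the cluster, it is split into blocks that are replicated across the data nodes. This already gives you the most efficient resource utilization, because all the blocks of the same input file are processed in parallel.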
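
To illustrate Case 1 outside of Hadoop, here is a minimal plain-Java sketch, not Hadoop code. The input is cut into fixed-size chunks (a stand-in for HDFS blocks) and a single piece of map logic runs on all chunks in parallel. The class name `BlockParallelSketch`, the chunk size, and the word-count logic are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BlockParallelSketch {
    // Cut the record list into fixed-size chunks, standing in for HDFS blocks.
    static List<List<String>> splitIntoBlocks(List<String> lines, int blockSize) {
        List<List<String>> blocks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += blockSize) {
            int end = Math.min(i + blockSize, lines.size());
            blocks.add(new ArrayList<>(lines.subList(i, end)));
        }
        return blocks;
    }

    // The single piece of map logic: count the words in one block.
    static Map<String, Integer> countWords(List<String> block) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : block) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Run the same logic on every block in parallel, then merge the partial results.
    static Map<String, Integer> run(List<String> lines, int blockSize) {
        List<Map<String, Integer>> partials = splitIntoBlocks(lines, blockSize)
                .parallelStream()
                .map(BlockParallelSketch::countWords)
                .collect(Collectors.toList());
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // prints {a=3, b=2, c=2}
        System.out.println(run(List.of("a b", "b c", "c a", "a"), 2));
    }
}
```

The parallelism here comes from splitting the data, not from duplicating the logic, which is the point of Case 1.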
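Case 2:
If you are trying to write two different mapper implementations (each with its own business logic) for a specific workflow that your business requirements call for, then yes, you can use the MultipleInputs class to pass the same input file to two different mappers.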

MultipleInputs.addInputPath(job, file1, TextInputFormat.class, Mapper1.class);
MultipleInputs.addInputPath(job, file1, TextInputFormat.class, Mapper2.class);

Whether this is the right solution depends on what you are trying to achieve.
Thanks.

2021-06-04

nkkqxpd9 #2

Assuming both mappers have the same K,V signature, you can use a delegating mapper that calls the map method of each of the two mappers:

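import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// MyMapper1 and MyMapper2 are your own Mapper<LongWritable, Text, Text, Text>
// subclasses. Mapper declares setup/map/cleanup as protected, so both classes
// need to override them as public for the calls below to compile.
public class DelegatingMapper extends Mapper<LongWritable, Text, Text, Text> {
    private MyMapper1 mapper1;
    private MyMapper2 mapper2;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mapper1 = new MyMapper1();
        mapper1.setup(context);

        mapper2 = new MyMapper2();
        mapper2.setup(context);
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input record is read once and handed to both mappers.
        mapper1.map(key, value, context);
        mapper2.map(key, value, context);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mapper1.cleanup(context);
        mapper2.cleanup(context);
    }
}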
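
The delegating idea can also be seen without any Hadoop classes. The following is a minimal plain-Java sketch, not Hadoop code: `SinglePassFanOut` and `RecordHandler` are hypothetical names, and the shared output list stands in for the shared `Context`. Each record is read once and handed to both handlers.

```java
import java.util.ArrayList;
import java.util.List;

class SinglePassFanOut {
    // Hypothetical stand-in for a mapper: gets one record plus a shared output collector.
    interface RecordHandler {
        void handle(long offset, String line, List<String> out);
    }

    // Read every record once and pass it to each handler in turn.
    static List<String> run(List<String> lines, List<RecordHandler> handlers) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            for (RecordHandler handler : handlers) {
                handler.handle(offset, line, out);
            }
            // +1 for the newline, assuming single-byte characters
            offset += line.length() + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        RecordHandler upper = (off, line, out) -> out.add("upper\t" + line.toUpperCase());
        RecordHandler length = (off, line, out) -> out.add("length\t" + line.length());
        run(List.of("foo", "hello"), List.of(upper, length)).forEach(System.out::println);
    }
}
```

Because both handlers write to the same output, the sketch tags each value with the handler's name. With a real delegating mapper you would probably want something similar, so that the reducer can tell the two output streams apart.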

2021-06-04
