How to remove combiner output and keep only reducer output in the MapReduce final output

Posted by jdg4fx2g on 2021-05-29 in Hadoop


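Hi, I am running an application that reads records from HBase and writes them to text files.
I use a combiner in the application, and also a custom partitioner. I use 41 reducers because I need to create 40 reducer output files to satisfy the conditions in my custom partitioner.
Everything works fine, but when I use the combiner, it creates map output files per region, i.e. per mapper.
For example, my application has 40 regions, so 40 mappers are launched and 40 map output files are created. But the reducers do not combine all the map output into the final reducer output files, which should be 40 reducer output files.
The data in the files is correct, but the number of files has increased.
Any idea how I can get only the reducer output files?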
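Some background on the file counts. A job's final output normally has one file per reduce task, and the partitioner decides which reduce task, and therefore which file, each key goes to. The plain-Java sketch below has no Hadoop dependency. `getPartition` mirrors the formula of Hadoop's default `HashPartitioner`. The class name and the `part-r-NNNNN` naming helper are assumptions for illustration, and the asker's actual custom partitioning condition is not shown in the question.

```java
/**
 * Plain-Java sketch (no Hadoop dependency) of how a partitioner assigns keys to reduce tasks.
 * getPartition uses the same formula as Hadoop's default HashPartitioner.
 */
final class PartitionSketch {
    /** Masks the sign bit so negative hash codes still map into [0, numReduceTasks). */
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Assumed name of the file written by reduce task number {@code partition}. */
    static String reducerFileName(int partition) {
        return String.format("part-r-%05d", partition);
    }

    public static void main(String[] args) {
        int numReduceTasks = 41; // as in the question
        for (String key : new String[] {"fileA", "fileB", "fileC"}) {
            int p = getPartition(key, numReduceTasks);
            System.out.println(key + " -> reducer " + p + " -> " + reducerFileName(p));
        }
    }
}
```

A custom partitioner changes only the body of `getPartition`. Whatever index it returns must stay in `[0, numReduceTasks)`, so with 41 reducers it may return 0 to 40.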

import java.io.IOException;
import org.apache.log4j.Logger;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// HbaseBulkLoadMapperConstants and Counters are classes from the asker's own project.
public class CommonCombiner extends Reducer<NullWritable, Text, NullWritable, Text> {
    private Logger logger = Logger.getLogger(CommonCombiner.class);
    private MultipleOutputs<NullWritable, Text> multipleOutputs;
    // Output base name, taken from the first record this combiner instance sees.
    private String strName = "";
    private static final String DATA_SEPERATOR = "\\|\\!\\|";

    @Override
    public void setup(Context context) {
        logger.info("Inside Combiner.");
        multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(NullWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            final String valueStr = value.toString();
            StringBuilder sb = new StringBuilder();
            // Only true for the first record: strName is set below and never reset.
            if (strName.isEmpty()) {
                String[] strArrValueStr = valueStr.split(DATA_SEPERATOR);
                // The last |^|-separated token of the second field becomes the output name.
                String[] strFullFileName = strArrValueStr[1].split("\\|\\^\\|");
                strName = strFullFileName[strFullFileName.length - 1];
                if (!strArrValueStr[0].contains(HbaseBulkLoadMapperConstants.FF_ACTION)) {
                    sb.append(strArrValueStr[0]).append("|!|");
                }
                // Writes a side file through MultipleOutputs; nothing is passed on to the reducers.
                multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
                context.getCounter(Counters.FILE_DATA_COUNTER).increment(1);
            }
        }
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }
}
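The combiner gets its output name by splitting each record on `|!|`, taking the second field, splitting that on `|^|`, and keeping the last token. Below is a minimal plain-Java sketch of just that parsing step. It assumes the record layout implied by the code, and the sample record is made up. `RecordNames.outputName` is a hypothetical helper. Unlike the original, it returns `null` instead of throwing when a record has no second field.

```java
import java.util.regex.Pattern;

/** Hypothetical helper isolating the combiner's output-name parsing (plain Java). */
final class RecordNames {
    // Pattern.quote avoids hand-escaping the | ! ^ characters.
    private static final Pattern FIELD_SEP = Pattern.compile(Pattern.quote("|!|"));
    private static final Pattern NAME_SEP = Pattern.compile(Pattern.quote("|^|"));

    /**
     * Returns the last |^|-separated token of the second |!|-separated field,
     * or null when the record has no second field.
     */
    static String outputName(String record) {
        String[] fields = FIELD_SEP.split(record);
        if (fields.length < 2) {
            return null; // the original code would throw ArrayIndexOutOfBoundsException here
        }
        String[] parts = NAME_SEP.split(fields[1]);
        // split() drops trailing empty strings, so a field of only "|^|" yields no parts.
        return parts.length == 0 ? "" : parts[parts.length - 1];
    }

    public static void main(String[] args) {
        // Sample record with a made-up layout.
        System.out.println(outputName("row1|!|dir|^|sub|^|report_01"));
    }
}
```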

hadoop mapreduce hadoop2

Source: https://stackoverflow.com/questions/43138492/how-to-remove-combiner-output-and-keep-only-reducer-output-in-mapreduce-final-ou

2 answers


cig3rfwq1#

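Let's get the basics clear:
A combiner is an optimization that can run on the mapper side, or on the reduce side during the merge phase of the reducer's fetch-merge-reduce sequence.
Find out how the keys are distributed in your data. Does a given mapper receive the same set of keys repeatedly? If so, the combiner is helping; otherwise it has no effect.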
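To see why key distribution matters, the sketch below simulates, in plain Java, a summing combiner over one mapper's `(key, 1)` output. The job is a hypothetical word-count-style one, not the asker's. A combiner can only shrink output when keys repeat within a single mapper. With all-distinct keys, it emits as many records as it receives. Because Hadoop may run a combiner zero, one, or several times, the operation has to be associative and commutative, as summing is.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a summing combiner applied to one mapper's (key, 1) output. */
final class CombinerEffect {
    /** Collapses the mapper's keys into one (key, count) record per distinct key. */
    static Map<String, Integer> combine(List<String> mapperKeys) {
        Map<String, Integer> combined = new TreeMap<>();
        for (String key : mapperKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<String> repeatedKeys = List.of("a", "a", "a", "b"); // combiner helps
        List<String> distinctKeys = List.of("a", "b", "c", "d"); // combiner has no effect
        System.out.println(repeatedKeys.size() + " -> " + combine(repeatedKeys).size() + " records");
        System.out.println(distinctKeys.size() + " -> " + combine(distinctKeys).size() + " records");
    }
}
```

Running it prints `4 -> 2 records` and then `4 -> 4 records`.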
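With 1k regions, there is no guarantee that they are evenly divided; you probably have some hot regions.
Find the hot regions and split them.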
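One simple way to spot hot regions is to compare each region's record count with the average. The threshold used here (`factor` times the mean) is an arbitrary heuristic chosen for illustration. The counts in `main` are made-up numbers; in practice they would come from wherever you track region sizes or request counts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Flags regions whose record count is well above the average (plain Java, heuristic). */
final class HotRegions {
    /** Returns, sorted by name, the regions whose count exceeds factor * mean count. */
    static List<String> findHot(Map<String, Long> countsByRegion, double factor) {
        double mean = countsByRegion.values().stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0);
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, Long> e : countsByRegion.entrySet()) {
            if (e.getValue() > factor * mean) {
                hot.add(e.getKey());
            }
        }
        Collections.sort(hot);
        return hot;
    }

    public static void main(String[] args) {
        // Made-up record counts per region.
        Map<String, Long> counts = Map.of("r1", 100L, "r2", 120L, "r3", 900L, "r4", 80L);
        System.out.println("Hot regions: " + findHot(counts, 2.0));
    }
}
```

With these numbers the mean is 300, so only `r3` (900 > 600) is flagged.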
Please have a look at: http://bytepadding.com/big-data/map-reduce/understanding-map-reduce-the-missing-guide/

Answered 2021-05-29

xu3bshqb2#

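You are not outputting any data from the combiner for the reducer to consume. In your combiner you are using:

    multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);

That is not how you write out data to be consumed between the two phases, i.e. from the mapper or combiner to the reduce phase. You should use:

    context.write()

MultipleOutputs is just a way of writing extra files to disk when you need more than one file. I have never seen it used in a combiner.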
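This explains the extra files in the question. Records written through `MultipleOutputs` in a combiner never enter the shuffle, so each map task can write its own side files alongside the reducers' part files. The plain-Java sketch below only counts file names. It assumes Hadoop's usual `<base>-m-NNNNN` / `part-r-NNNNN` naming and one side file per map task, and the base name `report` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists output file names, assuming Hadoop's usual <base>-<m|r>-<5-digit task id> naming. */
final class OutputFileNames {
    /** One part-r-NNNNN file per reduce task. */
    static List<String> reducerFiles(int numReducers) {
        List<String> files = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            files.add(String.format("part-r-%05d", r));
        }
        return files;
    }

    /** Side files written via MultipleOutputs from map tasks: one per map task for one base name. */
    static List<String> mapSideFiles(int numMappers, String baseName) {
        List<String> files = new ArrayList<>();
        for (int m = 0; m < numMappers; m++) {
            files.add(String.format("%s-m-%05d", baseName, m));
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>(reducerFiles(41)); // 41 reducers, as in the question
        all.addAll(mapSideFiles(40, "report"));               // 40 mappers, hypothetical base name
        System.out.println(all.size() + " files, e.g. " + all.get(0) + " and " + all.get(41));
    }
}
```

In the real job, the fix is the one this answer gives. Emit from the combiner with `context.write(...)` so the records reach the reducers, and call `multipleOutputs.write(...)` only in the reducer.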

Answered 2021-05-29
