Java: How to traverse an iterator of Text values twice in a MapReduce program?

Posted by fgw7neuy on 2021-06-03 in Hadoop

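In my MapReduce program, I have a reducer function that counts the number of items in an iterator of Text values and then, for each item in the iterator, outputs the item as the key and the count as the value. So I need to use the iterator twice. But once the iterator has reached the end, I cannot iterate again from the beginning. How can I solve this? I tried the following code for my reduce function: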

public static class ReduceA extends MapReduceBase implements Reducer<Text, Text, Text, Text>
{
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
    {
        Text t;
        int count = 0;
        String[] attr = key.toString().split(",");

        while (values.hasNext())
        {
            values.next();
            count++;
        }

        // Maybe I need to reset my iterator here and start from the beginning, but how do I do it?

        String v = Integer.toString(count);
        while (values.hasNext())
        {
            t = values.next();
            output.collect(t, new Text(v));
        }
    }
}

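The code above produces empty results. I tried inserting the iterator's values into a list, but because I need to process many GBs of data, I get a Java heap space error when using the list. Please help me modify the code so that I can traverse the iterator twice.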

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/23108910/how-to-traverse-an-iterator-of-text-values-twice-in-a-mapreduce-program

1 Answer

jq6vz3qz1#

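You can do this the simple way: declare a list and cache the values during the first iteration. You can then iterate over the list and write the output. Note that this keeps every value for a key in memory, so it only works when the values for each key fit in the heap. You should have something like this: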

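import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Nested inside the job's driver class, as in the question.
public static class ReduceA extends MapReduceBase implements
        Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Holds every value for this key in memory.
        List<Text> cache = new ArrayList<Text>();

        // First pass: cache the values. Hadoop reuses the same Text instance
        // for each call to next(), so store a copy rather than the reference.
        while (values.hasNext()) {
            cache.add(new Text(values.next()));
        }

        // Second pass: emit each cached value with the total count.
        Text count = new Text(Integer.toString(cache.size()));
        for (Text text : cache) {
            output.collect(text, count);
        }
    }
}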
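
Since the question reports a Java heap space error with an in-memory list, one possible alternative is to spill the values to a temporary file on the first pass and read them back on the second. The following is only a sketch of that idea in plain Java, not Hadoop code: the class name SpillAndReplay, the method countThenEmit, and the use of String values with a BiConsumer in place of Text and OutputCollector are all assumptions made so the example runs without Hadoop on the classpath. It also assumes the task has local disk space for one key's values.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Hadoop: counts the values of a one-shot
// iterator while spilling them to disk, then replays them with the count.
final class SpillAndReplay {

    public static long countThenEmit(Iterator<String> values,
                                     BiConsumer<String, String> out) {
        Path spill = null;
        try {
            spill = Files.createTempFile("reduce-spill", ".bin");
            long count = 0;

            // Pass 1: consume the iterator once, writing each value to disk.
            // writeUTF limits each value to 65535 encoded bytes.
            try (DataOutputStream dos = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(spill)))) {
                while (values.hasNext()) {
                    dos.writeUTF(values.next());
                    count++;
                }
            }

            // Pass 2: read the values back and emit (value, count) pairs.
            String total = Long.toString(count);
            try (DataInputStream dis = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(spill)))) {
                for (long i = 0; i < count; i++) {
                    out.accept(dis.readUTF(), total);
                }
            }
            return count;
        } catch (IOException e) {
            // Wrapped only to keep this sketch easy to call; a real reducer
            // could let the IOException propagate from reduce().
            throw new UncheckedIOException(e);
        } finally {
            if (spill != null) {
                try {
                    Files.deleteIfExists(spill);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temporary file.
                }
            }
        }
    }
}
```

Inside reduce(), the same pattern would presumably pass values.next().toString() to the writer and call output.collect(new Text(value), new Text(total)) in the second loop. Memory use stays roughly constant per key, at the cost of extra disk I/O.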

Answered 2021-06-04
