Storing an Iterable into an ArrayList results in duplicate and missing values

w6lpcovy posted on 2021-06-02 in Hadoop


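I am trying to write Hadoop Java code that finds the relevance of several files to certain keywords. At this stage, the earlier steps produce their output correctly. In the reducer, I first need to count the number of files that contain the word, and then compute a particular metric value iv, emitting <key, value> = <word@filename, iv>. I used Iterable<Text> for the values. Because an Iterable cannot be looped over twice, I loop over the values once, counting the files while storing each value into an ArrayList, and then use the ArrayList for the second loop that writes the output. However, the output contains many duplicates and is missing many values. Is something wrong with how I store the values into the ArrayList?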

public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
  // first pass: count the files containing this word and keep the values,
  // because the Iterable can only be traversed once
  int file_count = 0;
  ArrayList<Text> value_storage = new ArrayList<Text>();
  for (Text val : values) {
    file_count++;
    value_storage.add(val);
  }
  // second pass: write out the stored values
  for (int i = 0; i < value_storage.size(); i++) {
    context.write(key, value_storage.get(i));
  }
}
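A likely cause, offered as a hedged sketch rather than a confirmed answer: Hadoop's reduce-side values iterator typically reuses a single `Text` instance and refills it on every iteration. If `val` itself is added to the list, the list ends up holding many references to that one object, and after the loop every entry shows whichever value was deserialized last. That would look exactly like duplicates plus missing values. The usual remedy is to store a copy, e.g. `value_storage.add(new Text(val));` (`Text` has a copy constructor). The program below does not depend on Hadoop: `MutableText` and `reusingIterable` are hypothetical stand-ins that mimic the object reuse so the effect can be reproduced with the JDK alone, and the file names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReducerReuseDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable holder.
    static final class MutableText {
        private String value = "";
        MutableText() { }
        // Copy constructor, playing the role of new Text(other)
        MutableText(MutableText other) { this.value = other.value; }
        void set(String v) { this.value = v; }
        @Override public String toString() { return value; }
    }

    // Mimics a reducer's values Iterable: it refills ONE shared object on
    // every next() call instead of allocating a new object per value.
    static Iterable<MutableText> reusingIterable(List<String> raw) {
        return () -> new Iterator<MutableText>() {
            private final MutableText shared = new MutableText();
            private int pos = 0;
            @Override public boolean hasNext() { return pos < raw.size(); }
            @Override public MutableText next() {
                shared.set(raw.get(pos++));
                return shared;
            }
        };
    }

    // Problematic pattern: stores the shared reference each time.
    static List<String> storeReferences(Iterable<MutableText> values) {
        List<MutableText> storage = new ArrayList<>();
        for (MutableText v : values) {
            storage.add(v); // every element is the same object
        }
        List<String> out = new ArrayList<>();
        for (MutableText v : storage) {
            out.add(v.toString());
        }
        return out;
    }

    // Fixed pattern: copy each value while counting, then make the second
    // pass over the stored copies (where the real job would compute iv).
    static List<String> storeCopies(Iterable<MutableText> values) {
        List<MutableText> storage = new ArrayList<>();
        int fileCount = 0;
        for (MutableText v : values) {
            fileCount++;
            storage.add(new MutableText(v)); // in Hadoop: new Text(val)
        }
        List<String> out = new ArrayList<>();
        for (MutableText v : storage) {
            out.add(v + " (files=" + fileCount + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("file1", "file2", "file3");
        // prints [file3, file3, file3]
        System.out.println(storeReferences(reusingIterable(raw)));
        // prints [file1 (files=3), file2 (files=3), file3 (files=3)]
        System.out.println(storeCopies(reusingIterable(raw)));
    }
}
```

Copying costs one allocation per value, which is the price of traversing the data twice; if a single pass can compute everything needed, no list is required at all.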