Java: caching an Iterable in an ArrayList to iterate twice in a reducer does not work

Posted by uinbv5nw on 2021-06-02 in Hadoop

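My MapReduce program has a strange problem, and I can't figure out why. Maybe you can give me a hint about what is wrong?
My mapper function looks like this: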

    // running counter, used as an ID for each input line seen by this mapper
    Integer Click_ID = 0;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] lineArr = line.split("\t");
        String nm_uv_id = lineArr[0];
        String session_id = lineArr[1];
        String time_stamp = lineArr[2];
        String click_counter = lineArr[3];
        String is_robot = lineArr[4];
        Click_ID++;
        String full_line = Click_ID + "\t" + nm_uv_id + "\t" + session_id + "\t"
                + time_stamp + "\t" + click_counter + "\t" + is_robot;
        // key: session ID, value: the full line prefixed with Click_ID
        context.write(new Text(session_id), new Text(full_line));
    }

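So far, everything works: when I set the number of reducers to 0, the mapper produces the expected output.
This is what my reducer looks like. What I want to do is iterate twice over the values of each key. To do that, I tried caching each value from the Iterable in a separate ArrayList: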

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<Text> cache = new ArrayList<Text>();
        // first pass: cache every value
        for (Text value : values) {
            cache.add(value);
        }
        // second pass: iterate over the cache
        for (Text entity : cache) {
            context.write(key, entity);
        }
    }

The input to my MapReduce job looks like this:

    nm_uv_id_1 session_id_2 1234567891 1 is_robot_no
    nm_uv_id_1 session_id_2 1234567892 2 is_robot_no
    nm_uv_id_1 session_id_2 1234567893 3 is_robot_no
    nm_uv_id_1 session_id_2 1234567894 3 is_robot_no
    nm_uv_id_1 session_id_1 1234567895 1 is_robot_no
    nm_uv_id_1 session_id_1 1234567896 2 is_robot_no
    nm_uv_id_1 session_id_1 1234567897 3 is_robot_no
    nm_uv_id_1 session_id_1 1234567898 4 is_robot_no
    nm_uv_id_1 session_id_1 1234567899 5 is_robot_no
    nm_uv_id_1 session_id_1 1234567888 6 is_robot_no
    nm_uv_id_1 session_id_1 1234567890 7 is_robot_no
    nm_uv_id_1 session_id_1 1234567890 8 is_robot_no
    nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    nm_uv_id_1 session_id_1 1234567890 10 is_robot_no
    nm_uv_id_1 session_id_3 1234567890 1 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 1 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 2 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 3 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 4 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 5 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 6 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 7 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 8 is_robot_no
    nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    nm_uv_id_2 session_id_5 1234587890 1 is_robot_no
    nm_uv_id_2 session_id_5 1234587890 2 is_robot_no
    nm_uv_id_2 session_id_5 1234587890 3 is_robot_yes
    nm_uv_id_2 session_id_5 1234587890 4 is_robot_yes
    nm_uv_id_2 session_id_5 1234587890 5 is_robot_no
    nm_uv_id_2 session_id_5 123457890 6 is_robot_no

However, my output file looks like this:

    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_1 13 nm_uv_id_1 session_id_1 1234567890 9 is_robot_no
    session_id_2 2 nm_uv_id_1 session_id_2 1234567892 2 is_robot_no
    session_id_2 2 nm_uv_id_1 session_id_2 1234567892 2 is_robot_no
    session_id_2 2 nm_uv_id_1 session_id_2 1234567892 2 is_robot_no
    session_id_2 2 nm_uv_id_1 session_id_2 1234567892 2 is_robot_no
    session_id_3 15 nm_uv_id_1 session_id_3 1234567890 1 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_4 24 nm_uv_id_2 session_id_4 1234587890 9 is_robot_no
    session_id_5 30 nm_uv_id_2 session_id_5 123457890 6 is_robot_no
    session_id_5 30 nm_uv_id_2 session_id_5 123457890 6 is_robot_no
    session_id_5 30 nm_uv_id_2 session_id_5 123457890 6 is_robot_no
    session_id_5 30 nm_uv_id_2 session_id_5 123457890 6 is_robot_no
    session_id_5 30 nm_uv_id_2 session_id_5 123457890 6 is_robot_no
    session_id_5 30 nm_uv_id_2 session_id_5 123457890 6 is_robot_no

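I don't understand why the reducer always writes the same key-value pair for a given key. I tried a few things, and the first for loop, where I fill the cache, seems to work fine: when I call context.write(key, value) inside it, I get the expected output. But when I read from the cache in the second loop, the program writes the strange output shown above.
Can anyone help?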

Answer #1, by aoyhnmkz

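As an optimization, Hadoop reuses the same Text object as a buffer while iterating over the values. Every element you add to the list is therefore a reference to that one object, so the whole list ends up showing whichever value was read last. To cache the values, you need to copy each one manually.
I would change your caching loop to: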

    for (Text value : values) { cache.add(new Text(value)); }
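
To see the effect without a cluster, here is a minimal sketch that imitates the object reuse in plain Java. It is not Hadoop code: `ReuseDemo`, `reusingIterable`, `cacheReferences`, and `cacheCopies` are hypothetical names made up for this illustration, and a `StringBuilder` stands in for the reused `Text` instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {

    // Assumption for illustration: a single-pass iterable that returns the
    // SAME mutable object on every step and only overwrites its contents,
    // imitating how the reducer's value iterable reuses one Text instance.
    static Iterable<StringBuilder> reusingIterable(List<String> records) {
        final StringBuilder shared = new StringBuilder();
        final Iterator<String> source = records.iterator();
        return () -> new Iterator<StringBuilder>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public StringBuilder next() {
                shared.setLength(0);          // wipe the previous record
                shared.append(source.next()); // refill the same object
                return shared;
            }
        };
    }

    // Same pattern as the question: the list stores references, and they all
    // point to the one shared object.
    static List<String> cacheReferences(List<String> records) {
        List<StringBuilder> cache = new ArrayList<>();
        for (StringBuilder value : reusingIterable(records)) {
            cache.add(value);
        }
        return render(cache);
    }

    // Same pattern as the fix: copy the contents before caching, the
    // equivalent of cache.add(new Text(value)).
    static List<String> cacheCopies(List<String> records) {
        List<StringBuilder> cache = new ArrayList<>();
        for (StringBuilder value : reusingIterable(records)) {
            cache.add(new StringBuilder(value));
        }
        return render(cache);
    }

    // Second pass over the cache: turn each cached element into a String.
    static List<String> render(List<StringBuilder> cache) {
        List<String> out = new ArrayList<>();
        for (StringBuilder element : cache) {
            out.add(element.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("click 1", "click 2", "click 3");
        System.out.println(cacheReferences(records)); // [click 3, click 3, click 3]
        System.out.println(cacheCopies(records));     // [click 1, click 2, click 3]
    }
}
```

The list built from references has the right number of elements but repeats a single record, which matches the shape of the output in the question. In the real reducer, caching `value.toString()` in a `List<String>` should avoid the problem as well, since `toString()` creates a new String from the buffer's current contents.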
