In MapReduce, how do you stop a reducer after processing X records?

Posted by rqmkfv5c on 2021-06-04 in Hadoop


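I am using a mapper to load a large amount of data in which each record has an execution time and a large query associated with it. I only need to find the 1000 most expensive queries, so I emit the execution time as the key in the mapper's output. I use a single reducer and want it to write only 1000 records and then stop processing.
I could keep a global counter and do if (count < 1000) { context.write(key, value); }
But that would still load all of the billions of records and then simply not write them.
I want the reducer to stop after emitting 1000 records, so that it avoids the seek time and read time for the remaining records.
Is this possible?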

hadoop mapreduce reducers Mapper

Source: https://stackoverflow.com/questions/17285857/in-mapreduce-how-do-you-stop-a-reducer-after-processing-x-records

1 Answer


qvtsj1bj1#

You can short-circuit the reducer by overriding the Reducer.run() method. The default implementation looks like this:

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKey()) {
    reduce(context.getCurrentKey(), context.getValues(), context);
  }
  cleanup(context);
}

You should be able to modify the while loop to include a counter, as follows:

@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  int count = 0;
  // Check the counter first so that nextKey() is not called once the limit is reached.
  while (count++ < 1000 && context.nextKey()) {
    reduce(context.getCurrentKey(), context.getValues(), context);
  }
  cleanup(context);
}

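Note that this will not necessarily output the top records, only the records for the first 1000 keys. It also will not work if your reduce implementation outputs more than one record per key; in that case you can increment the counter in the reduce method instead.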
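Both caveats can be checked without a cluster. What follows is a minimal plain-Java sketch of the loop's control flow, not Hadoop code: `EarlyExitModel`, `Group`, and `CountingIterator` are hypothetical names made up for illustration, and the input is assumed to arrive already sorted by key in descending order. The counter is incremented per written record rather than per key, and the limit is tested before advancing to the next group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of the early-exit reducer loop; no Hadoop classes are used. */
class EarlyExitModel {

    /** Hypothetical stand-in for one reduce group: a key and the values sharing it. */
    static final class Group {
        final long key;
        final List<String> values;

        Group(long key, String... values) {
            this.key = key;
            this.values = Arrays.asList(values);
        }
    }

    /** Iterator wrapper that records how many groups were actually pulled. */
    static final class CountingIterator implements Iterator<Group> {
        private final Iterator<Group> delegate;
        int pulled = 0;

        CountingIterator(Iterator<Group> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public Group next() {
            pulled++;
            return delegate.next();
        }
    }

    /** Models run(): test the limit first, then advance to the next key. */
    static List<String> run(Iterator<Group> groups, int limit) {
        List<String> output = new ArrayList<>();
        while (output.size() < limit && groups.hasNext()) {
            Group group = groups.next();
            // Models reduce(): count records written, not keys seen.
            for (String value : group.values) {
                if (output.size() >= limit) {
                    break;
                }
                output.add(group.key + "\t" + value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Assumption: keys are already in descending order (most expensive first).
        List<Group> sorted = Arrays.asList(
            new Group(900, "q1", "q2"),
            new Group(700, "q3", "q4"),
            new Group(500, "q5"),
            new Group(100, "q6"));
        CountingIterator it = new CountingIterator(sorted.iterator());
        List<String> top = run(it, 3);
        System.out.println(top);
        System.out.println("groups pulled: " + it.pulled);
    }
}
```

With a limit of 3, the model writes three records and pulls only two of the four groups, which is the saving the question asks for. In a real job the keys reach the reducer in ascending order by default, so getting the most expensive queries first would likely need a descending sort comparator on the job (for example via Job.setSortComparatorClass); whether that fits depends on the key type in use.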

