I have a large sequence file that stores the tf-idf values of my documents. Each line represents one document and the columns hold the tf-idf value of each term (so each row is a sparse vector). I want to use Hadoop to pick the top-k words for each document. The simplest solution is to loop over all the columns of each row in the mapper and select the top k, but as the file grows larger I don't think that is a good solution. Is there a better way to do this in Hadoop?
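For reference, the per-document case the question describes can stay entirely inside the mapper, since each row is independent. Below is a minimal sketch of such a mapper, assuming each input line looks like `docId<TAB>termId:tfidf termId:tfidf ...`; the question doesn't specify the actual record layout, so the class name, the value of `K`, and the parsing are all illustrative assumptions. A bounded min-heap keeps the cost at O(T log k) per row instead of sorting all T non-zero columns.

```java
import java.io.IOException;
import java.util.PriorityQueue;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: emits (docId, top-k "termId:tfidf" pairs) per document.
public class PerDocTopKMapper extends Mapper<LongWritable, Text, Text, Text> {
  private static final int K = 10; // illustrative; could come from the job Configuration

  @Override
  protected void map(LongWritable key, Text value, Context ctx)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split("\t");
    String docId = fields[0];

    // Bounded min-heap over (termId, tfidf) pairs: the pair with the
    // smallest tf-idf sits on top and is evicted first.
    PriorityQueue<String[]> heap = new PriorityQueue<>(
        (a, b) -> Double.compare(Double.parseDouble(a[1]), Double.parseDouble(b[1])));

    for (String col : fields[1].split(" ")) {
      String[] pair = col.split(":"); // pair[0] = termId, pair[1] = tfidf
      heap.offer(pair);
      if (heap.size() > K) heap.poll(); // drop the current smallest
    }

    // Drain the k survivors (in ascending tf-idf order) into one output value.
    StringBuilder topK = new StringBuilder();
    while (!heap.isEmpty()) {
      String[] p = heap.poll();
      topK.append(p[0]).append(':').append(p[1]).append(' ');
    }
    ctx.write(new Text(docId), new Text(topK.toString().trim()));
  }
}
```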
1 Answer

lnxxn5zx1#
1. In every map task, compute a local top K (the top K of that mapper's input split).
2. Spawn a single reducer: the local top-K lists from all mappers flow to this one reducer, which then evaluates the global top K (a sketch of this two-step job follows the analogy below).
Think of the problem as:
1. You have been given the results of X horse races.
2. You need to find the N fastest horses overall.
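Here is a minimal sketch of the two-step pattern above as a single Hadoop job. It assumes an input of plain `term<TAB>tfidf` lines; the class names, the value of `K`, and the record format are illustrative assumptions, not something given in the original post.

```java
import java.io.IOException;
import java.util.PriorityQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GlobalTopK {
  static final int K = 10; // illustrative; could come from the job Configuration

  // Tiny holder for a (score, term) pair.
  static class Scored {
    final double score;
    final String term;
    Scored(double score, String term) { this.score = score; this.term = term; }
  }

  // Step 1: each mapper keeps a local top K in a bounded min-heap.
  public static class TopKMapper
      extends Mapper<LongWritable, Text, DoubleWritable, Text> {
    private final PriorityQueue<Scored> heap =
        new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));

    @Override
    protected void map(LongWritable key, Text value, Context ctx) {
      String[] parts = value.toString().split("\t"); // assumed "term<TAB>tfidf"
      heap.offer(new Scored(Double.parseDouble(parts[1]), parts[0]));
      if (heap.size() > K) heap.poll(); // evict the current local minimum
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
      // Emit only this mapper's local top K: at most K records per map task
      // cross the network to the single reducer.
      for (Scored s : heap) {
        ctx.write(new DoubleWritable(s.score), new Text(s.term));
      }
    }
  }

  // Step 2: one reducer merges all local top-K lists into the global top K.
  public static class TopKReducer
      extends Reducer<DoubleWritable, Text, Text, DoubleWritable> {
    private final PriorityQueue<Scored> heap =
        new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));

    @Override
    protected void reduce(DoubleWritable score, Iterable<Text> terms, Context ctx) {
      for (Text t : terms) {
        heap.offer(new Scored(score.get(), t.toString()));
        if (heap.size() > K) heap.poll();
      }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
      // The K survivors are the global top K; drain in ascending score order.
      while (!heap.isEmpty()) {
        Scored s = heap.poll();
        ctx.write(new Text(s.term), new DoubleWritable(s.score));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "global top-k");
    job.setJarByClass(GlobalTopK.class);
    job.setMapperClass(TopKMapper.class);
    job.setReducerClass(TopKReducer.class);
    job.setNumReduceTasks(1); // the single reducer that sees every local top K
    job.setMapOutputKeyClass(DoubleWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because each mapper forwards at most K records, the single reducer sees only (number of mappers) × K inputs, so it does not become a bottleneck even for very large files.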