Running a MapReduce program for an input list of words

Posted by bvjveswy on 2021-05-30 in Hadoop

I have words.txt, a file containing a large number of unique words, like this:

book
apple
football
camera
playing
mac
Google
samsung
..

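I have also written a MapReduce program that, for a large corpus.txt file, counts how often each word appears on the same line as the word "Google".
For example, assume the following corpus: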

.......
........
Google receives more than 345 million
Google handled 345 million
........
.......

The program output is:

[Google,receives]     1
[Google,more]         1
[Google,than]         1
[Google,million]      2
[Google,345]          2
[Google,handled]      1
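
For reference, the single-word behaviour described above can be sketched roughly as follows. This is plain Python that imitates the map and reduce steps, not the actual Hadoop job; the names `map_line`, `reduce_counts` and `cooccurrences` are made up for illustration, and matching is assumed to be exact and case-sensitive on whitespace-separated tokens.

```python
from collections import Counter

TARGET = "Google"  # assumption: exact, case-sensitive token match


def map_line(line, target=TARGET):
    """Map step: emit ((target, other_word), 1) for each other token
    on a line that contains the target word."""
    tokens = line.split()
    if target not in tokens:
        return []
    return [((target, tok), 1) for tok in tokens if tok != target]


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (target, other_word) key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def cooccurrences(lines, target=TARGET):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_line(line, target))
    return reduce_counts(pairs)
```

Feeding it the two sample corpus lines gives a count of 2 for `("Google", "million")` and `("Google", "345")`, and 1 for the remaining pairs, in line with the output listed above.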

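This program works fine for a single word. But how can I run it for the whole list of words in words.txt?
In other words, do I have to run a separate MapReduce job for each word in the list, or is there another way to do it?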
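
One possible alternative to a job per word, sketched here under assumptions, is to load the whole word list into a set and let a single map pass emit pairs for every listed word found on a line. In a real Hadoop job the list would have to be made available to each mapper (for example through the distributed cache); the code below is plain Python that only simulates the idea, and `load_words`, `map_line` and `run_job` are hypothetical names.

```python
from collections import Counter


def load_words(lines):
    """Build the set of target words from the lines of words.txt."""
    return {w.strip() for w in lines if w.strip()}


def map_line(line, targets):
    """Map step: for every target word present on the line,
    emit ((target, other_word), 1) for each other token."""
    tokens = line.split()
    present = targets.intersection(tokens)
    for target in present:
        for tok in tokens:
            if tok != target:
                yield (target, tok), 1


def run_job(corpus_lines, targets):
    """Single pass over the corpus; the Counter plays the reducer's role."""
    totals = Counter()
    for line in corpus_lines:
        for key, n in map_line(line, targets):
            totals[key] += n
    return totals
```

With this layout the corpus is read once regardless of how many words are in the list, and the set lookup keeps the per-line cost proportional to the line length. Whether the set fits in each mapper's memory depends on the size of words.txt, which the question does not state.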
