Bigdata Hadoop Java code for WordCount, modified

Posted by yr9zkbsy on 2021-06-02 in Hadoop


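I have to modify the Hadoop WordCount example so that it counts the words starting with the prefix "cons", and then sort the results by frequency in descending order. Can anyone tell me how to write the mapper and reducer code for this?
Code: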

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // Strip all digits and punctuation, then lowercase the line
        String line = value.toString().replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        // Split the line into words
        StringTokenizer record = new StringTokenizer(line);
        // Emit each word as the key with 1 as its value
        while (record.hasMoreTokens())
            context.write(new Text(record.nextToken()), new IntWritable(1));
    }
}

Java hadoop hadoop2 bigdata

Source: https://stackoverflow.com/questions/26170827/bigdata-hadoop-java-codefor-wordcount-modified

1 answer


ulydmbyx1#

To count the words that start with "cons", you can discard every other word when emitting from the mapper.

public void map(Object key, Text value, Context context) throws IOException,
        InterruptedException {
    IntWritable one = new IntWritable(1);
    String[] words = value.toString().split(" ");
    for (String word : words) {
        if (word.startsWith("cons"))
              context.write(new Text("cons_count"), one);
    }
}

The reducer now receives only a single key, cons_count, and you can sum its values to get the count.
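The filter-and-sum idea can be tried out without a cluster. The following is only a plain-Java sketch, not Hadoop code: the class and method names (`ConsCountSketch`, `mapLine`, `countPrefix`) are made up for illustration, and it assumes you want the same normalization as the mapper in the question (strip punctuation and digits, lowercase) before testing the prefix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class ConsCountSketch {

    // "Map" step (hypothetical helper): normalize the line the way the
    // question's mapper does, then count one per token that starts with prefix.
    static int mapLine(String line, String prefix) {
        String cleaned = line.replaceAll("\\p{Punct}|\\d", "").toLowerCase();
        StringTokenizer tokens = new StringTokenizer(cleaned);
        int emitted = 0;
        while (tokens.hasMoreTokens()) {
            if (tokens.nextToken().startsWith(prefix)) {
                emitted++;
            }
        }
        return emitted;
    }

    // "Reduce" step (hypothetical helper): sum everything emitted under the single key.
    static int countPrefix(List<String> lines, String prefix) {
        int total = 0;
        for (String line : lines) {
            total += mapLine(line, prefix);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Consider the constant, then construct it.",
                "No match here; Console2 counts though.");
        // Matches: consider, constant, construct, console
        System.out.println(countPrefix(lines, "cons")); // prints 4
    }
}
```

Note that splitting on a single space without lowercasing, as in the mapper above, would miss "Consider" and "Console2" in this input, so whether to normalize first depends on what you want counted.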
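To sort the words that start with "cons" by frequency, all of those words should go to the same reducer, which then tallies and sorts them. To do that, have the mapper emit every matching word under one shared key: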

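import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException,
            InterruptedException {
        String[] words = value.toString().split(" ");
        for (String word : words) {
            // Emit every matching word under the same key so one reducer sees them all
            if (word.startsWith("cons"))
                context.write(new Text("cons"), new Text(word));
        }
    }
}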

Reducer:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Tally how many times each word appears
        Map<String, Integer> wordCountMap = new HashMap<String, Integer>();
        for (Text value : values) {
            String word = value.toString();
            if (wordCountMap.containsKey(word)) {
                Integer count = wordCountMap.get(word);
                count++;
                wordCountMap.put(word, count);
            } else {
                wordCountMap.put(word, 1);
            }
        }

        //use some sorting mechanism to sort the map based on values.
        // ...

        // Write one (word, count) pair per distinct word
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }

}
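The reducer leaves the sorting step open. One possible way to fill it in, shown here as a standalone plain-Java sketch rather than as part of the answer's code: copy the map entries into a list and sort it by count, highest first. The names `SortByCountSketch` and `sortByCountDesc` are hypothetical, and the alphabetical tie-break is an added assumption to keep the output order deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByCountSketch {

    // Returns the entries ordered by count, highest first.
    // Ties are broken alphabetically by word (an assumption, not from the answer).
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("console", 2);
        counts.put("consider", 5);
        counts.put("constant", 2);
        // Prints consider 5, then console 2, then constant 2
        for (Map.Entry<String, Integer> e : sortByCountDesc(counts)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Inside the reducer, the final loop would then iterate over the sorted list instead of `wordCountMap.entrySet()`. The output comes out in frequency order only because every word shares the single key "cons" and is therefore handled in one `reduce` call; it also means the whole tally has to fit in that reducer's memory.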

2021-06-03
