Find the longest word in a string

e0bqpujr  posted on 2021-05-29  in Hadoop

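I'm using RHadoop. I have a string, and in the map step I set each word as the key and its length as the associated value. How can I find the longest word with MapReduce?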

Sys.setenv("HADOOP_CMD"="/home/hadoop/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.1.jar")
Sys.setenv("HADOOP_HOME"="/home/hadoop/hadoop")
Sys.setenv(JAVA_HOME="/usr/java/latest")

library(rhdfs)
library(rmr2)
hdfs.init()

line = "It's Supercalifragilisticexpialidocious!
Even though the sound of it
Is something quite atrocious
If you say it loud enough
You'll always sound precocious
Supercalifragilisticexpialidocious!"

to.dfs(line, output='/home/m072040031/small_doc.txt',
       format="text")

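# Despite its name, this job emits (word, length) pairs rather than counts.
wordcount = function(input,
                     output,
                     pattern = '[[:punct:][:space:][:digit:]]+'){
  mapreduce(input = input,
            output = output,
            input.format = "text",
            # map: split each line into words; key = word, value = its length
            map = function(k, lines){
              v = unlist(strsplit(lines, split = pattern))
              keyval(v, nchar(v))},
            # reduce: pass each word and its length(s) through unchanged
            reduce = function(word, count){
              keyval(word, count)}
            )
}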

wordcount("/home/m072040031/small_doc.txt",
          output = "/home/m072040031/small_doc_wc.RData")

info=from.dfs("/home/m072040031/small_doc_wc.RData")

info
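
One possible approach, sketched here in plain R so it runs without a cluster: because the job above uses each word as its own key, no single reduce call ever sees all the words, so a maximum cannot be taken there. If the map step emits one constant key instead, every (word, length) pair reaches the same reduce call, which can then keep the longest. The function names `map_longest` and `reduce_longest` and the key `"longest"` are hypothetical, chosen only for this illustration; the code mimics the map and reduce logic and does not call rmr2.

```r
# Plain-R sketch of the map and reduce steps; no Hadoop or rmr2 is needed.
pattern <- "[[:punct:][:space:][:digit:]]+"

# Map step: split lines into words and pair every word with its length.
# All pairs share one constant key (an assumption of this sketch) so that
# a single reduce call sees every word.
map_longest <- function(lines) {
  words <- unlist(strsplit(lines, split = pattern))
  words <- words[nzchar(words)]  # drop empty tokens that strsplit may leave
  list(key = rep("longest", length(words)),
       val = data.frame(word = words, len = nchar(words),
                        stringsAsFactors = FALSE))
}

# Reduce step: keep only the distinct row(s) with the maximum length.
reduce_longest <- function(key, val) {
  best <- val[val$len == max(val$len), , drop = FALSE]
  unique(best)
}

lines <- c("It's Supercalifragilisticexpialidocious!",
           "Even though the sound of it",
           "Is something quite atrocious")

mapped <- map_longest(lines)
result <- reduce_longest("longest", mapped$val)
print(result)
```

In an rmr2 job the same idea would presumably mean returning `keyval("longest", ...)` from the map function and selecting the maximum inside the reduce function. Alternatively, since `info` already holds words as keys and lengths as values, it may be enough to pick the maximum locally after `from.dfs`, for example with `which.max` on the values, assuming the result fits in memory.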
