Java: Why does my reducer function produce different output after I split a string and join it back together?

Posted by ht4b089n on 2021-05-29 in Hadoop

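I know this is an odd question, so let me give an example. I'm writing a reducer function that concatenates the strings in the Iterator it receives. Each string in the iterator has the format "%s,%s,%s". When I write the code like this: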

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        StringBuilder indexValue = new StringBuilder();
        while (values.hasNext()) {
            String data = values.next().toString();
            indexValue.append(data);
        }
        output.collect(key, new Text(indexValue.toString()));
    }

the output I get appears to be correct. Its format is "%s,%s,%s%s,%s,%s...".
However, when I write the code like this:

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        StringBuilder indexValue = new StringBuilder();
        while (values.hasNext()) {
            String data = values.next().toString();
            String[] parts = data.split(",");
            indexValue.append(parts[0] + "," + parts[1] + "," + parts[2]);
        }
        output.collect(key, new Text(indexValue.toString()));
    }

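I get a completely different, strange output. First, the output does not contain all of the values that should have been concatenated. Second, its shape makes no sense to me: it looks like "%s,%s,%s%s". Clearly some information is missing.
Do you know what could be causing this? I'm completely stumped.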
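One way to narrow this down is to reproduce both reducer bodies in plain Java, without Hadoop. The sketch below rests on an assumption the post does not confirm: that the reducer logic runs twice on the same data, for example because the same class is also registered as a combiner. The class name `SplitRejoinDemo` and its helper methods are hypothetical. Under that assumption, the second pass receives an already concatenated string as a single value, and keeping only `parts[0]` to `parts[2]` discards everything after the third comma.

```java
import java.util.Arrays;
import java.util.List;

public class SplitRejoinDemo {
    // Same logic as the first reducer: append each value unchanged.
    static String appendOnly(List<String> values) {
        StringBuilder indexValue = new StringBuilder();
        for (String data : values) {
            indexValue.append(data);
        }
        return indexValue.toString();
    }

    // Same logic as the second reducer: keep only the first three
    // comma-separated fields of each value.
    static String splitAndRejoin(List<String> values) {
        StringBuilder indexValue = new StringBuilder();
        for (String data : values) {
            String[] parts = data.split(",");
            indexValue.append(parts[0] + "," + parts[1] + "," + parts[2]);
        }
        return indexValue.toString();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "LeviathanbyThomasHobbes.txt,995,1",
                "LeviathanbyThomasHobbes.txt,999,1");

        // Hypothetical first pass (assumption: e.g. a combiner running the same code).
        String firstPass = splitAndRejoin(values);
        // The second pass sees the concatenated string as ONE value.
        String secondPass = splitAndRejoin(Arrays.asList(firstPass));

        System.out.println(firstPass);
        // LeviathanbyThomasHobbes.txt,995,1LeviathanbyThomasHobbes.txt,999,1
        System.out.println(secondPass);
        // LeviathanbyThomasHobbes.txt,995,1LeviathanbyThomasHobbes.txt

        // The append-only version is unaffected by running twice.
        System.out.println(appendOnly(Arrays.asList(appendOnly(values))));
        // LeviathanbyThomasHobbes.txt,995,1LeviathanbyThomasHobbes.txt,999,1
    }
}
```

The second pass yields the "%s,%s,%s%s" shape with values missing, while the append-only version is unchanged by a second pass. If the job configuration does not run the reducer code twice, this explanation does not apply and the cause lies elsewhere.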
Edit: I was asked to provide the raw data, so here it is. I'm also including the mapper function below.
Mapper function:

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        String line = value.toString();
        String[] parts = line.split("\t");
        int frequency = Integer.parseInt(parts[1]);
        String[] documentDataParts = parts[0].split(",");
        String term = documentDataParts[0];
        String bookFilename = documentDataParts[1];
        String chunk = documentDataParts[2];
        String documentData = bookFilename + "," + chunk + "," + frequency;
        output.collect(new Text(term), new Text(documentData));
    }

Data sample:

    Ages,LesMiserablesbyVictorHugo.txt,5545 1
    Aggeus,LeviathanbyThomasHobbes.txt,1268 1
    Aggravateth,LeviathanbyThomasHobbes.txt,995 1
    Aggravateth,LeviathanbyThomasHobbes.txt,999 1
    Aggravation,LeviathanbyThomasHobbes.txt,1015 1
    Aggravation,LeviathanbyThomasHobbes.txt,1691 1
    Aggregate,LeviathanbyThomasHobbes.txt,1293 1
    Agier,LesMiserablesbyVictorHugo.txt,2790 1
    Agincourt,LesMiserablesbyVictorHugo.txt,1510 1
    Agn,LesMiserablesbyVictorHugo.txt,5114 1
    Agnes,LesMiserablesbyVictorHugo.txt,6450 1
    Agnese,LesMiserablesbyVictorHugo.txt,580 1
    Agnus,UlyssesbyJamesJoyce.txt,1827 1
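To make the mapper's output concrete, here is a minimal sketch that applies the same string handling to one sample line, without any Hadoop types. The class name `MapperLineDemo` and the method `toKeyValue` are hypothetical. The sample shows whitespace before the frequency, while the mapper splits on `"\t"`, so the sketch assumes the real input uses a tab there.

```java
public class MapperLineDemo {
    // Mirrors the string handling in the mapper, returning {term, documentData}.
    static String[] toKeyValue(String line) {
        String[] parts = line.split("\t");
        int frequency = Integer.parseInt(parts[1]);
        String[] documentDataParts = parts[0].split(",");
        String term = documentDataParts[0];
        String bookFilename = documentDataParts[1];
        String chunk = documentDataParts[2];
        return new String[] { term, bookFilename + "," + chunk + "," + frequency };
    }

    public static void main(String[] args) {
        // Assumption: a tab separates the chunk number from the frequency.
        String[] kv = toKeyValue("Aggravateth,LeviathanbyThomasHobbes.txt,995\t1");
        System.out.println(kv[0] + " -> " + kv[1]);
        // Aggravateth -> LeviathanbyThomasHobbes.txt,995,1
    }
}
```

Each value the mapper emits therefore has exactly three comma-separated fields, so splitting such a value on "," and rejoining the first three parts reproduces it unchanged. Any reducer input with more than three fields did not come straight from this mapper.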
