MapReduce job: strange output?

sauutmhj · posted 2021-05-30 in Hadoop

I am writing my first MapReduce job. A simple one: counting the alphanumeric characters in a file. I have built the jar file and run it, but apart from the debug output I cannot find the output of the MR job anywhere. Could you help me?
My application class:

    import CharacterCountMapper;
    import CharacterCountReducer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class CharacterCountDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // Create a JobConf using the processed configuration processed by ToolRunner
            Job job = Job.getInstance(getConf());

            // Process custom command-line options
            Path in = new Path("/tmp/filein");
            Path out = new Path("/tmp/fileout");

            // Specify various job-specific parameters
            job.setJobName("Character-Count");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(CharacterCountMapper.class);
            job.setReducerClass(CharacterCountReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);
            job.setJarByClass(CharacterCountDriver.class);

            job.submit();
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // Let ToolRunner handle generic command-line options
            int res = ToolRunner.run(new Configuration(), new CharacterCountDriver(), args);
            System.exit(res);
        }
    }

Then my mapper class:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CharacterCountMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String strValue = value.toString();
            StringTokenizer chars = new StringTokenizer(strValue.replaceAll("[^a-zA-Z0-9]", ""));
            while (chars.hasMoreTokens()) {
                context.write(new Text(chars.nextToken()), one);
            }
        }
    }

And the reducer:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CharacterCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int charCount = 0;
            for (IntWritable val : values) {
                charCount += val.get();
            }
            context.write(key, new IntWritable(charCount));
        }
    }

It looks fine to me. I generated the runnable jar file from my IDE and executed it as follows:

    $ ./hadoop jar ~/Desktop/example_MapReduce.jar no.hib.mod250.hadoop.CharacterCountDriver
    14/11/27 19:36:42 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    14/11/27 19:36:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    14/11/27 19:36:42 INFO input.FileInputFormat: Total input paths to process : 1
    14/11/27 19:36:42 INFO mapreduce.JobSubmitter: number of splits:1
    14/11/27 19:36:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local316715466_0001
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/staging/roberto316715466/.staging/job_local316715466_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/staging/roberto316715466/.staging/job_local316715466_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/local/localRunner/roberto/job_local316715466_0001/job_local316715466_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/local/localRunner/roberto/job_local316715466_0001/job_local316715466_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
    14/11/27 19:36:43 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: Waiting for map tasks
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: Starting task: attempt_local316715466_0001_m_000000_0
    14/11/27 19:36:43 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
    14/11/27 19:36:43 INFO mapred.MapTask: Processing split: file:/tmp/filein:0+434
    14/11/27 19:36:43 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

So I would guess my output file should be in /tmp/fileout. But in fact it seems to be empty:

    $ tree /tmp/fileout/
    /tmp/fileout/
    └── _temporary
        └── 0

    2 directories, 0 files

Is there something I am missing? Can anybody help me?
Regards :-)

Edit:

I almost found the solution in another post.
In CharacterCountDriver I replaced job.submit() with job.waitForCompletion(true) (sketched after the listing below) and got more verbose output:

    /tmp/fileout/
    ├── part-r-00000
    └── _SUCCESS

    0 directories, 2 files
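
For reference, the change in the driver's run() method amounts to swapping the non-blocking submit call for a blocking one (a minimal sketch; returning 1 on failure is a common convention, not part of the original code):

    // job.submit() returns immediately, so the JVM can exit before the output
    // is committed; that is why only _temporary existed on disk earlier.
    // waitForCompletion(true) blocks until the job finishes and prints progress.
    return job.waitForCompletion(true) ? 0 : 1;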

But I still don't know how to read it. _SUCCESS is empty, and part-r-00000 is not what I expected:

    Absorbantandyellowandporousishe 1
    AreyoureadykidsAyeAyeCaptain 1
    ICanthearyouAYEAYECAPTAIN 1
    Ifnauticalnonsensebesomethingyouwish 1
    Ohh 1
    READY 1
    SPONGEBOBSQUAREPANTS 1
    SpongebobSquarepants 3
    Spongebobsquarepants 4
    Thendroponthedeckandfloplikeafish 1
    Wholivesinapineappleunderthesea 1

Any suggestions? Is there a mistake in my code? Thanks.

bgibtngc

If I understand you correctly, you want your program to count the alphanumeric characters in the input file. However, that is not what your code does. You can change your mapper so that it counts the alphanumeric characters in each line:

    String strValue = value.toString();
    // replaceAll returns a new string; the result must be assigned back
    strValue = strValue.replaceAll("[^a-zA-Z0-9]", "");
    context.write(new Text("alphanumeric"), new IntWritable(strValue.length()));

This will fix your program. Basically, the mapper now emits the number of alphanumeric characters in each line as the value, and the reducer sums up the counts for each key. With my change you only ever use a single key, "alphanumeric"; the key could be something else and it would still work.
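
Put together, a mapper along those lines might look as follows (a sketch that keeps your original class name and signature; the constant name ALPHANUMERIC is my own choice):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CharacterCountMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        // One shared key, so the reducer sums all the per-line counts together
        private static final Text ALPHANUMERIC = new Text("alphanumeric");

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Drop everything that is not a letter or a digit, then emit the
            // length of what remains as this line's alphanumeric character count
            String alnum = value.toString().replaceAll("[^a-zA-Z0-9]", "");
            context.write(ALPHANUMERIC, new IntWritable(alnum.length()));
        }
    }

Your existing CharacterCountReducer needs no changes: it already sums the IntWritable values per key, so the job's output becomes a single line with the total count.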

2ic8powd

part-r-00000 is the name of the reducer's output file. If you had more reducers, they would be numbered part-r-00001, part-r-00002, and so on.
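
For instance, the number of reduce tasks, and with it the number of part-r-* files, can be set in the driver. A minimal sketch:

    // With four reduce tasks the job writes part-r-00000 through part-r-00003
    job.setNumReduceTasks(4);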
