Which one to choose?

Posted by tjjdgumg on 2021-06-03 in Hadoop


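I have noticed that there are several ways to write the driver method of a Hadoop program.
The following approach is given in Yahoo's Hadoop tutorial: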

public void run(String inputPath, String outputPath) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.addInputPath(conf, new Path(inputPath));
    FileOutputFormat.setOutputPath(conf, new Path(outputPath));

    JobClient.runJob(conf);
}

And this approach is given in the O'Reilly book Hadoop: The Definitive Guide (2012):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  Job job = new Job();
  job.setJarByClass(MaxTemperature.class);
  job.setJobName("Max temperature");
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

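While trying the program from the O'Reilly book, I found that the constructors of the Job class are deprecated. Since the O'Reilly book is based on Hadoop 2 (YARN), I was surprised to see it use deprecated API.
I would like to know which approach everyone uses.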


Source: https://stackoverflow.com/questions/16184227/multiple-ways-to-write-driver-of-hadoop-program-which-one-to-choose

2 answers


rbpvctlc1#

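Slightly different from your first (Yahoo) block: you should use the ToolRunner / Tool classes, which take advantage of GenericOptionsParser (as noted in Eswara's answer).
A template pattern would look something like this: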

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolExample extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // old API
        JobConf jobConf = new JobConf(getConf());

        // new API (in Hadoop 2, Job.getInstance replaces the deprecated Job constructors)
        Job job = Job.getInstance(getConf());

        // rest of your config here

        // determine success / failure (depending on your choice of old / new api)
        // return 0 for success, non-zero for an error
        return 0;
    }

    public static void main(String args[]) throws Exception {
        System.exit(ToolRunner.run(new ToolExample(), args));
    }
}


sxissh062#

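I use the former approach. If we override the run() method, we can use hadoop jar options such as -D, -libjars, -files, and so on, all of which are essential in almost any Hadoop project. I'm not sure whether they can be used through the main() method.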
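To make the point about generic options more concrete, here is a minimal plain-Java sketch of the idea: before run() is called, ToolRunner lets GenericOptionsParser pull the generic options out of the command line, so run() only sees the application's own arguments. The class below is a hypothetical, heavily simplified stand-in, not Hadoop's actual parser. It assumes only the `-D key=value` form, ignores -libjars and -files, and all names in it (GenericOptionsSketch, properties, remainingArgs) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration; NOT Hadoop's GenericOptionsParser.
public class GenericOptionsSketch {

    // Properties collected from "-D key=value" pairs.
    public final Map<String, String> properties = new LinkedHashMap<>();
    // Arguments left for the application, e.g. input and output paths.
    public final List<String> remainingArgs = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                // Consume the token after -D and split it at the first '='.
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    properties.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
                // Assumption of this sketch: a malformed pair is silently dropped.
            } else {
                // Anything else is passed through untouched.
                remainingArgs.add(args[i]);
            }
        }
    }

    public static void main(String[] args) {
        String[] demo = {"-D", "mapreduce.job.reduces=4", "in", "out"};
        GenericOptionsSketch parsed = new GenericOptionsSketch(demo);
        System.out.println("properties: " + parsed.properties);
        System.out.println("remaining:  " + parsed.remainingArgs);
    }
}
```

In a real driver none of this has to be written by hand: with the Tool / ToolRunner pattern, the parsed properties end up in the Configuration returned by getConf(), and the args passed to run() are the remaining ones.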
