Output directory not set in JobConf

Posted by r7xajy2e on 2021-06-03 in Hadoop


Below is the driver code for a simple MapReduce program:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  @SuppressWarnings("deprecation")
  public class CsvParserDriver {
      @SuppressWarnings("deprecation")
      public static void main(String[] args) throws Exception
      {
          if(args.length != 2)
          {
              System.out.println("usage: [input] [output]");
              System.exit(-1);
          }

          JobConf conf = new JobConf(CsvParserDriver.class);
          Job job = new Job(conf);
          conf.setJobName("CsvParserDriver");

          FileInputFormat.setInputPaths(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          conf.setMapperClass(CsvParserMapper.class);
          conf.setMapOutputKeyClass(IntWritable.class);
          conf.setMapOutputValueClass(Text.class);

          conf.setReducerClass(CsvParserReducer.class);
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(Text.class);

          conf.set("splitNode","NUM_AE");

          JobClient.runJob(conf);
      }
  }

I run my code with the following command:

hadoop jar CsvParser.jar CsvParserDriver /user/sritamd/TestData /user/sritamd/output

(The jar and all the directories named in the command above have been created.)
The error I get is:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.


Source: https://stackoverflow.com/questions/12087423/output-directory-not-set-in-jobconf

8 answers


yyhrrdl8 #1

Try this:

Configuration configuration = new Configuration();
Job job = new Job(configuration, "MyConfig");

then

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

2021-06-03

6qftjkof #2

I think you need to set the input and output directories on conf rather than on job, like this:

FileInputFormat.setInputPaths(conf, new Path(args[0]));

FileOutputFormat.setOutputPath(conf, new Path(args[1]));
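One likely reason this helps: as far as the Hadoop API documents it, a Job makes its own copy of the Configuration it is constructed from, so a path set through job never reaches the original conf that JobClient.runJob(conf) submits. The sketch below is a toy model of that copy behavior in plain Java. ToyConf, ToyJob, and the key name output.dir are hypothetical stand-ins for illustration, not Hadoop classes or real property names.

```java
import java.util.HashMap;
import java.util.Map;

class CopySemanticsDemo {

    // Hypothetical stand-in for a Hadoop configuration: a plain key/value map.
    static class ToyConf {
        private final Map<String, String> props = new HashMap<>();

        ToyConf() {}

        // Copy constructor: the new object gets a snapshot of the other's entries.
        ToyConf(ToyConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    // Hypothetical stand-in for Job: it keeps a private copy of the conf it was given.
    static class ToyJob {
        private final ToyConf conf;

        ToyJob(ToyConf original) {
            this.conf = new ToyConf(original);
        }

        ToyConf getConfiguration() {
            return conf;
        }
    }

    // Assumed key name, for illustration only.
    static final String OUTPUT_DIR_KEY = "output.dir";

    public static void main(String[] args) {
        ToyConf conf = new ToyConf();
        ToyJob job = new ToyJob(conf);

        // Setting the path through the job only changes the job's private copy.
        job.getConfiguration().set(OUTPUT_DIR_KEY, "/user/sritamd/output");

        System.out.println("job copy:      " + job.getConfiguration().get(OUTPUT_DIR_KEY));
        System.out.println("original conf: " + conf.get(OUTPUT_DIR_KEY));
    }
}
```

In this model the original conf still has no output directory after the call, which matches the symptom in the question. Whichever object ends up being submitted is the one the paths need to be set on.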

2021-06-03

wgx48brx #3

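First, make sure the output directory does not already exist. If it does, delete it.
Second, run your code in Eclipse and check whether it runs properly and only gives an ArrayOutofBounds warning.
Otherwise, check the libraries you have added: make sure all the client libraries are included, and check that your class is in a package.
If all of the above conditions are met, your job will execute.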
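The first step can be automated before submitting the job. The following is a minimal sketch that assumes the output directory lives on the local filesystem (for example when Hadoop runs in standalone mode); it uses only java.nio, not Hadoop's own filesystem classes, and OutputDirCleaner and deleteIfExists are hypothetical names. Note that Path here is java.nio.file.Path, not org.apache.hadoop.fs.Path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OutputDirCleaner {

    // Deletes dir and everything below it. Returns false if dir did not exist.
    static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: OutputDirCleaner <local-output-dir>");
            System.exit(2);
        }
        Path out = Paths.get(args[0]);
        System.out.println(deleteIfExists(out) ? "deleted " + out : "nothing to delete at " + out);
    }
}
```

This permanently removes the directory's contents, so it is only appropriate for scratch output that can be regenerated.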

2021-06-03

jvidinwx #4

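I had the same problem, but I fixed it. I was calling job.waitForCompletion(true), which caused a crash with Spark on HBase when using saveAsNewAPIHadoopFile(...). You should not wait for your job, because that uses the old Hadoop API rather than the new API.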

2021-06-03

monwx1rj #5

This may be caused by mixing the old API and the new API.
Here is how I configure the job with the new Job API.
Step 1: import the new API class

import org.apache.hadoop.mapreduce.Job

Step 2: configure everything through the new API Job.

val job = Job.getInstance(conf)
job.getConfiguration.set(TableOutputFormat.OUTPUT_TABLE, tableName)
job.setOutputFormatClass(classOf[TableOutputFormat[Put]])

Hope this helps.

2021-06-03

col17t5w #6

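You did not create the HDFS input and output directories as described in the Apache Hadoop tutorial.
If you want to use a local directory, add the filesystem prefix: file:///user/sritamd/TestData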
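To see what the prefix changes, the sketch below uses only java.net.URI to print the scheme of three path strings. Hadoop's Path is built on URIs, and a path with no scheme is resolved against the configured default filesystem. The hdfs:// host namenode:8020 is a made-up example, and schemeOf is a hypothetical helper.

```java
import java.net.URI;

class PathSchemeDemo {

    // Returns the URI scheme of a path string, or "(none)" when it has no scheme.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    public static void main(String[] args) {
        String[] paths = {
            "/user/sritamd/TestData",                      // no scheme: default filesystem decides
            "file:///user/sritamd/TestData",               // explicit local filesystem
            "hdfs://namenode:8020/user/sritamd/TestData"   // explicit HDFS (hypothetical host)
        };
        for (String p : paths) {
            System.out.println(schemeOf(p) + "  <-  " + p);
        }
    }
}
```

A bare path such as /user/sritamd/TestData carries no scheme, so whether it points at HDFS or the local disk depends on the cluster configuration rather than on the path itself.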

2021-06-03

rwqw0loc #7

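If you are running Hadoop in standalone mode (no cluster) to test your code, the output path does not need the fs prefix. You can initialize the job and set the paths on it. The code below should work (make sure you consistently use either Job from org.apache.hadoop.mapreduce.Job or JobConf from org.apache.hadoop.mapred.JobConf):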

Job job = new Job();
job.setJobName("Job Name");
job.setJarByClass(MapReduceJob.class);

// Set the input and output paths on the same Job object that is submitted below.
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// Submit the job, wait for it to finish, and exit with 0 on success.
System.exit(job.waitForCompletion(true) ? 0 : 1);

2021-06-03

rdrgkggo #8

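Your HDFS file system may not have been created yet. You need to format the given directory first; after that it can be used for the input and output of Hadoop files:
/usr/local/hadoop/bin/hadoop namenode -format
Use this link: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
and follow every step.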

2021-06-03
