I am having trouble chaining two MapReduce jobs

Posted by wko9yo5t on 2021-05-27 in Hadoop

The first MapReduce job is:

map ( key, line ):
  read 2 long integers from the line into the variables key2 and value2
  emit (key2,value2)

reduce ( key, nodes ):
  count = 0
  for n in nodes
    count++
  emit(key,count)
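To make concrete what this first pass computes, here is a minimal in-memory sketch in plain Java (no Hadoop), assuming each input line holds two comma-separated integers, as the comma-delimited Scanner in the question's MyMapper expects. The class and method names are hypothetical. Grouping by key2 and counting each group gives, if the lines are edges of a graph (the class is named Graph), every node's out-degree.

```java
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical in-memory model of job1: map parses "key2,value2" lines,
// the shuffle groups by key2, and reduce counts the values in each group.
class FirstPassSketch {

    // Map, shuffle and reduce collapsed into one loop: for each line,
    // read key2 and value2 and increment the count stored for key2.
    static Map<Long, Integer> countValuesPerKey(List<String> lines) {
        Map<Long, Integer> counts = new TreeMap<>(); // sorted by key, like reducer input
        for (String line : lines) {
            try (Scanner s = new Scanner(line).useDelimiter(",")) {
                long key2 = s.nextLong();
                s.nextLong(); // consume value2; job1 only counts values, it never sums them
                counts.merge(key2, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> edges = List.of("1,2", "1,3", "2,3", "3,1", "3,2", "3,4");
        System.out.println(countValuesPerKey(edges));
    }
}
```

Running main prints {1=2, 2=1, 3=3}; node 4 never appears as key2, so it gets no entry.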

The second MapReduce job is:

map ( node, count ):
  emit(count,1)

reduce ( key, values ):
  sum = 0
  for v in values
    sum += v
  emit(key,sum)
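The second pass counts how many keys share each count, i.e. it builds a histogram of the first job's counts. An equally hedged in-memory sketch in plain Java, taking the first job's (node, count) pairs as a map; the names are hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of job2: map turns each (node, count) pair
// into (count, 1), and reduce sums the 1s for each count.
class SecondPassSketch {

    static Map<Integer, Integer> histogram(Map<Long, Integer> countPerNode) {
        Map<Integer, Integer> nodesPerCount = new TreeMap<>(); // sorted by count
        for (int count : countPerNode.values()) {
            // emit(count, 1) followed by sum += v, done in one step
            nodesPerCount.merge(count, 1, Integer::sum);
        }
        return nodesPerCount;
    }

    public static void main(String[] args) {
        // (node, count) pairs in the shape job1 would produce them
        Map<Long, Integer> firstJobOutput = Map.of(1L, 2, 2L, 1, 3L, 3, 4L, 2);
        System.out.println(histogram(firstJobOutput));
    }
}
```

Running main prints {1=1, 2=2, 3=1}: one node has count 1, two nodes have count 2, one node has count 3.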

The code I wrote for this is:

import java.io.IOException;
import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Graph extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "job1");
        job.setJobName("job1");
        job.setJarByClass(Graph.class);

        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path("job1"));

        job.waitForCompletion(true);

        Job job2 = Job.getInstance(conf, "job2");
        job2.setJobName("job2");

        job2.setOutputKeyClass(IntWritable.class);
        job2.setOutputValueClass(IntWritable.class);

        job2.setMapOutputKeyClass(IntWritable.class);
        job2.setMapOutputValueClass(IntWritable.class);

        job2.setMapperClass(MyMapper1.class);
        job2.setReducerClass(MyReducer1.class);

        job2.setInputFormatClass(TextInputFormat.class);
        job2.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job2, new Path("job1"));
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));

        job2.waitForCompletion(true);

        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new Graph(), args);
    }

    public static class MyMapper extends Mapper<Object, Text, IntWritable, IntWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Scanner s = new Scanner(value.toString()).useDelimiter(",");
            int key2 = s.nextInt();
            int value2 = s.nextInt();
            context.write(new IntWritable(key2), new IntWritable(value2));
            s.close();
        }
    }

    public static class MyReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count++;
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static class MyMapper1 extends Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
        @Override
        public void map(IntWritable node, IntWritable count, Context context)
                throws IOException, InterruptedException {
            context.write(count, new IntWritable(1));
        }
    }

    public static class MyReducer1 extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
            //System.out.println("job 2"+sum);
        }
    }
}
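I have tried to implement the pseudocode; args[0] is the input and args[1] is the output. When I run the code, I get the output of job1 but not the output of job2. What is the problem? I think I am not passing job1's output to job2 correctly.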

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/54755441/i-am-having-trouble-chaining-two-mapreduce-jobs

1 answer

cigdeys31

Instead of "job1" in

FileOutputFormat.setOutputPath(job, new Path("job1"));

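Use this instead (and read job2's input from the same path):

String temporary = "home/xxx/....";    // store the intermediate result here
FileOutputFormat.setOutputPath(job, new Path(temporary));
FileInputFormat.setInputPaths(job2, new Path(temporary));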
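The output path may not be the only thing standing between job1 and job2. Job1 writes with TextOutputFormat, which by default produces one line per record in the form key<TAB>value, and job2 reads that directory with TextInputFormat, which hands the mapper a (LongWritable byte offset, Text line) pair. MyMapper1, however, is declared as Mapper<IntWritable, IntWritable, IntWritable, IntWritable>, so job2's map tasks most likely fail with a ClassCastException, which would leave only job1's output behind. One option is to declare the mapper as Mapper<LongWritable, Text, IntWritable, IntWritable> and parse each line; alternatives are KeyValueTextInputFormat, or writing job1 with SequenceFileOutputFormat and reading it with SequenceFileInputFormat so the IntWritable types survive. The plain-Java sketch below shows only the parsing step, assuming the default tab separator; JobOutputLineSketch and parseCount are hypothetical names.

```java
// Hypothetical helper for job2's mapper. With TextInputFormat the mapper would
// be declared roughly like this (Hadoop types, shown only as a comment):
//
//   public static class MyMapper1
//           extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
//       @Override
//       public void map(LongWritable offset, Text line, Context context)
//               throws IOException, InterruptedException {
//           int count = JobOutputLineSketch.parseCount(line.toString());
//           context.write(new IntWritable(count), new IntWritable(1));
//       }
//   }
class JobOutputLineSketch {

    // Splits one line of job1's text output on the (assumed default) tab
    // separator and returns the value column, i.e. the count.
    static int parseCount(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected key<TAB>value: " + line);
        }
        return Integer.parseInt(fields[1].trim());
    }

    public static void main(String[] args) {
        System.out.println(parseCount("3\t3"));
    }
}
```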

Answered 2021-05-27
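Separately, the run method in the question ignores the boolean returned by Job.waitForCompletion and always returns 0, so job2 is started even if job1 failed, and a failed job2 still yields exit status 0. A common pattern is to start the next job only when the previous one succeeded and to finish with return job2.waitForCompletion(true) ? 0 : 1;. The sketch below models that ordering in plain Java, with each BooleanSupplier standing in for a waitForCompletion(true) call; ChainSketch and runInOrder are hypothetical names.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical model of a chained driver. Each stage stands in for a call
// such as job.waitForCompletion(true); Hadoop's checked exceptions are omitted.
class ChainSketch {

    // Runs the stages in order. Returns 0 if all succeed, otherwise the
    // 1-based index of the first failing stage; later stages are never started.
    static int runInOrder(List<BooleanSupplier> stages) {
        for (int i = 0; i < stages.size(); i++) {
            if (!stages.get(i).getAsBoolean()) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<BooleanSupplier> stages = List.of(
                () -> { System.out.println("job1 ok"); return true; },
                () -> { System.out.println("job2 failed"); return false; });
        System.out.println("exit status " + runInOrder(stages));
    }
}
```

Running main prints "job1 ok", "job2 failed" and then "exit status 2", so the failure of the second stage shows up in the return value instead of being hidden behind a constant 0.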
