Merging two MapFiles that have the same keys

xoshrz7s posted on 2021-06-02 in Hadoop

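I have a use case in which I generate a MapFile containing key/value pairs. Its contents look like this:
key1 [red, green, blue]
Later on, I want to update the value with more entries. I would like to do this by generating a second MapFile that has the same key with the new entries:
key1 [purple, yellow]
What I need in the end is a single MapFile like this:
key1 [red, green, blue, purple, yellow]
Can this be done by merging MapFiles? If not, is there a workaround?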

5cnsuln7 #1

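Yes, you can use MultipleInputs.
Because both files have the same format, you can reuse the same Mapper class for both inputs and concatenate the values in the reducer.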

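import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MultipleFiles
{
    // Shared mapper: each input line is expected to look like "<key> <value>".
    public static class Map1 extends Mapper<LongWritable,Text,Text,Text>
    {
        @Override
        public void map(LongWritable k, Text value, Context context) throws IOException, InterruptedException
        {
            // Split on the first space only, so the value itself may contain spaces.
            String[] words = value.toString().split(" ", 2);
            if (words.length < 2) {
                return; // skip lines that have no value part
            }
            context.write(new Text(words[0]), new Text(words[1]));
        }
    }

    // Reducer: joins all values that share a key into one comma-separated string.
    public static class Red extends Reducer<Text,Text,Text,Text>
    {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException
        {
            // Local buffer per key (the original used a static field).
            StringBuilder merge = new StringBuilder();
            for (Text value : values)
            {
                // Put a comma between every pair of values, not only after the first.
                if (merge.length() > 0) {
                    merge.append(",");
                }
                merge.append(value.toString());
            }
            context.write(key, new Text(merge.toString()));
        }
    }

    // Usage: MultipleFiles <input1> <input2> <output>
    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path p1 = new Path(files[0]);
        Path p2 = new Path(files[1]);
        Path p3 = new Path(files[2]);

        // Remove a previous output directory so the job can be rerun.
        FileSystem fs = FileSystem.get(c);
        if (fs.exists(p3)) {
            fs.delete(p3, true);
        }

        // Job.getInstance replaces the deprecated "new Job(c, name)" constructor.
        Job j = Job.getInstance(c, "multiple");
        j.setJarByClass(MultipleFiles.class);
        j.setReducerClass(Red.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(Text.class);
        // Both inputs go through the same mapper.
        MultipleInputs.addInputPath(j, p1, TextInputFormat.class, Map1.class);
        MultipleInputs.addInputPath(j, p2, TextInputFormat.class, Map1.class);
        FileOutputFormat.setOutputPath(j, p3);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}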
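
The reducer in this answer joins the raw value strings with commas. If the values are stored as bracketed lists, as in the question's example ([red, green, blue]), the merge step would probably need to strip and re-add the brackets. Below is a minimal plain-Java sketch of that idea. The class BracketListMerger and its methods are hypothetical names, not part of Hadoop, and the bracketed format is only assumed from the question's example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: merges values written as "[a, b, c]".
public class BracketListMerger {

    // Parses "[red, green, blue]" into its items; surrounding brackets are optional.
    static List<String> parse(String value) {
        String s = value.trim();
        if (s.startsWith("[")) {
            s = s.substring(1);
        }
        if (s.endsWith("]")) {
            s = s.substring(0, s.length() - 1);
        }
        List<String> items = new ArrayList<>();
        for (String item : s.split(",")) {
            String t = item.trim();
            if (!t.isEmpty()) {
                items.add(t);
            }
        }
        return items;
    }

    // Concatenates the items of every value, in iteration order, into one bracketed list.
    static String merge(Iterable<String> values) {
        List<String> all = new ArrayList<>();
        for (String v : values) {
            all.addAll(parse(v));
        }
        return "[" + String.join(", ", all) + "]";
    }

    public static void main(String[] args) {
        // Assumed sample values, taken from the question's example.
        System.out.println(merge(Arrays.asList("[red, green, blue]", "[purple, yellow]")));
    }
}
```

Inside the reducer, each Text value could be converted with toString() and passed through a helper like this before writing. Note that MapReduce does not guarantee the order in which values for one key reach the reducer, so the old entries are not certain to come before the new ones unless you add your own ordering.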
ahy6op9u #2

You can achieve this with the following steps:

   1) Create two different mappers as two separate classes.
   2) Emit the same key from both mapper classes.
   3) In the driver class, use MultipleInputs to attach each input path to its mapper.

All values that belong to the same key are then delivered to the same reduce call, where you can combine them however you like.
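
The grouping described in the steps above can be illustrated without a cluster. The sketch below is plain Java with no Hadoop classes. ShuffleSketch and groupByKey are hypothetical names, and the two lists stand in for the outputs of the two mappers. It only mimics the idea that records with the same key end up together; it is not how the framework is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are involved.
public class ShuffleSketch {

    // Groups {key, value} pairs from two simulated mapper outputs by key.
    // A TreeMap keeps the keys sorted, loosely mirroring the sorted keys a reducer sees.
    static Map<String, List<String>> groupByKey(List<String[]> firstMapper,
                                                List<String[]> secondMapper) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (List<String[]> output : Arrays.asList(firstMapper, secondMapper)) {
            for (String[] pair : output) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Assumed sample records, based on the question's example.
        List<String[]> first = Arrays.asList(
                new String[] {"key1", "red"},
                new String[] {"key1", "green"},
                new String[] {"key1", "blue"});
        List<String[]> second = Arrays.asList(
                new String[] {"key1", "purple"},
                new String[] {"key1", "yellow"});

        // Each map entry corresponds to one reduce call: a key plus all of its values.
        for (Map.Entry<String, List<String>> e : groupByKey(first, second).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In this sketch the values keep the order in which the lists were passed in. A real job makes no such promise for the values of a key, so any required ordering would have to be handled explicitly.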
