Java: write directly to a Hadoop MapFile from reduce

Posted by s1ag04yj on 2021-05-29 in Hadoop

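I have a reduce method that selects a file based on the timestamp in each record.
The timestamps in the data can fall on n different days (say n = 5). The day determines the file, and a MapFile writer is selected by the corresponding path, so there are n writers for n paths.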

Example: to write the record 15-02-2016,Key1,value1,
a MapFile writer object writing to basePath/15-02-2016 is selected,
and Key1,value1 is written using that writer.

Here is the reduce method:

@Override
protected void reduce(CompositeKey key, Iterable<SomeDataWritable> dataList,
        Reducer<CompositeKey, SomeDataWritable, Text, OutputWritable>.Context context)
        throws IOException, InterruptedException {
    for (SomeDataWritable data : dataList) {
        // Bypasses context.write(): the record goes straight to the per-day MapFile writer.
        MyMapFileWriter.write(key.getTimeStamp(), key.getId(), new OutputWritable(data));
    }
}

// In MyMapFileWriter:
static void write(long timestamp, Text key, OutputWritable value) throws IOException {
    MapFile.Writer writer = selectWriter(timestamp); // select writer based on timestamp
    writer.append(key, value);
}

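The keys are sorted by (day, id). The partitioner is based on day and the grouping comparator is based on (day, id), so the calls to reduce should receive all the records of one day sorted by id. Is it OK to write directly to files from reduce here?
Keys written to a MapFile must be in ascending order. Will multiple parallel invocations of the reduce method (on the same reducer node) put the keys out of order?
Even without any context.write in reduce, the job output path contains some output (I am running in local mode in Eclipse). This may be mapper output written by the Hadoop Reducer's reduce(). How can I avoid this?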

Java hadoop mapreduce bigdata

Source: https://stackoverflow.com/questions/35418724/write-directly-to-a-hadoop-map-file-from-reduce

1 Answer

djp7away1#

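I don't think writing to files directly through a writer is a good idea, because it goes against Hadoop's approach to fault tolerance. You run your job, some node fails, and Hadoop tries to reschedule the work, but because you wrote the files without Hadoop's standard mechanisms, it cannot deal with the partial results of the failed attempt (you have to handle them yourself).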
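To make the partial-failure point concrete, here is a minimal plain-Java sketch of the "write to a per-attempt location, then publish by rename" idea that Hadoop's file-based output committing relies on. It is not Hadoop code: `AttemptCommitDemo`, `writeAttempt`, `commit`, and the `_attempt_N` directory layout are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these names are Hadoop classes.
final class AttemptCommitDemo {

    // Writes the lines into a private per-attempt directory; the final file does not exist yet.
    static Path writeAttempt(Path outputDir, String name, int attempt, List<String> lines) {
        try {
            Path attemptDir = Files.createDirectories(outputDir.resolve("_attempt_" + attempt));
            return Files.write(attemptDir.resolve(name), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Publishes a finished attempt by moving its file to the final location.
    static Path commit(Path attemptFile, Path finalFile) {
        try {
            return Files.move(attemptFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates a first attempt that dies before commit, followed by a successful retry.
    static List<String> demo() {
        try {
            Path out = Files.createTempDirectory("commit-demo");
            writeAttempt(out, "part-0", 0, Arrays.asList("id1")); // abandoned, never committed
            Path retry = writeAttempt(out, "part-0", 1, Arrays.asList("id1", "id2"));
            return Files.readAllLines(commit(retry, out.resolve("part-0")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A writer opened by hand on basePath/<day> has no such staging step, so a retried task would find (or overwrite) whatever the failed attempt left behind.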
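Regarding the "bad keys": I'm not sure I understand your question, but one reducer processes the data for one key. For example, one reducer can process the data for key <2016-02-02, id1>, another reducer can process the records for key <2016-02-01, id2>, and so on.
If I understand correctly, you should specify the reduce output path with FileOutputFormat.setOutputPath(job, OUTPUT_PATH), so that the input and output paths are different. In that case you will get the reducers' files in the output path.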
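On the key-order question, it may help to simulate the setup from the question in plain Java. The sketch below assumes what the question describes: keys sorted by (day, id) and partitioned by day. It also assumes that one reduce task handles its keys sequentially in sorted order, which is how reduce tasks normally run. No Hadoop classes are used; `KeyOrderDemo`, `Key`, `partition`, and `append` are hypothetical stand-ins, and `append` mimics a per-day writer that rejects out-of-order keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation; every name here is hypothetical, not a Hadoop API.
final class KeyOrderDemo {

    // Stand-in for the question's CompositeKey: (day, id).
    static final class Key {
        final String day;
        final String id;
        Key(String day, String id) { this.day = day; this.id = id; }
    }

    // Sort order assumed from the question: by day, then by id.
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.day).thenComparing((Key k) -> k.id);

    // Partition on day only, so all keys of one day reach the same reducer.
    static int partition(Key k, int numReducers) {
        return Math.floorMod(k.day.hashCode(), numReducers);
    }

    // Stand-in for a per-day writer that refuses keys smaller than the last one written.
    static void append(List<String> file, String id) {
        if (!file.isEmpty() && file.get(file.size() - 1).compareTo(id) > 0) {
            throw new IllegalStateException("key out of order: " + id);
        }
        file.add(id);
    }

    // Runs one simulated reducer: keeps its own keys, sorts them, handles them one by one.
    static Map<String, List<String>> runReducer(List<Key> keys, int reducer, int numReducers) {
        List<Key> mine = new ArrayList<>();
        for (Key k : keys) {
            if (partition(k, numReducers) == reducer) {
                mine.add(k);
            }
        }
        mine.sort(SORT);
        Map<String, List<String>> files = new TreeMap<>();
        for (Key k : mine) { // one "reduce call" per key, strictly one after another
            append(files.computeIfAbsent(k.day, d -> new ArrayList<>()), k.id);
        }
        return files;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("2016-02-02", "id2"), new Key("2016-02-01", "id3"),
                new Key("2016-02-02", "id1"), new Key("2016-02-01", "id1"));
        System.out.println(runReducer(keys, 0, 1));
    }
}
```

Under these assumptions each day's file receives its ids in ascending order, because all keys of a day pass through one sorted, sequential stream. The sketch says nothing about retried or speculative attempts of the same task, which is where hand-opened writers become risky.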

2021-05-29
