Apache Gora reducer for multi-table output with HBase

Posted by hts6caw3 on 2021-06-09 in HBase

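I have a small amount of data in an HBase table that was crawled with Nutch, which uses Apache Gora as its ORM. I have found many MapReduce examples that process data in a single HBase table. My problem is that I have to copy the data into multiple tables from the reducer. Without Gora there is some guidance available (for example, this question), but how do I do the same thing in my case?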

8wigbo56


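I have never done what you are asking, but you may find the answer in the "Constructing the job" section of the Gora tutorial. It contains an example of reducer configuration that says: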

/* Mappers are initialized with GoraMapper.initMapper() or 
 * GoraInputFormat.setInput()*/
GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
    , LogAnalyticsMapper.class, true);

/* Reducers are initialized with GoraReducer#initReducer().
 * If the output is not to be persisted via Gora, any reducer 
 * can be used instead. */
GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

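Then, instead of calling GoraReducer.initReducerJob(), you can simply configure your own reducer, as shown in the following link (if the answer there is correct):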

GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
    , LogAnalyticsMapper.class, true);
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(2);
TableMapReduceUtil.addDependencyJars(job);
TableMapReduceUtil.addDependencyJars(job.getConfiguration());

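Note that in the previous example the mapper emits (TextLong, LongWritable) key-value pairs, so your reducer should look something like this, based on the link you posted and its answer: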

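import java.io.IOException;

import org.apache.gora.tutorial.log.TextLong;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.log4j.Logger;

// With MultiTableOutputFormat the output key is the destination table name,
// so KEYOUT is ImmutableBytesWritable (not Put).
public class MyReducer extends TableReducer<TextLong, LongWritable, ImmutableBytesWritable> {

    private static final Logger logger = Logger.getLogger( MyReducer.class );

    @Override
    protected void reduce( TextLong key, Iterable<LongWritable> data, Context context ) throws IOException, InterruptedException {
        logger.info( "Working on ---> " + key.getKey() );
        // The incoming values are LongWritable (not Result), so aggregate them first.
        long sum = 0;
        for ( LongWritable value : data ) {
            sum += value.get();
        }

        // Assumption: the row key, family "cf" and qualifier "count" are placeholders.
        Put put = new Put( Bytes.toBytes( key.getKey().toString() ) );
        put.addColumn( Bytes.toBytes( "cf" ), Bytes.toBytes( "count" ), Bytes.toBytes( sum ) );

        // Call context.write() once per destination table to copy into several tables.
        ImmutableBytesWritable tableName = new ImmutableBytesWritable( Bytes.toBytes( "tableName" ) );
        context.write( tableName, put );
    }
}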
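
To make the multi-table idea concrete without needing an HBase cluster, here is a minimal plain-Java sketch of the routing pattern. It is only a model: MultiTableSink is a hypothetical stand-in for what MultiTableOutputFormat does (it routes each record by the table name used as the output key), and the table names "table_a" and "table_b" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiTableFanOutSketch {

    /** Hypothetical stand-in for MultiTableOutputFormat: routes rows by table name. */
    static final class MultiTableSink {
        private final Map<String, List<String>> tables = new LinkedHashMap<>();

        void write(String tableName, String row) {
            tables.computeIfAbsent(tableName, t -> new ArrayList<>()).add(row);
        }

        List<String> rows(String tableName) {
            return tables.getOrDefault(tableName, Collections.emptyList());
        }
    }

    /** Stand-in for reduce(): aggregates the values, then writes the same row to every target table. */
    static void reduce(String key, Iterable<Long> values, List<String> targetTables, MultiTableSink sink) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        String row = key + "=" + sum;
        for (String table : targetTables) {
            // One write per destination table; the table name plays the role of the output key.
            sink.write(table, row);
        }
    }

    public static void main(String[] args) {
        MultiTableSink sink = new MultiTableSink();
        reduce("url1", Arrays.asList(1L, 2L, 3L), Arrays.asList("table_a", "table_b"), sink);
        System.out.println(sink.rows("table_a"));
        System.out.println(sink.rows("table_b"));
    }
}
```

In a real job the inner loop would become one context.write(new ImmutableBytesWritable(Bytes.toBytes(table)), put) call per destination table, which is the part I have not tested myself.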

Again, I have never done this myself... so it may not work :\
