I have a large table in HBase and I would like to split it into several smaller tables so it is easier to work with (the original table should be kept). How can I do that? For example, I have a table named all with the following row keys:
animal-1, ... plant-1, ... animal-2, ... plant-2, ... human-1, ... human-2, ...
I want to split it into three tables, animal, plant, and human, one for each kind of organism. How do I do that?
1 Answer

ozxc1zmp1#
You can use MapReduce with MultiTableOutputFormat, as in the example below. Note that in this example I read from a file with TextInputFormat; in your case you must instead use TableInputFormat with a scan over the 'all' table, and write to 'animal', 'plant', and 'human' instead of Table1 and Table2. When you scan the HBase table and feed it to the mapper through TableInputFormat, the mapper's map method receives each row key; compare the row key to decide which table to insert into. See section 7.2.2, "HBase MapReduce Read/Write Example", in the HBase reference guide.
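The core decision described above, comparing the row key to pick a destination table, is plain string logic independent of MapReduce. A minimal sketch, assuming the "type-id" row-key convention from the question (the class and method names here are mine, not from the original answer):

```java
public class RowKeyRouter {

    // Derive the destination table name from a row key like "animal-1".
    static String targetTable(String rowKey) {
        int dash = rowKey.indexOf('-');
        // Fall back to the whole key when there is no "-" separator.
        return dash < 0 ? rowKey : rowKey.substring(0, dash);
    }

    public static void main(String[] args) {
        System.out.println(targetTable("animal-1")); // animal
        System.out.println(targetTable("plant-2"));  // plant
        System.out.println(targetTable("human-1"));  // human
    }
}
```

In the mapper, the returned name would be wrapped in an ImmutableBytesWritable and used as the output key, which is how MultiTableOutputFormat routes each Put.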
package mapred;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultiTableMapper {

    static class InnerMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        @Override
        public void map(LongWritable offset, Text value, Context context) throws IOException {
            // The line of tab-separated data we are working on (needs to be parsed out).
            String[] valuestring = value.toString().split("\t");
            String rowid = /*HBaseManager.generateID();*/ "12345"; // rowid is the hbase rowKey generated from the line

            // Write to the first table.
            Put put = new Put(rowid.getBytes());
            put.add(Bytes.toBytes("UserInfo"), Bytes.toBytes("StudentName"), Bytes.toBytes(valuestring[0]));
            try {
                // The ImmutableBytesWritable key names the destination table.
                context.write(new ImmutableBytesWritable(Bytes.toBytes("Table1")), put);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

            // Write to the second table.
            Put put1 = new Put(rowid.getBytes());
            put1.add(Bytes.toBytes("MarksInfo"), Bytes.toBytes("Marks"), Bytes.toBytes(valuestring[1]));
            try {
                context.write(new ImmutableBytesWritable(Bytes.toBytes("Table2")), put1);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void createSubmittableJob() throws IOException, ClassNotFoundException, InterruptedException {
        Path inputDir = new Path("in");
        Configuration conf = /*HBaseManager.getHBConnection();*/ new Configuration();
        Job job = new Job(conf, "my_custom_job");
        job.setJarByClass(InnerMapper.class);
        FileInputFormat.setInputPaths(job, inputDir);
        job.setMapperClass(InnerMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        // This is the key to writing to multiple tables in HBase.
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        //job.setNumReduceTasks(0);
        //TableMapReduceUtil.addDependencyJars(job);
        //TableMapReduceUtil.addDependencyJars(job.getConfiguration());
        System.out.println(job.waitForCompletion(true));
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        MultiTableMapper.createSubmittableJob();
    }
}
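To read from the existing 'all' table instead of a file, the driver would be wired up with TableInputFormat via TableMapReduceUtil, and the mapper would extend TableMapper and route each row by its key prefix. The sketch below is an untested outline under those assumptions (class names are mine; the 'animal', 'plant', and 'human' tables must already exist with matching column families):

```java
package mapred;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class SplitAllTable {

    static class RoutingMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        public void map(ImmutableBytesWritable rowKey, Result result, Context context)
                throws java.io.IOException, InterruptedException {
            String key = Bytes.toString(rowKey.get());
            // Row keys look like "animal-1", "plant-2", "human-1" (from the question),
            // so the part before the dash names the destination table.
            int dash = key.indexOf('-');
            if (dash < 0) {
                return; // skip rows that do not match the expected pattern
            }
            String table = key.substring(0, dash);

            // Copy every cell of the source row into a Put for the destination table.
            Put put = new Put(rowKey.get());
            for (Cell cell : result.listCells()) {
                put.add(cell);
            }
            context.write(new ImmutableBytesWritable(Bytes.toBytes(table)), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "split_all_table");
        job.setJarByClass(SplitAllTable.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // larger batches for MapReduce scans
        scan.setCacheBlocks(false); // do not pollute the region server block cache

        // Read from 'all' via TableInputFormat (set up by the utility call) ...
        TableMapReduceUtil.initTableMapperJob("all", scan, RoutingMapper.class,
                ImmutableBytesWritable.class, Put.class, job);
        // ... and let the output key route each Put to its table.
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setNumReduceTasks(0); // map-only copy; the source table is left untouched

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because this only reads from 'all' and writes elsewhere, the original table is preserved, as the question requires.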