I have backed up an HBase table using the HBase Export utility:
hbase org.apache.hadoop.hbase.mapreduce.Export "FinancialLineItem" "/project/fricadev/ESGTRF/EXPORT"
This kicked off a MapReduce job and copied all of my table data into the output folder. According to the documentation, the output files are in SequenceFile format, so I wrote the code below to extract the keys and values from those files.
Now I want to run a MapReduce job that reads the key/value pairs from the output files, but I get the following exception:
java.lang.Exception: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
Here is my driver code:
package SEQ;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SeqDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SeqDriver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s <input path> <output path>\n",
                    getClass().getSimpleName());
            return -1;
        }
        String outputPath = args[1];

        FileSystem hfs = FileSystem.get(getConf());
        // Build the job from getConf() so configuration set through ToolRunner
        // reaches the tasks; new Job() is deprecated and would ignore it.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SeqDriver.class);
        job.setJobName("SequenceFileReader");

        // Delete the output directory if it already exists.
        HDFSUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));

        // The Export utility writes SequenceFiles of
        // ImmutableBytesWritable (row key) -> Result (row contents).
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(MySeqMapper.class);
        job.setNumReduceTasks(0); // map-only job

        int returnValue = job.waitForCompletion(true) ? 0 : 1;
        if (job.isSuccessful()) {
            System.out.println("Job was successful");
        } else {
            System.out.println("Job was not successful");
        }
        return returnValue;
    }
}
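For reference, a driver like this is typically launched as below; the jar name and output path are placeholders, not from the original post:
hadoop jar seqreader.jar SEQ.SeqDriver /project/fricadev/ESGTRF/EXPORT /project/fricadev/ESGTRF/READ_OUTPUT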
Here is my mapper code:
package SEQ;

import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MySeqMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Illustrative body (left empty in the original): emit each cell as row -> family:qualifier=value.
        for (Cell cell : value.rawCells()) {
            context.write(new Text(Bytes.toString(row.get())),
                    new Text(Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                            + Bytes.toString(CellUtil.cloneQualifier(cell)) + "="
                            + Bytes.toString(CellUtil.cloneValue(cell))));
        }
    }
}
1 Answer
So I will answer my own question and describe what was needed to make this work.
Because we use HBase to store our data, and this reducer outputs its result to an HBase table, Hadoop is telling us that it does not know how to serialize our data. That is why we need to help it: set the io.serializations variable during job setup.
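A minimal sketch of that configuration, placed in the driver's run() after the Job is created and before it is submitted. MutationSerialization and ResultSerialization come from the org.apache.hadoop.hbase.mapreduce package; the exact set of serializations may vary with your HBase version:

// Additional imports needed in SeqDriver:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.mapreduce.MutationSerialization;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;

// In run(), after the Job is created: append HBase's serializations to
// whatever io.serializations already contains (appending rather than
// overwriting keeps Hadoop's default Writable serialization working).
Configuration hbaseConf = job.getConfiguration();
hbaseConf.setStrings("io.serializations",
        hbaseConf.get("io.serializations"),
        MutationSerialization.class.getName(),
        ResultSerialization.class.getName());

With this in place the SequenceFileRecordReader can find a deserializer for org.apache.hadoop.hbase.client.Result and the job runs; it mirrors what TableMapReduceUtil configures for HBase-aware jobs.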