无法使用importtsv将数据从hdfs导入hbase

zazmityj  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(633)

我把制表符分隔的文件移到了hdfs中,现在我正试图把它移到hbase中。下面是我的importtsv命令

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt

它试图从不存在的位置读取jar。

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:738)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)

我不明白为什么它把hdfs和本地dir路径混为一谈。hdfs://localhost:54310/home/elijah/downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
运行导入作业的用户对本地目录上的hbase lib具有完全访问权限。

ubof19bj

ubof19bj1#

我看得见 -libjars 缺少选项…您可以使用 -libjars 下面的选项是示例用法:

hadoop jar \
hbase-server-0.98.6-cdh5.2.1.jar \
importtsv \
-libjars /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/high-scale-lib-1.1.1.jar \
-Dimporttsv.separator=, -Dimporttsv.bulk.output=output \
-Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount \
word_count.csv

你也可以这样做this:-


# export HADOOP_CLASSPATH=`./hbase classpath`

缺少的jar之一,即hbase/lib/htrace-core-3.1.0-incubating.jar将是hbase类路径。在这种情况下应该有用。

相关问题