I created a MapReduce job and am testing it in a multi-cluster environment, but it fails with the following error:

```
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.company.hbase.mapreduce.message.maestro.threadIndex.fakecolum.MockTestThreadIndexData.run(MockTestThreadIndexData.java:47)
at com.company.hbase.mapreduce.MaestroUpdateJob.main(MaestroUpdateJob.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
```
I can see that the jar is missing at hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar.

The jar file does exist at /opt/hadoop/share/hadoop/common on the local filesystem, but my job looks for it inside HDFS. If I copy all the jars (there are a lot of them; a sketch of the copy step is shown below) into HDFS, the job works. The question is: is this really necessary? Can someone explain why? Is this what I will have to do to run it in production as well?

I have also seen the answer to "Why is it necessary to keep the hbase/lib folder in hdfs?", and yes, if I switch the MapReduce framework to YARN it works too. But I don't want to run on YARN; I just want to understand why I have to move all the Hadoop libs into HDFS to run a MapReduce job.
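The copy step I used as a workaround looked roughly like this (a sketch only; the exact jar set depends on the install, and the directory is the one from the error message above):

```
# Create the same directory layout inside HDFS and copy the local jars there,
# so the paths the job client resolves against HDFS actually exist.
/opt/hadoop/bin/hadoop fs -mkdir -p /opt/hadoop/share/hadoop/common
/opt/hadoop/bin/hadoop fs -put /opt/hadoop/share/hadoop/common/*.jar /opt/hadoop/share/hadoop/common/
```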
## Update
Here is how I instantiate the Job:
```
Job job = Job.getInstance(config, "MyJob");

Scan scan = createScan();
Filter filter = createMyFilter();
// wrap the single filter in a FilterList before handing it to the scan
FilterList filters = new FilterList(filter);
scan.setFilter(filters);

TableMapReduceUtil.initTableMapperJob(
        MY_TABLE,
        scan,
        MyMapper.class,
        null,
        null,
        job
);
TableMapReduceUtil.initTableReducerJob(
        MY_TABLE,
        null,
        job
);
job.setNumReduceTasks(0);
```
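For context, `TableMapReduceUtil.initTableMapperJob` also has an overload with an explicit `addDependencyJars` flag (present at least in the HBase 0.98/1.x line; check your version). The shorter overloads default it to true, which is what puts the dependency jars on the distributed cache and forces the client to resolve every jar path against a filesystem at submit time:

```
// Hypothetical variant of the call above, using the overload with an
// explicit addDependencyJars flag. With false, no jars are shipped via
// the distributed cache, so they must already be on the cluster classpath.
TableMapReduceUtil.initTableMapperJob(
        MY_TABLE,
        scan,
        MyMapper.class,
        null,
        null,
        job,
        false   // addDependencyJars
);
```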
This is my mapred-site.xml:
```
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>myhost:9001</value>
  </property>
  <property>
    <name>hadoop.ssl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.ssl.require.client.cert</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.hostname.verifier</name>
    <value>DEFAULT</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.keystores.factory.class</name>
    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.server.conf</name>
    <value>ssl-server.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.client.conf</name>
    <value>ssl-client.xml</value>
    <final>true</final>
  </property>
</configuration>
```
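As mentioned above, switching the framework to YARN also makes the job run. For reference, that is the standard Hadoop 2.x property below, not something specific to my cluster; note that `mapred.job.tracker` in the file above is the old MR1 JobTracker setting and is ignored once YARN is selected:

```
<!-- Standard Hadoop 2.x setting to select the YARN runtime;
     shown for reference only, since I want to stay on classic MR. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```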
And this is how I run it:
```
HADOOP_CLASSPATH=$(/opt/hbase/bin/hbase classpath) \
  /opt/hadoop/bin/hadoop jar /tmp/mymapred-1.0-SNAPSHOT-jar-with-dependencies.jar
```
## Solution
I finally got my answer from a comment under this answer: https://stackoverflow.com/a/31950822/13305602

In core-site.xml there are two properties for configuring the default filesystem in Hadoop: the deprecated `fs.default.name` and its replacement `fs.defaultFS`. Paths added to the distributed cache without an explicit scheme are qualified against this default filesystem, which is evidently why the job client went looking for my local jar paths inside hdfs://bigcluster:9000.
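A minimal sketch of the relevant core-site.xml section (the property names are the real Hadoop ones; the value is taken from the error message above):

```
<!-- Sketch: with the default filesystem pointing at HDFS, a scheme-less
     path like /opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar is
     resolved inside hdfs://bigcluster:9000 at job-submission time. -->
<property>
  <name>fs.defaultFS</name>             <!-- current name since Hadoop 2.x -->
  <value>hdfs://bigcluster:9000</value>
</property>
<property>
  <name>fs.default.name</name>          <!-- deprecated pre-2.x alias -->
  <value>hdfs://bigcluster:9000</value>
</property>
```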