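My Hadoop version is 2.6.0-cdh5.10.0, and I am using the Cloudera VM.
I am trying to access the HDFS file system from code so that I can read files and add them as job input or cache files.
When I access the HDFS files from the command line, I can list them.
Command: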
[cloudera@quickstart java]$ hadoop fs -ls hdfs://localhost:8020/user/cloudera
Found 5 items
-rw-r--r-- 1 cloudera cloudera 106 2017-02-19 15:48 hdfs://localhost:8020/user/cloudera/test
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:42 hdfs://localhost:8020/user/cloudera/test_op
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:49 hdfs://localhost:8020/user/cloudera/test_op1
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:12 hdfs://localhost:8020/user/cloudera/wc_output
drwxr-xr-x - cloudera cloudera 0 2017-02-19 15:16 hdfs://localhost:8020/user/cloudera/wc_output1
When I try to access the same location from a MapReduce program, I get a FileNotFoundException. My sample MapReduce configuration code is:
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    if (args.length != 2) {
        System.err.println("Usage: test <in> <out>");
        System.exit(2);
    }
    ConfigurationUtil.dumpConfigurations(conf, System.out);
    LOG.info("input: " + args[0] + " output: " + args[1]);
    Job job = Job.getInstance(conf);
    job.setJobName("test");
    job.setJarByClass(Driver.class);
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    job.addCacheFile(new Path("hdfs://localhost:8020/user/cloudera/test/test.tsv").toUri());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    boolean result = job.waitForCompletion(true);
    return (result) ? 0 : 1;
}
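The job.addCacheFile line in the snippet above throws a FileNotFoundException.
2) My second question:
My core-site.xml entry points to localhost:9000 as the default HDFS file system URI. But from the command prompt I can only reach the default HDFS file system on port 8020, not on 9000. When I try port 9000, I get a connection refused exception. I am not sure where the configuration is being read from.
My core-site.xml is as follows: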
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!--
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/student/tmp/hadoop-local/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>Default file system URI. URI:scheme://authority/path scheme:method of access authority:host,port etc.</description>
  </property>
</configuration>
My hdfs-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name
    node should store the name table(fsimage).</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hdfs/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.Usually 3, 1 in our case
    </description>
  </property>
</configuration>
I get the following exception:
java.io.FileNotFoundException: hdfs:/localhost:8020/user/cloudera/test/ (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at java.io.FileReader.<init>(FileReader.java:58)
    at hadoop.TestDriver$ActorWeightReducer.setup(TestDriver.java:104)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Any help would be appreciated!
1 Answer
yiytaume #1
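When accessing files on HDFS, you do not need to pass the full URI as the argument. The default file system configured in core-site.xml is used to add the hdfs://host_address prefix. You only need to give the path of the file you want to access, which in your case is the directory structure
/user/cloudera/test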
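To make the path point concrete, here is a rough stand-alone sketch of what "adding the prefix" amounts to. It uses only java.net.URI, not the Hadoop API, so it is an analogy for how a scheme-less path gets qualified rather than Hadoop's actual code. The class name DefaultFsResolution, the method qualify, and the default file system value hdfs://localhost:8020 are assumptions for illustration.

```java
import java.net.URI;

class DefaultFsResolution {
    // Resolve a path against a default file system URI (illustrative helper,
    // not part of Hadoop). A scheme-less absolute path picks up the scheme and
    // authority of the default; a fully qualified URI is returned unchanged.
    public static String qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://localhost:8020"; // assumed default file system

        // Plain path: prints hdfs://localhost:8020/user/cloudera/test
        System.out.println(qualify(defaultFs, "/user/cloudera/test"));

        // Already qualified: prints hdfs://localhost:8020/user/cloudera/test
        System.out.println(qualify(defaultFs, "hdfs://localhost:8020/user/cloudera/test"));
    }
}
```

Both calls end up at the same location, which is why the plain path is enough as long as the default file system in the configuration that is actually loaded is correct.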
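As for your second question: port 8020 is the default HDFS port. That is why you can reach HDFS on port 8020 even though you never configured it. The connection refused exception occurs because HDFS is listening on 8020, so nothing is accepting requests on port 9000 and the connection is refused.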
For more details on the default ports, see here.
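If you want to check for yourself which port is accepting connections, a small probe along these lines may help. It is only a sketch using plain java.net sockets; the class name PortProbe, the method isListening, and the one-second timeout are assumptions for illustration, and the result depends entirely on what is running on your VM.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // A refused connection (java.net.ConnectException) or a timeout ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the question: 8020 (works from the shell) and 9000 (refused).
        for (int port : new int[] {8020, 9000}) {
            System.out.println("localhost:" + port + " listening: "
                    + isListening("localhost", port, 1000));
        }
    }
}
```

A port that reports false here is one that nothing is bound to, which matches the connection refused error you see when pointing the client at 9000.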