I am trying to read a file from HDFS in the spark-shell and get the error below. Creating the first RDD works fine, but as soon as I try to use that RDD, it gives me a connection error. I have a single-node HDFS setup, and Spark runs on the same machine. Please help. When I run the jps command on that box to check whether the Hadoop cluster is working as expected, everything looks fine, and I see the output below.
[hadoop@idcrebalancedev ~]$ jps
23606 DataNode
28245 Jps
23982 TaskTracker
26537 Main
23738 SecondaryNameNode
23858 JobTracker
23488 NameNode
Below is the output of the RDD creation and of the failing count.
scala> val hdfsFile = sc.textFile("hdfs://idcrebalancedev.bxc.is-teledata.com:23488/user/hadoop/reegal/4300.txt")
14/04/08 12:25:15 INFO MemoryStore: ensureFreeSpace(784) called with curMem=35456, maxMem=308713881
14/04/08 12:25:15 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 784.0 B, free 294.4 MB)
hdfsFile: org.apache.spark.rdd.RDD[String] = MappedRDD[5] at textFile at <console>:12
scala> hdfsFile.count()
14/04/08 12:25:22 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 0 time(s).
14/04/08 12:25:23 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 1 time(s).
14/04/08 12:25:24 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 2 time(s).
14/04/08 12:25:25 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 3 time(s).
14/04/08 12:25:26 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 4 time(s).
14/04/08 12:25:27 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 5 time(s).
14/04/08 12:25:28 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 6 time(s).
14/04/08 12:25:29 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 7 time(s).
14/04/08 12:25:30 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 8 time(s).
14/04/08 12:25:31 INFO Client: Retrying connect to server: idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488. Already tried 9 time(s).
java.net.ConnectException: Call to idcrebalancedev.bxc.is-teledata.com/172.29.253.4:23488 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy9.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:898)
at org.apache.spark.rdd.RDD.count(RDD.scala:720)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
at $iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC.<init>(<console>:22)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:788)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:833)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:745)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:593)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:600)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:603)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:926)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:876)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:968)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:601)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
at org.apache.hadoop.ipc.Client.call(Client.java:1050)
... 60 more
scala>
Output of the lsof command on the box, to check whether the listening ports are as expected.
[hadoop@idcrebalancedev ~]$ lsof -n|grep LIST
java 23488 hadoop 57u IPv4 91020 0t0 TCP *:59730 (LISTEN)
java 23488 hadoop 66u IPv4 91176 0t0 TCP 127.0.0.1:cslistener (LISTEN)
java 23488 hadoop 77u IPv4 91321 0t0 TCP *:50070 (LISTEN)
java 23606 hadoop 57u IPv4 91167 0t0 TCP *:32866 (LISTEN)
java 23606 hadoop 67u IPv4 91567 0t0 TCP *:50010 (LISTEN)
java 23606 hadoop 68u IPv4 91569 0t0 TCP *:50075 (LISTEN)
java 23606 hadoop 74u IPv4 91599 0t0 TCP *:50020 (LISTEN)
java 23738 hadoop 57u IPv4 91493 0t0 TCP *:49940 (LISTEN)
java 23738 hadoop 67u IPv4 91642 0t0 TCP *:50090 (LISTEN)
java 23858 hadoop 57u IPv4 91660 0t0 TCP *:46014 (LISTEN)
java 23858 hadoop 63u IPv4 91778 0t0 TCP 127.0.0.1:etlservicemgr (LISTEN)
java 23858 hadoop 73u IPv4 91806 0t0 TCP *:50030 (LISTEN)
java 23982 hadoop 61u IPv4 91909 0t0 TCP 127.0.0.1:55097 (LISTEN)
java 23982 hadoop 78u IPv4 92170 0t0 TCP *:50060 (LISTEN)
java 26537 hadoop 10u IPv6 1805728 0t0 TCP *:40865 (LISTEN)
java 26537 hadoop 38u IPv6 1805807 0t0 TCP 172.29.253.4:47852 (LISTEN)
java 26537 hadoop 42u IPv6 1805810 0t0 TCP *:44402 (LISTEN)
java 26537 hadoop 43u IPv6 1805812 0t0 TCP *:32796 (LISTEN)
java 26537 hadoop 44u IPv6 1805816 0t0 TCP *:46234 (LISTEN)
java 26537 hadoop 45u IPv6 1805818 0t0 TCP *:yo-main (LISTEN)
2 Answers

Answer 1 (vltsax25)
I ran into a similar problem when trying to access HDFS files from a Spark deployment (Scala shell). The point worth noting here is that when the Hadoop cluster is configured, core-site.xml is the file that defines the file system name and the URI scheme. You should refer to this file when constructing the Spark RDD: the URI you pass in must match the one configured there. For example, my core-site.xml contents, my HDFS file, and the Scala access looked like the following.
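(The answer's original snippets did not survive the page scrape; the following is a minimal sketch of what they would look like. The hdfs://localhost:9000 value is an illustrative assumption, not the original poster's actual setting; the file path is taken from the question.)

core-site.xml:

<configuration>
  <!-- Hadoop 1.x property name; the NameNode URI that Spark must match -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Accessing the file from the Scala shell with the same scheme, host, and port:

scala> // URI must match fs.default.name above (assumed value)
scala> val hdfsFile = sc.textFile("hdfs://localhost:9000/user/hadoop/reegal/4300.txt")
scala> hdfsFile.count()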
Answer 2 (2ul0zpep)
I found a few problems:

1. We should not use the web port, i.e. the one we use to reach the web UI in a browser. I was using that one at first, which is why it did not work.
2. All requests should go to the name node, nothing else.
3. After replacing the address in the request above with localhost:9000, it started working fine.

Based on this, I have one more question: how do I make it work with the domain name instead of localhost and a port? The answer is probably that you need to change it in the core-site.xml file and specify the correct URL instead of localhost?
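(A sketch of that suggested fix, as an assumption rather than something the answerer tested: point fs.default.name at the fully qualified hostname instead of localhost, then restart HDFS and use the same URI from the shell. Note that in the lsof output above, the NameNode RPC port, cslistener, i.e. 9000, is bound to 127.0.0.1 only, so the NameNode would also have to listen on the external interface before a domain-based URI could connect.)

<!-- core-site.xml: assumed change, replacing localhost with the box's FQDN -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://idcrebalancedev.bxc.is-teledata.com:9000</value>
</property>

scala> // then from spark-shell, using the FQDN and RPC port (not the PID or web port)
scala> sc.textFile("hdfs://idcrebalancedev.bxc.is-teledata.com:9000/user/hadoop/reegal/4300.txt").count()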