hdfs put from a laptop to a remote Hadoop cluster fails

nimxete2 · posted 2022-12-09 in HDFS

My Hadoop cluster is set up on a different network, and because of that, hdfs put fails when I run it from my laptop.
Is there a port I should forward, or something along those lines, to reach the datanodes remotely? I can see from the error message that it is using their local IP addresses.
Here is the command: hdfs dfs -put ~/Documents/reddit-streaming/redditStreaming/target/redditStreaming-1.0-SNAPSHOT.jar hdfs://mydns.asuscomm.com:8021/user/me/jars/
Here is the error message:

2021-10-14 18:04:55,704 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742036_1212
java.net.UnknownHostException
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
2021-10-14 18:04:55,708 WARN hdfs.DataStreamer: Abandoning BP-668799564-192.168.50.7-1633461871664:blk_1073742036_1212
2021-10-14 18:04:55,752 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.50.31:9866,DS-60974173-31d6-4dcb-a2ba-05ab6431db66,DISK]
2021-10-14 18:05:00,801 WARN hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742037_1213
java.net.UnknownHostException
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1757)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1711)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)
2021-10-14 18:05:00,801 WARN hdfs.DataStreamer: Abandoning BP-668799564-192.168.50.7-1633461871664:blk_1073742037_1213
2021-10-14 18:05:00,833 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.50.19:9866,DS-aeaca5a1-562c-4f35-b2fb-6f0b51c5f695,DISK]
2021-10-14 18:05:00,869 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/me/jars/redditStreaming-1.0-SNAPSHOT.jar._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2329)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2942)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:915)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:593)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:600)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:568)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:552)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2966)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1573)
        at org.apache.hadoop.ipc.Client.call(Client.java:1519)
        at org.apache.hadoop.ipc.Client.call(Client.java:1416)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:530)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1084)
        at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1898)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1700)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:707)

I have the following property in the hdfs-site.xml file on my laptop:

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>

I can also see in the UI that both datanodes are running.

moiiocjp1#

I assume you have already forwarded the namenode port (8021), since the client can see that 2 datanodes exist?
Yes, the datanodes have their own ports that the client needs to use in order to actually write the data.
Check the value of dfs.datanode.address and make sure a connection can be established from your laptop to the port listed there for every datanode (see the sketch below).
If you look at the error, you can see that this is port 9866:
Excluding datanode DatanodeInfoWithStorage[192.168.50.31:9866
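
As a rough way to test that from the laptop, something like the following should succeed for each datanode address the cluster advertises (the hostnames here are placeholders, and this assumes nc/netcat is installed):

# reachability check against the datanode transfer port (9866 per the error above)
nc -vz datanode1.example.com 9866
nc -vz datanode2.example.com 9866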
Also, IIUC, the use.datanode.hostname config actually needs to be on the cluster, not in your local laptop config, in order for the protocol to return hostnames rather than IPs.
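
A minimal sketch of what that might look like in the cluster-side hdfs-site.xml (the client property is the same one you already have; whether you also want the datanode-to-datanode variant is an assumption):

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
</property>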
If you want to view each datanode's web portal, there is also an HTTP port you could open up (it should be reachable from the namenode UI as well).
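
That port is governed by dfs.datanode.http.address; assuming a stock Hadoop 3 install, the default looks roughly like this (verify against your own config):

<property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9864</value>
</property>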
A more secure / less exposed option would be to set up an edge node between the two networks, which you can only SSH into & SFTP files onto (assuming you do not have a shared file server), and then run the hdfs commands from that node.
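
A sketch of that workflow, where edge-node is a hypothetical host on the cluster's network that you can SSH into:

# copy the jar onto the edge node, then push it into HDFS from there
scp ~/Documents/reddit-streaming/redditStreaming/target/redditStreaming-1.0-SNAPSHOT.jar me@edge-node:~/
ssh me@edge-node
hdfs dfs -put ~/redditStreaming-1.0-SNAPSHOT.jar /user/me/jars/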
To reiterate: you should not expose a Hadoop cluster without Kerberos and TLS over dynamic DNS through any internet-facing router.
