I'm trying to copy a large file (32 GB) into HDFS. I've never had any trouble copying files into HDFS before, but those files were all smaller. I'm using hadoop fs -put <myfile> <myhdfsfile>
Everything goes fine up to about 13.7 GB, but then I get this exception:
hadoop fs -put * /data/unprocessed/
Exception in thread "main" org.apache.hadoop.fs.FSError: java.io.IOException: Input/output error
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:150)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:217)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:191)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
Caused by: java.io.IOException: Input/output error
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:242)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.read(RawLocalFileSystem.java:91)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:144)
... 20 more
When I check the log files (on my namenode and datanodes), I see that the lease on the file is removed, but no reason is given. According to the log files everything went fine. Here are the last lines of my namenode log:
2013-01-28 09:43:34,176 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /data/unprocessed/AMR_EXPORT.csv. blk_-4784588526865920213_1001
2013-01-28 09:44:16,459 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.6.114:50010 is added to blk_-4784588526865920213_1001 size 30466048
2013-01-28 09:44:16,466 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /data/unprocessed/AMR_EXPORT.csv from client DFSClient_1738322483
2013-01-28 09:44:16,472 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /data/unprocessed/AMR_EXPORT.csv is closed by DFSClient_1738322483
2013-01-28 09:44:16,517 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 168 Total time for transactions(ms): 26Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
Does anyone have a clue what's going on here? I've checked core-default.xml and hdfs-default.xml for a property I could override to extend the lease or something like that, but couldn't find one.
2 Answers
rn0zuynd1#
A couple of suggestions:
- If you are copying multiple files, use multiple put sessions.
- If it is just one large file, compress it before copying, or split the large file into smaller files and copy those (see the sketch below).
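For example, a minimal sketch of the split-then-copy and compress-first approaches (the file name AMR_EXPORT.csv is taken from the namenode log in the question; the 1 GB chunk size and the part-name prefix are arbitrary choices):

# split the 32 GB file into 1 GB chunks, then put each chunk in its own session
split -b 1G AMR_EXPORT.csv AMR_EXPORT.csv.part.
for part in AMR_EXPORT.csv.part.*; do
    hadoop fs -put "$part" /data/unprocessed/
done

# or compress first and copy the single, smaller archive
gzip -c AMR_EXPORT.csv > AMR_EXPORT.csv.gz
hadoop fs -put AMR_EXPORT.csv.gz /data/unprocessed/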
7uhlpewt2#
This sounds like a problem reading the local file rather than a problem with the HDFS client. The stack trace shows the failure to read the local file bubbling all the way up. The lease was dropped because the client got disconnected (due to the IOException) while reading the file.
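If that diagnosis is right, reading the file locally end to end should hit the same I/O error somewhere past the ~13.7 GB mark. A quick check you could run (my own suggestion, not part of the original answer; the file name comes from the log above):

# read the whole local file and discard the data; an "Input/output error" here
# points at the local disk or filesystem rather than at Hadoop
dd if=AMR_EXPORT.csv of=/dev/null bs=1M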