Hadoop error stalls the job's reduce process

carvr3hs — posted 2021-06-03 in Hadoop

I have run a Hadoop job (the word-count example) several times on my two-node cluster setup, and until now it has always worked fine. Now I keep getting a RuntimeException which stalls the reduce process at 19%:

2013-04-13 18:45:22,191 INFO org.apache.hadoop.mapred.Task: Task:attempt_201304131843_0001_m_000000_0 is done. And is in the process of commiting
2013-04-13 18:45:22,299 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201304131843_0001_m_000000_0' done.
2013-04-13 18:45:22,318 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-13 18:45:23,181 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error while running command to get file permissions : org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
    at org.apache.hadoop.util.Shell.run(Shell.java:182)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
    at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:710)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:443)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
    at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
    at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:468)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
    at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
    at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Does anybody have an idea what might be causing this?
EDIT: I solved it myself. In case anyone else runs into the same problem: it was caused by the etc/hosts file on the master node. I hadn't entered the slave node's host name and address. This is how my hosts file on the master node is structured:

127.0.0.1       MyUbuntuServer
192.xxx.x.xx2   master
192.xxx.x.xx3   MySecondUbuntuServer
192.xxx.x.xx3   slave
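Since the root cause was a hostname that did not resolve, a quick sanity check (not part of the original post) is to verify that every cluster name in the hosts file resolves from each node. A minimal Java sketch, where the names "master" and "slave" are the assumed entries from the hosts file above:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    // Returns the resolved IP for a hostname, or null if it cannot be resolved.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "master" and "slave" mirror the entries in the hosts file above;
        // every node in the cluster should be able to resolve all of these names.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost", "master", "slave"};
        for (String h : hosts) {
            String ip = resolve(h);
            System.out.println(h + " -> " + (ip != null ? ip : "UNRESOLVED (check /etc/hosts)"));
        }
    }
}
```

Running it on the master with the two-node setup described above should print an address for every name; an "UNRESOLVED" line points at a missing hosts entry like the one that caused this error.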

A similar problem is described here: http://comments.gmane.org/gmane.comp.apache.mahout.user/8898
The information there probably relates to a different Hadoop version. It says:

java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "/bin/ls": error=12, not enough space

Their solution was to change the heap size via mapred.child.java.opts (-Xmx1200m).
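For reference, that heap change is an ordinary job configuration setting. A hedged sketch of what the fix from that thread would look like in mapred-site.xml (mapred.child.java.opts is the pre-YARN property name used by Hadoop versions of this era; the 1200 MB value is the one quoted there, not a general recommendation):

```xml
<!-- mapred-site.xml: raise the child JVM heap for map/reduce tasks -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1200m</value>
</property>
```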
See also: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/bhgyjdnkmge

Regards, Avner
