I am running the Hadoop TestDFSIO write benchmark on two Red Hat Enterprise Linux 6.4 systems. I launch the write workload as:

hadoop jar hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 960 -fileSize 1024

After formatting the namenode, the job completes fine on the first run, but on the second run it hangs after the map tasks finish, stuck at:

100% map 16% reduce
After formatting the namenode again, the write workload succeeds:

hadoop jar hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 960 -fileSize 1024

But when I then run the read workload:

hadoop jar hadoop-test-1.2.1.jar TestDFSIO -read -nrFiles 960 -fileSize 1024

it gets stuck in the final stage at:

100% map 16% reduce
Why does the reduce task not complete?

The TaskTracker log on the master node shows (timestamps and class names shortened):
...0:15,541 INFO ....JvmManager: JVM : jvm_201309241959_0001_m_226512462 exited with exit code 0. Number of tasks it ran: 1
...0:15,814 INFO ....TaskTracker: attempt_201309241959_0001_m_000958_0 0.0% reading test_io_8@790197504/1073741824 ::host = 9.122.227.170
...0:16,768 INFO ....TaskTracker: Received KillTaskAction for task: attempt_201309241959_0001_m_000957_1
...0:16,768 INFO ....TaskTracker: About to purge task: attempt_201309241959_0001_m_000957_1
...0:16,768 INFO ....IndexCache: Map ID attempt_201309241959_0001_m_000957_1 not found in cache
...0:17,559 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16597223% reduce > copy (478 of 960 at 0.00 MB/s) >
...0:18,355 INFO ....TaskTracker: attempt_201309241959_0001_m_000958_0 1.0% finished test_io_8 ::host = 9.122.227.170
...0:18,355 INFO ....TaskTracker: Task attempt_201309241959_0001_m_000958_0 is done.
...0:18,355 INFO ....TaskTracker: reported output size for attempt_201309241959_0001_m_000958_0 was 93
...0:18,356 INFO ....TaskTracker: addFreeSlot : current free slots : 2
...0:18,498 INFO ....JvmManager: JVM : jvm_201309241959_0001_m_832308806 exited with exit code 0. Number of tasks it ran: 1
...0:20,584 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16597223% reduce > copy (478 of 960 at 0.00 MB/s) >
...0:21,697 INFO ....TaskTracker.clienttrace: src: 9.122.227.170:50060, dest: 9.122.227.170:48771, bytes: 93, op: MAPRED_SHUFFLE, cliID: attempt_201309241959_0001_m_000958_0, duration: 6041257
...0:26,608 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) >
[the same line then repeats every few seconds, with identical progress and 0.00 MB/s, for over two minutes]
...2:30,190 INFO ....TaskTracker: attempt_201309241959_0001_r_000000_0 0.16631946% reduce > copy (479 of 960 at 0.00 MB/s) >
As the log shows, the Hadoop job is stuck in the reduce (copy) phase.
1 Answer
I solved it: it was a DNS name-resolution problem. I had to edit the /etc/hosts file, remove the entry mapping the node's hostname to localhost, and add the actual hostnames, so that the Hadoop cluster could resolve the Hadoop master and slave nodes. On Red Hat Linux 6.4 this fixed it for me.
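A minimal sketch of the /etc/hosts layout the answer describes, on every node in the cluster (the hostnames and addresses below are placeholders, not taken from the cluster above):

```
# /etc/hosts -- keep the loopback line, but do NOT map the node's
# own hostname to 127.0.0.1 (a common Red Hat default)
127.0.0.1       localhost localhost.localdomain

# Map each node's real interface IP to its actual hostname
# (example/placeholder values)
9.122.227.170   hadoop-master
9.122.227.171   hadoop-slave1
```

After editing, `hostname -f` and `getent hosts <hostname>` on each node should return the real interface address rather than 127.0.0.1. A reduce stuck at the shuffle/copy phase with 0.00 MB/s is a classic symptom of this: map tasks finish locally, but reducers cannot fetch map output because the map hosts advertise themselves under names that resolve to loopback.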