"bad input path" error when setting up a simple mrjob job on a single-node EC2 instance

Posted by qf9go6mv on 2021-07-13 in Hadoop

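I am trying to run a simple word-count program written in Python on Hadoop (Java) using mrjob. I have installed a pseudo-distributed Hadoop 2.7.3 on a t2.micro EC2 instance. The program is run as follows: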

  python mr_word_count.py -r hadoop hdfs:///user/ubuntu/input/lorem.txt -o output

But it fails with the following error:

  Using configs in /home/ubuntu/.mrjob.conf
  Looking for hadoop binary in /home/ubuntu/hadoop/hadoop-2.7.3/bin...
  Found hadoop binary: /home/ubuntu/hadoop/hadoop-2.7.3/bin/hadoop
  Using Hadoop version 2.7.3
  Creating temp directory /tmp/mr_word_count.ubuntu.20210403.013125.236375
  uploading working dir files to hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd...
  Copying other local files to hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/
  Running step 1 of 1...
  Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  session.id is deprecated. Instead, use dfs.metrics.session-id
  Initializing JVM Metrics with processName=JobTracker, sessionId=
  Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
  Cleaning up the staging area file:/tmp/mapred/staging/ubuntu1155540475/.staging/job_local1155540475_0001
  Error launching job , bad input path : File does not exist: /tmp/mapred/staging/ubuntu1155540475/.staging/job_local1155540475_0001/files/mr_word_count.py#mr_word_count.py
  Streaming Command Failed!
  Attempting to fetch counters from logs...
  Can't fetch history log; missing job ID
  No counters found
  Scanning logs for probable cause of failure...
  Can't fetch history log; missing job ID
  Can't fetch task logs; missing application ID
  Step 1 of 1 failed: Command '['/home/ubuntu/hadoop/hadoop-2.7.3/bin/hadoop', 'jar', '/home/ubuntu/hadoop/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar', '-files', 'hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd/mr_word_count.py#mr_word_count.py,hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/ubuntu/tmp/mrjob/mr_word_count.ubuntu.20210403.013125.236375/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/ubuntu/input/lorem.txt', '-output', 'hdfs:///user/ubuntu/output', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 mr_word_count.py --step-num=0 --mapper', '-combiner', '/bin/sh -ex setup-wrapper.sh python3 mr_word_count.py --step-num=0 --combiner', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 mr_word_count.py --step-num=0 --reducer']' returned non-zero exit status 512.
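
One detail in the log may help narrow things down: the job ID is job_local1155540475_0001 and the staging area is a file: path. Job IDs with the job_local prefix are issued by Hadoop's LocalJobRunner, whereas jobs submitted to a cluster get IDs of the form job_<timestamp>_<sequence>. The following is a minimal sketch for checking this in a log; the function name detect_runner is a hypothetical helper, not part of mrjob or Hadoop.

```python
import re

# Job IDs of the form job_local<digits>_<seq> come from Hadoop's LocalJobRunner;
# cluster-submitted jobs look like job_<cluster timestamp>_<seq>.
JOB_ID_RE = re.compile(r"job_(local)?(\d+)_(\d+)")


def detect_runner(log_text):
    """Return 'local', 'cluster', or None if no job ID appears in log_text.

    Hypothetical helper for illustration only.
    """
    match = JOB_ID_RE.search(log_text)
    if match is None:
        return None
    return "local" if match.group(1) else "cluster"


# Line copied from the log in the question.
sample = (
    "Cleaning up the staging area "
    "file:/tmp/mapred/staging/ubuntu1155540475/.staging/job_local1155540475_0001"
)
print(detect_runner(sample))
```

If this reports a local job, the streaming job was not submitted to a cluster scheduler, which would be consistent with the staging directory living on the local filesystem rather than in HDFS.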

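It seems the runner should copy my program to /tmp/mapred/staging/, but it does not, so I suspect I am missing some configuration somewhere. The Python code exists only on the local filesystem, and the input file is in HDFS.
I have seen a number of questions here with almost the same error (in particular this one and this one), but none of the changes I made to the configuration XMLs fixed it. The job works if I run it in local ( -r local ) or inline ( -r inline ) mode, but not with the Hadoop runner ( -r hadoop ).
This is the program I am trying to run: https://gist.github.com/k4v/5d0d1425977fe7e228e7a1e538f72d68
Hadoop configuration files:
core-site.xml
hdfs-site.xml
mapred-site.xml (I am not using YARN, because it causes every MapReduce job to hang on the machine's 1 GB of RAM)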
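
Because the configuration files are only named here, a short script can print the properties that decide where a job runs. The following is a minimal sketch, assuming the standard Hadoop <configuration>/<property> layout; the function name read_hadoop_properties and the sample XML are hypothetical illustrations, not the actual files from this setup. When mapreduce.framework.name is not set, Hadoop 2.x falls back to local.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Made-up mapred-site.xml content; NOT the configuration from the question.
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
</configuration>
"""

props = read_hadoop_properties(SAMPLE)
# Fall back to Hadoop's documented default when the property is absent.
print(props.get("mapreduce.framework.name", "local"))
```

To inspect a real file, the same function can be given the contents of mapred-site.xml or core-site.xml read from the Hadoop configuration directory.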
The following processes are running:

  $ jps
  23283 Jps
  21846 NodeManager
  21545 SecondaryNameNode
  21674 ResourceManager
  21325 DataNode
  21149 NameNode
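
ResourceManager and NodeManager are YARN daemons, while NameNode, DataNode, and SecondaryNameNode belong to HDFS, so the listing above suggests the YARN daemons are up even though the job is not meant to use YARN. A minimal sketch for grouping jps output this way follows; the function name classify_jps is a hypothetical helper.

```python
HDFS_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode"}
YARN_DAEMONS = {"ResourceManager", "NodeManager"}


def classify_jps(jps_output):
    """Group process names from `jps` output into hdfs / yarn / other.

    Hypothetical helper for illustration only.
    """
    groups = {"hdfs": [], "yarn": [], "other": []}
    for line in jps_output.strip().splitlines():
        parts = line.split(None, 1)  # "<pid> <name>"
        if len(parts) < 2:
            continue
        name = parts[1].strip()
        if name == "Jps":  # the jps tool itself
            continue
        if name in HDFS_DAEMONS:
            groups["hdfs"].append(name)
        elif name in YARN_DAEMONS:
            groups["yarn"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Output copied from the question.
JPS_OUTPUT = """
23283 Jps
21846 NodeManager
21545 SecondaryNameNode
21674 ResourceManager
21325 DataNode
21149 NameNode
"""

groups = classify_jps(JPS_OUTPUT)
print(groups)
```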

Please help me figure out what I am missing. Thank you.
