Hadoop EMR using Python

Posted by jtoj6r0c on 2021-06-03 in Hadoop


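I am using Hadoop streaming to run a MapReduce job whose mapper and reducer are written in Python. My input data is in S3 and I am trying to use it for the job. However, when I run the command like this: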

bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file aish1/mapperi.py  
-mapper  aish1/mapperi.py -file aish1/reduceri.py -reducer aish1/reduceri.py 
-file s3://INLOCATION -input s3://INLOCATION -output s3://OUTLOCATION

I get an error:

File: /home/hadoop/s3:/INLOCATION does not exist, or is not readable. 
Streaming Command  Failed!

I don't understand why it prepends /home/hadoop/ to my S3 location. Any help would be appreciated!

hadoop python emr

Source: https://stackoverflow.com/questions/24318164/hadoop-emr-using-python

1 Answer


carvr3hs1#

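Don't use -file for the input. The -file argument is meant for files on your local file system, which Hadoop then ships to the cluster along with the job. That is why s3://INLOCATION is treated as a local path and resolved against the working directory /home/hadoop. In your case, the input is already in the right place, so -input alone is enough.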
Change your invocation to:

bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file aish1/mapperi.py
-mapper aish1/mapperi.py -file aish1/reduceri.py -reducer aish1/reduceri.py
-input s3://INLOCATION -output s3://OUTLOCATION
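
To illustrate the point, here is a minimal Python sketch. It is not Hadoop's actual code. The helper names (resolve_as_local, is_remote_uri, build_streaming_args) are hypothetical, and the working directory /home/hadoop is taken from the error message. The first helper mimics how a string passed to -file would be resolved if it is read as a local path, which appears to be how the odd /home/hadoop/s3:/INLOCATION path arises. The last helper builds the streaming arguments so that only local scripts go through -file.

```python
import posixpath
from urllib.parse import urlparse


def resolve_as_local(path, cwd="/home/hadoop"):
    """Mimic resolving `path` as a local file relative to `cwd`.

    Assumption: this only imitates local-path resolution for illustration;
    it is not taken from Hadoop's source.
    """
    # join() keeps the "s3://..." text as a relative component;
    # normpath() then collapses the double slash into a single one.
    return posixpath.normpath(posixpath.join(cwd, path))


def is_remote_uri(path):
    """Return True for strings such as s3://INLOCATION that carry a URI scheme."""
    return "://" in path and bool(urlparse(path).scheme)


def build_streaming_args(mapper, reducer, input_uri, output_uri):
    """Build streaming arguments, shipping only local scripts via -file."""
    for script in (mapper, reducer):
        if is_remote_uri(script):
            raise ValueError(f"-file expects a local path, got {script!r}")
    return [
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", input_uri, "-output", output_uri,
    ]


if __name__ == "__main__":
    # Reading the S3 URI as a local file reproduces the path from the error.
    print(resolve_as_local("s3://INLOCATION"))  # -> /home/hadoop/s3:/INLOCATION

    # Only the two scripts are passed with -file; S3 locations go to -input/-output.
    args = build_streaming_args(
        "aish1/mapperi.py", "aish1/reduceri.py",
        "s3://INLOCATION", "s3://OUTLOCATION",
    )
    print(" ".join(args))
```

The printed argument string matches the corrected invocation above, minus the bin/hadoop jar prefix.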

Answered 2021-06-04
