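I'm learning Hadoop, but when I try to submit a MapReduce job, Hadoop appears to start and then hangs, with no progress or any other sign of activity. The application status page says the job has been submitted and lists it, but nothing happens, and I'd like to know where to look to resolve this.
I'm using Hadoop 2.7.1, installed on OS X 10.10.4 with Homebrew, and Java 1.8.0_45. I configured it by following these instructions: https://datarecipe.wordpress.com/2015/06/05/setup-hadoop-2-6-on-mac-osx-10-9/
The data is a simple text file named "purchases.txt" with the following contents (tab-separated):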
2013-03-29 2:30 miami cup 2.43 visa
2013-04-23 1:34 miami cup 2.43 visa
2013-04-23 10:15 LA spoon 1.32 visa
2013-04-28 6:34 LA bottle 3.56 cash
2013-05-23 1:43 miami glass 3.21 visa
I uploaded it into Hadoop (the /data folder had already been created):
hadoop fs -put purchases.txt /data/
I then set up the following mapper in Python (based on an online tutorial) and named it "mapper.py":
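import sys

def mapper():
    # Read raw purchase records from standard input, one per line.
    for line in sys.stdin:
        tempdata = line.strip().split("\n")
        for l in tempdata:
            # Only process well-formed records with exactly six tab-separated fields.
            if (len(l.split("\t")) == 6):
                date, time, store, item, cost, payment = l.split("\t")
                # Emit store as the key and cost as the value.
                print("{0}\t{1}".format(store,cost))

def main():
    mapper()

if __name__=="__main__":
    main()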
I did the same for the reducer code and named it "reducer.py":
import sys

def reducer():
    salesTotal = 0
    oldKey = None
    # Input arrives sorted by key, so all lines for one store are adjacent.
    for line in sys.stdin:
        data = line.strip().split("\t")
        # Skip malformed lines.
        if len(data)!=2:
            continue
        thisKey, thisSale = data
        # The key changed: emit the total for the previous store and reset.
        if oldKey and oldKey != thisKey:
            print("{0}\t{1}".format(oldKey,salesTotal))
            salesTotal=0
        oldKey = thisKey
        salesTotal+=float(thisSale)
    # Emit the total for the last store.
    if oldKey != None:
        print("{0}\t{1}".format(oldKey,salesTotal))

def main():
    reducer()

if __name__=="__main__":
    main()
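For comparison, the same sum-per-key reduction can be sketched with itertools.groupby, which makes the "input is sorted by key" assumption explicit. The function names parse and reduce_sorted and the sample lines are made up for illustration; this is an alternative sketch, not the code from the tutorial.

```python
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated lines, skipping malformed ones."""
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) == 2:
            yield fields[0], float(fields[1])


def reduce_sorted(lines):
    """Sum values per key; assumes the input is already sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


# Hypothetical sorted mapper output, for illustration only.
sample = ["LA\t1.0\n", "LA\t2.0\n", "miami\t4.0\n"]
for store, total in reduce_sorted(sample):
    print("{0}\t{1}".format(store, total))
```

Passing sys.stdin instead of the sample list should give a drop-in streaming reducer, since reduce_sorted only needs an iterable of lines.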
Testing this code on the command line:
Tophers-Retina-MBP:Hadoop tkessler$ cat purchases.txt | ./mapper.py | sort | ./reducer.py
LA 4.88
miami 5.640000000000001
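The trailing digits in the miami total come from binary floating-point addition, not from the scripts' logic. As a rough cross-check, the map and reduce steps can be simulated in one process and summed with decimal.Decimal. The names map_line and run_local and the rows below are hypothetical; they only mimic the six-column layout of purchases.txt.

```python
from decimal import Decimal


def map_line(line):
    """Return (store, cost) for a well-formed six-field line, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return None
    return fields[2], fields[4]


def run_local(lines):
    """Apply the map step to every line and total the costs per store."""
    totals = {}
    for pair in map(map_line, lines):
        if pair is None:
            continue  # malformed record, same as the mapper's length check
        store, cost = pair
        # Decimal parses the cost string exactly, avoiding float rounding.
        totals[store] = totals.get(store, Decimal("0")) + Decimal(cost)
    return totals


# Hypothetical rows in the same six-column layout as purchases.txt.
rows = [
    "2013-04-23\t10:15\tLA\tspoon\t1.32\tvisa\n",
    "2013-05-23\t1:43\tmiami\tglass\t3.21\tvisa\n",
    "2013-04-23\t1:34\tmiami\tcup\t2.43\tvisa\n",
]
for store, total in sorted(run_local(rows).items()):
    print("{0}\t{1}".format(store, total))
```

Because the totals are kept in a dictionary, this version does not depend on the input being sorted, unlike a streaming reducer.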
However, when I run it as a streaming job in Hadoop, it simply stalls here:
Tophers-Retina-MBP:lib tkessler$ hadoop jar ./hadoop-streaming-2.7.1.jar -mapper ~/PycharmProjects/Hadoop/mapper.py -reducer ~/PycharmProjects/Hadoop/reducer.py -file ~/PycharmProjects/Hadoop/mapper.py -input /data -output /project1out
packageJobJar: [/Users/tkessler/PycharmProjects/Hadoop/mapper.py, /var/folders/f_/3zvmc1g95lqgt1cp2dtnrtqw0000gp/T/hadoop-unjar2355518779286421017/] [] /var/folders/f_/3zvmc1g95lqgt1cp2dtnrtqw0000gp/T/streamjob8766144507660069606.jar tmpDir=null
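It seems to start the job fine and accepts the mapper and reducer, and running "mapred job -list all" shows that the jobs are all running, but they never finish, and the state is simply listed as "unknown". I'm not sure whether this is a Hadoop configuration problem or something else; any insight would be appreciated.
Addendum:
When I run the example command below, progress seems to stall at the last line shown: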
Tophers-Retina-MBP:~ tkessler$ hadoop jar /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 4 1000
Number of Maps = 4
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
1 Answer
I cleaned out Hadoop by shutting down the namenode and datanodes, then uninstalled it using
brew uninstall hadoop
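I then set it up again by following the instructions on this page: http://amodernstory.com/2014/09/23/installing-hadoop-on-mac-osx-yosemite/ It now seems to work fine, so the cause was probably just a minor configuration difference (possibly the temporary file location). Either way, it now runs the mapper and reducer without problems.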
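If the temporary file location really was the culprit, one way to see what a given setup uses is to read the hadoop.tmp.dir property from core-site.xml. The sketch below parses a config document into a dictionary. The helper name read_properties and the example XML (including its path) are placeholders for illustration, not values taken from either setup guide.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return a dict of name -> value from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Hypothetical core-site.xml content; the path is a placeholder, not a recommendation.
example = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-example</value>
  </property>
</configuration>"""

print(read_properties(example).get("hadoop.tmp.dir", "(not set)"))
```

To inspect a real installation, the text of its core-site.xml can be read from disk and passed to read_properties; where that file lives depends on how Hadoop was installed.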