I am currently trying to set up a very simple streaming job (the wordcount example), but I cannot get it to run successfully.
When I start the mapper and reducer scripts from the command line they run without any errors (cat test.txt | python Mapper.py | sort | python Reducer.py), but running it as an EMR job always fails. My test file is just a lorem ipsum paragraph.
This is my Mapper.py:
#!/usr/bin/python
import sys

for line in sys.stdin:
    print >> sys.stderr, line
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s %s' % (word, 1)
        print >> sys.stderr, word
This is my Reducer.py:
#!/usr/bin/python
import sys

current_word = None
current_count = 0
word = None

print >> sys.stderr, 'Started reducing'

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split(' ', 1)
    print >> sys.stderr, 'Mapper input %s %s' % (word, count)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print '%s %s' % (current_word, current_count)
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print '%s %s' % (current_word, current_count)
Controller log:
2016-07-29T05:50:05.872Z INFO Ensure step 14 jar file command-runner.jar
2016-07-29T05:50:05.874Z INFO StepRunner: Created Runner for step 14
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar hadoop-streaming -files s3://cg-das-bucket-ireland/Mapper.py,s3://cg-das-bucket-ireland/Reducer.py -mapper Mapper.py -reducer Reducer.py -input s3://cg-das-bucket-ireland/input/ -output s3://cg-das-bucket-ireland/output/'
INFO Environment:
TERM=linux
CONSOLETYPE=serial
SHLVL=5
JAVA_HOME=/etc/alternatives/jre
HADOOP_IDENT_STRING=hadoop
LANGSH_SOURCED=1
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_ROOT_LOGGER=INFO,DRFA
AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
UPSTART_JOB=rc
MAIL=/var/spool/mail/hadoop
EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
PWD=/
HOSTNAME=ip-172-31-27-208
LESS_TERMCAP_se=[0m
LOGNAME=hadoop
UPSTART_INSTANCE=
AWS_PATH=/opt/aws
LESS_TERMCAP_mb=[01;31m
_=/etc/alternatives/jre/bin/java
LESS_TERMCAP_me=[0m
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
LESS_TERMCAP_md=[01;38;5;208m
runlevel=3
AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
UPSTART_EVENTS=runlevel
HISTSIZE=1000
previous=N
HADOOP_LOGFILE=syslog
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
EC2_HOME=/opt/aws/apitools/ec2
HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-14S2ZME8BF550
LESS_TERMCAP_ue=[0m
AWS_ELB_HOME=/opt/aws/apitools/elb
RUNLEVEL=3
USER=hadoop
HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-14S2ZME8BF550/tmp
PREVLEVEL=N
HOME=/home/hadoop
HISTCONTROL=ignoredups
LESSOPEN=||/usr/bin/lesspipe.sh %s
AWS_DEFAULT_REGION=eu-west-1
LANG=en_US.UTF-8
LESS_TERMCAP_us=[04;38;5;111m
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-14S2ZME8BF550/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-14S2ZME8BF550/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-14S2ZME8BF550
INFO ProcessRunner started child process 14342 :
hadoop 14342 2311 0 05:50 ? 00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar hadoop-streaming -files s3://cg-das-bucket-ireland/Mapper.py,s3://cg-das-bucket-ireland/Reducer.py -mapper Mapper.py -reducer Reducer.py -input s3://cg-das-bucket-ireland/input/ -output s3://cg-das-bucket-ireland/output/
2016-07-29T05:50:09.949Z INFO HadoopJarStepRunner.Runner: startRun() called for s-14S2ZME8BF550 Child Pid: 14342
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 152 seconds
2016-07-29T05:52:40.083Z INFO Step created jobs: job_1469088917715_0008
2016-07-29T05:52:40.084Z WARN Step failed as jobs it created failed. Ids:job_1469088917715_0008
Error log:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:332)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I also tried running the AWS wordcount sample without my own reducer script, using just the built-in aggregate reducer, but with no success.
Any help with this problem would be greatly appreciated!