I am currently trying to set up a very simple streaming job (the wordcount example), but I cannot get it to run successfully.
When I start the mapper and reducer scripts from the command line they run without any errors (cat test.txt | python Mapper.py | sort | python Reducer.py), but running it as an EMR job always fails. My test file is just a lorem ipsum paragraph.
This is my Mapper.py:
#!/usr/bin/python
import sys

for line in sys.stdin:
    print >> sys.stderr, line
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s %s' % (word, 1)
        print >> sys.stderr, word
This is my Reducer.py:
#!/usr/bin/python
import sys

current_word = None
current_count = 0
word = None

print >> sys.stderr, 'Started reducing'

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split(' ', 1)
    print >> sys.stderr, 'Mapper input %s %s' % (word, count)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print '%s %s' % (current_word, current_count)
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print '%s %s' % (current_word, current_count)
Controller log:
2016-07-29T05:50:05.872Z INFO Ensure step 14 jar file command-runner.jar
2016-07-29T05:50:05.874Z INFO StepRunner: Created Runner for step 14
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar hadoop-streaming -files s3://cg-das-bucket-ireland/Mapper.py,s3://cg-das-bucket-ireland/Reducer.py -mapper Mapper.py -reducer Reducer.py -input s3://cg-das-bucket-ireland/input/ -output s3://cg-das-bucket-ireland/output/'
INFO Environment:
TERM=linux
CONSOLETYPE=serial
SHLVL=5
JAVA_HOME=/etc/alternatives/jre
HADOOP_IDENT_STRING=hadoop
LANGSH_SOURCED=1
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_ROOT_LOGGER=INFO,DRFA
AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
UPSTART_JOB=rc
MAIL=/var/spool/mail/hadoop
EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
PWD=/
HOSTNAME=ip-172-31-27-208
LESS_TERMCAP_se=[0m
LOGNAME=hadoop
UPSTART_INSTANCE=
AWS_PATH=/opt/aws
LESS_TERMCAP_mb=[01;31m
_=/etc/alternatives/jre/bin/java
LESS_TERMCAP_me=[0m
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
LESS_TERMCAP_md=[01;38;5;208m
runlevel=3
AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
UPSTART_EVENTS=runlevel
HISTSIZE=1000
previous=N
HADOOP_LOGFILE=syslog
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
EC2_HOME=/opt/aws/apitools/ec2
HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-14S2ZME8BF550
LESS_TERMCAP_ue=[0m
AWS_ELB_HOME=/opt/aws/apitools/elb
RUNLEVEL=3
USER=hadoop
HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-14S2ZME8BF550/tmp
PREVLEVEL=N
HOME=/home/hadoop
HISTCONTROL=ignoredups
LESSOPEN=||/usr/bin/lesspipe.sh %s
AWS_DEFAULT_REGION=eu-west-1
LANG=en_US.UTF-8
LESS_TERMCAP_us=[04;38;5;111m
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-14S2ZME8BF550/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-14S2ZME8BF550/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-14S2ZME8BF550
INFO ProcessRunner started child process 14342 :
hadoop 14342 2311 0 05:50 ? 00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar hadoop-streaming -files s3://cg-das-bucket-ireland/Mapper.py,s3://cg-das-bucket-ireland/Reducer.py -mapper Mapper.py -reducer Reducer.py -input s3://cg-das-bucket-ireland/input/ -output s3://cg-das-bucket-ireland/output/
2016-07-29T05:50:09.949Z INFO HadoopJarStepRunner.Runner: startRun() called for s-14S2ZME8BF550 Child Pid: 14342
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 152 seconds
2016-07-29T05:52:40.083Z INFO Step created jobs: job_1469088917715_0008
2016-07-29T05:52:40.084Z WARN Step failed as jobs it created failed. Ids:job_1469088917715_0008
Error log:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:332)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I also tried running the AWS wordcount sample without my own reducer script, using just the built-in aggregate reducer, but with no success.
Any help with this problem would be greatly appreciated!