I am trying to write a workflow with a Hadoop streaming action that executes an awk program. Here is my scenario:
The Hadoop streaming command works fine from the client. However, when it is run as an Oozie workflow it fails, because it cannot find the second file. Note that the awk script is in the local home directory, which is also mounted on Hadoop, and the input paths in the example are on HDFS.
The CLI command is below; I have also attached the streaming workflow I configured in Hue, which does not work as expected.
/usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.3.0-mr1-cdh5.1.0.jar -D mapreduce.job.reduces=0 -D mapred.reduce.tasks=0 -input /user/cloudera/input/file1 -input /user/cloudera/input/file2 -output /user/cloudera/awk/output -mapper /home/cloudera/diff_files/op_code/sample.awk -file /home/cloudera/diff_files/op_code/sample.awk
Workflow.xml
------------------
<workflow-app name="awk" xmlns="uri:oozie:workflow:0.4">
<start to="awk-streaming"/>
<action name="awk-streaming" cred="">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<streaming>
<mapper>/home/cloudera/sample.awk</mapper>
<reducer>/home/cloudera/sample.awk</reducer>
</streaming>
<configuration>
<property>
<name>mapred.output.dir</name>
<value>/user/cloudera/awk/output</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/user/cloudera/awk/input</value>
</property>
</configuration>
<file>/user/cloudera/awk/input/file1#file1</file>
<file>/user/cloudera/awk/input/file2#file2</file>
</map-reduce>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
1 Answer
Please see this link for more details: http://wiki.apache.org/hadoop/jobconffile
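Beyond the link, here is a minimal sketch of how the streaming action could be reworked so Oozie can find both the script and the second input. The HDFS path /user/cloudera/awk/scripts/sample.awk below is a hypothetical location: the script must first be uploaded to HDFS, because the action runs on a cluster node where client-local paths like /home/cloudera do not exist.

<action name="awk-streaming">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <streaming>
            <!-- refer to the symlink created from the <file> element below,
                 not to a path on the client machine -->
            <mapper>sample.awk</mapper>
        </streaming>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <!-- comma-separated list, so both files are read -->
                <value>/user/cloudera/input/file1,/user/cloudera/input/file2</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>/user/cloudera/awk/output</value>
            </property>
            <property>
                <!-- map-only job, matching the CLI's -D mapred.reduce.tasks=0 -->
                <name>mapred.reduce.tasks</name>
                <value>0</value>
            </property>
        </configuration>
        <!-- ships the script from HDFS into the task's working directory;
             hypothetical path, upload sample.awk there first -->
        <file>/user/cloudera/awk/scripts/sample.awk#sample.awk</file>
    </map-reduce>
    <ok to="end"/>
    <error to="kill"/>
</action>

Files listed in <file> are distributed through the distributed cache and symlinked into each task's working directory under the name after the #, which is why the mapper can refer to plain sample.awk. The same mechanism explains why the local /home/cloudera path in the original workflow cannot be resolved.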