How to specify a combiner in Oozie when using the streaming jar

Posted by aydmsdu9 on 2021-05-30 in Hadoop

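I am invoking a Hadoop streaming job through Oozie. It runs successfully with just a mapper and a reducer, but I can't figure out how to pass a combiner. My mapper, reducer, and combiner are all written in Python. Will the following work?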

<map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
        <delete path="${HADOOP_LIB}/OutPath"/>
    </prepare>
    <streaming>
        <mapper>python mapper.py</mapper>
        <combiner>python combiner.py</combiner>
        <reducer>python reducer.py</reducer>
    </streaming>
    <configuration>
        <property>
            <name>mapred.input.dir</name>
            <value>${HADOOP_LIB}/input</value>
        </property>
        <property>
            <name>mapred.output.dir</name>
            <value>${HADOOP_LIB}/OutPath</value>
        </property>
    </configuration>
    <file>mapper.py</file>
    <file>combiner.py</file>
    <file>reducer.py</file>
</map-reduce>

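I couldn't find any place where the <combiner> tag is used. Alternatively, I could run the streaming jar command with the -combiner option from a shell script, and call that script from Oozie.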
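To make the alternative concrete, here is a rough sketch of the command line such a script would run, written in Python to match the rest of the job. It only builds and prints the command; it does not submit anything. The jar location and the HDFS paths are placeholders I made up for illustration, and I am assuming the standard streaming options (-files, -input, -output, -mapper, -combiner, -reducer) behave as documented on the cluster's Hadoop version.

```python
import shlex

# Hypothetical jar location; the real path depends on the Hadoop installation.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"


def build_streaming_command(input_dir, output_dir, jar=STREAMING_JAR):
    """Return the argv list for a streaming job that uses a Python combiner."""
    scripts = ["mapper.py", "combiner.py", "reducer.py"]
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files go before the streaming options.
        "-files", ",".join(scripts),
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", "python mapper.py",
        "-combiner", "python combiner.py",
        "-reducer", "python reducer.py",
    ]


if __name__ == "__main__":
    # Placeholder HDFS paths, standing in for ${HADOOP_LIB}/input and
    # ${HADOOP_LIB}/OutPath from the workflow above.
    cmd = build_streaming_command("/user/me/input", "/user/me/OutPath")
    # Print a shell-quoted command line that could be pasted into a script.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line could then go into the shell script that Oozie calls. Whether that is preferable to a streaming action inside the workflow is exactly what I am unsure about.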

hadoop oozie python combiners

Source: https://stackoverflow.com/questions/25340961/how-to-mention-a-combiner-in-oozie-while-using-streaming-jar