Input records don't match output records in Python MapReduce

Posted by mm5n2pyu on 2021-05-29 in Hadoop


I'm writing a map-reduce program in Python. When I run the mapper locally with the following command, I get the expected output:

cat input.csv|python mapper.py > output.tsv
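For reference, a Hadoop Streaming mapper is just a program that reads input lines from stdin and writes tab-separated key/value lines to stdout, which is why the local `cat ... | python mapper.py` pipeline is a reasonable test. The poster's mapper.py is not shown, so the sketch below is hypothetical: it assumes comma-separated input with no quoted commas, treats the first column as the key, and skips blank lines. The shebang line is included because Streaming runs the `-mapper` command directly.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming mapper sketch (hypothetical, not the poster's mapper.py)."""
import sys


def map_line(line):
    """Turn one CSV line into a 'key<TAB>value' line, or None if the line is blank."""
    line = line.rstrip("\r\n")
    if not line:
        return None
    # Assumption: plain comma splitting (no quoted fields); first column is the key.
    fields = line.split(",")
    return fields[0] + "\t" + ",".join(fields[1:])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read every input line from stdin and emit at most one output line per input line.
    for line in stdin:
        out = map_line(line)
        if out is not None:
            stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

Because `main` takes its streams as arguments, the same logic can be exercised in a unit test with `io.StringIO` as well as through the shell pipeline.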

But when I run it on the cluster with the command below, I don't get the desired output:

nohup hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/tools/lib/hadoop-streaming-2.7.0-mapr-1607.jar -Dmapreduce.job.queuename=queue_name -Dmapred.map.tasks=1000 -Dmapred.reduce.tasks=0 -input /path/sample_reduce.csv -output /path/map_output -mapper "mapper_try.py" -reducer NONE -file mapper_try.py > mapp_try2.out &

The job reports that it completed successfully, but I also get the following information:

Map-Reduce Framework
            Map input records=1096
            Map output records=92
            Input split bytes=122610
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=0
            CPU time spent (ms)=840560
            Physical memory (bytes) snapshot=353314721792
            Virtual memory (bytes) snapshot=4310996582400
            Total committed heap usage (bytes)=2036214005760
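One way to see where the gap between "Map input records" and "Map output records" comes from is to have the mapper report its own counters. Hadoop Streaming treats stderr lines of the form `reporter:counter:<group>,<counter>,<amount>` as counter updates, and those counters are listed alongside the built-in Map-Reduce Framework counters above. The sketch below is an illustration only: the group name `MapperDebug`, the counter names, and the "fewer than two fields" skip rule are all hypothetical assumptions, not taken from the question.

```python
#!/usr/bin/env python3
"""Sketch: report custom Hadoop Streaming counters from a Python mapper (hypothetical names)."""
import sys


def report_counter(group, counter, amount=1, err=sys.stderr):
    # Hadoop Streaming parses stderr lines in this format as counter increments.
    err.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))


def main(stdin=sys.stdin, stdout=sys.stdout, err=sys.stderr):
    for line in stdin:
        line = line.rstrip("\r\n")
        fields = line.split(",")
        if not line or len(fields) < 2:
            # Hypothetical rule: blank lines or lines with fewer than two fields are skipped.
            report_counter("MapperDebug", "SKIPPED", err=err)
            continue
        # Emit key<TAB>value and count the emitted record.
        stdout.write(fields[0] + "\t" + ",".join(fields[1:]) + "\n")
        report_counter("MapperDebug", "EMITTED", err=err)


if __name__ == "__main__":
    main()
```

If the custom EMITTED count matches "Map output records" while SKIPPED accounts for the rest, the mapper logic itself is dropping the lines; if the counts don't add up to "Map input records", the input the cluster delivers differs from what the local test sees.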