My reducer does not fully reduce the data

fumotvh3  posted on 2021-05-29  in  Hadoop

I am using a combiner along with a mapper and a reducer.
My mapper code is as follows:


#!/usr/bin/env python

import sys
import datetime

def main():
    for line in sys.stdin:
        data = line.strip().split("\t")
        if len(data) != 6:
            continue
        try:
            float(data[4])
        except ValueError: # sale amount is not numeric, skip the row
            continue
        else: # no exception
            sale_date = data[0]
            ymd = sale_date.split('-')
            date_obj = datetime.date(int(ymd[0]), int(ymd[1]), int(ymd[2]))
            print "{0}\t{1}".format(date_obj.weekday(), data[4])

main()

My reducer code is as follows:


#!/usr/bin/env python

# this reducer also acts as a combiner

import collections
import sys

def main():

    sales_counter = collections.defaultdict(int)
    sales_sum = collections.defaultdict(float)

    for line in sys.stdin:
        data = line.strip().split("\t")
        if len(data) == 3: # acting as reducer
            sales_counter[data[0]] = sales_counter[data[0]] + int(data[1])
            sales_sum[data[0]] = sales_sum[data[0]] + float(data[2])
        elif len(data) == 2: # acting as combiner
            sales_counter[data[0]] = sales_counter[data[0]] + 1
            sales_sum[data[0]] = sales_sum[data[0]] + float(data[1])
        else:
            continue # invalid line read, ignore

    for key in sorted(sales_sum):
        print key,"\t",sales_counter[key],"\t",sales_sum[key]

main()

The data file has the following format (only the first 10 lines are shown):

2012-01-01  09:00   San Jose    Men's Clothing  214.05  Amex
2012-01-01  09:00   Fort Worth  Women's Clothing    153.57  Visa
2012-01-01  09:00   San Diego   Music   66.08   Cash
2012-01-01  09:00   Pittsburgh  Pet Supplies    493.51  Discover
2012-01-01  09:00   Omaha   Children's Clothing 235.63  MasterCard
2012-01-01  09:00   Stockton    Men's Clothing  247.18  MasterCard
2012-01-01  09:00   Austin  Cameras 379.6   Visa
2012-01-01  09:00   New York    Consumer Electronics    296.8   Cash
2012-01-01  09:00   Corpus Christi  Toys    25.38   Discover
2012-01-01  09:00   Fort Worth  Toys    213.88  Visa

The results are as follows:

0   34034   8529272.78
0   567400  141834839.29
1   22715   5660345.68
1   566889  141586312.46
2   22611   5625669.74
2   555219  138745830.2
3   22666   5633975.27
3   567051  141719805.3
4   25365   6363847.75
4   563769  141051081.75
5   34131   8560716.09
5   555310  138849461.48
6   34071   8503163.7
6   567245  141793631.77

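I expected to see only one entry per key (the first column). Adding up the partial results for each key gives the correct result. But my question is: why are there partial results for each key at all?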
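One possible explanation, offered as a sketch rather than a confirmed answer: the reducer's final `print key,"\t",...` statement separates its items with commas, and Python 2's print statement then writes a space before each `"\t"`, so a key that has passed through the combiner becomes `"0 "` instead of `"0"`. Hadoop may run the combiner zero or more times, so the reducer can receive a mix of combined lines (key `"0 "`) and raw mapper lines (key `"0"`), which it counts as two different keys. The Python 3 snippet below imitates this with made-up input; the function names (`format_like_question`, `format_with_tabs`, `aggregate`, `run_job`) are hypothetical, and the exact spacing produced by the Python 2 print statement is an assumption.

```python
import collections


def format_like_question(key, count, total):
    # Assumed emulation of the Python 2 statement
    #   print key,"\t",count,"\t",total
    # which writes a space before each "\t" item.
    return "{0} \t{1} \t{2}".format(key, count, total)


def format_with_tabs(key, count, total):
    # Alternative output format: fields separated by tabs only.
    return "{0}\t{1}\t{2}".format(key, count, total)


def aggregate(lines, formatter):
    """Same aggregation logic as the reducer/combiner in the question."""
    counter = collections.defaultdict(int)
    sums = collections.defaultdict(float)
    for line in lines:
        data = line.strip().split("\t")
        if len(data) == 3:      # line already went through the combiner
            counter[data[0]] += int(data[1])
            sums[data[0]] += float(data[2])
        elif len(data) == 2:    # raw mapper output
            counter[data[0]] += 1
            sums[data[0]] += float(data[1])
    return [formatter(k, counter[k], sums[k]) for k in sorted(sums)]


def run_job(formatter):
    # Made-up mapper output for weekday 0.
    mapper_out = ["0\t10.0", "0\t20.0", "0\t5.0"]
    # Assume the combiner only saw the first two records.
    combined = aggregate(mapper_out[:2], formatter)
    # The reducer then receives combined and uncombined lines together.
    return aggregate(combined + mapper_out[2:], formatter)


if __name__ == "__main__":
    print(run_job(format_like_question))  # two entries for weekday 0
    print(run_job(format_with_tabs))      # a single entry for weekday 0
```

If this is indeed the cause, replacing the last print in the reducer with `print "{0}\t{1}\t{2}".format(key, sales_counter[key], sales_sum[key])` should keep the key free of trailing spaces, so combined and uncombined records fall under the same key.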
