I am using a combiner in addition to a mapper and a reducer.
My mapper code is as follows:
#!/usr/bin/env python
import sys
import datetime

def main():
    for line in sys.stdin:
        data = line.strip().split("\t")
        if len(data) != 6:
            continue  # skip malformed lines
        try:
            float(data[4])  # check that the sale amount is numeric
        except ValueError:
            continue
        else:  # no exception
            sale_date = data[0]
            ymd = sale_date.split('-')
            date_obj = datetime.date(int(ymd[0]), int(ymd[1]), int(ymd[2]))
            # emit: weekday (0 = Monday ... 6 = Sunday) <TAB> sale amount
            print("{0}\t{1}".format(date_obj.weekday(), data[4]))

main()
My reducer code is as follows:
#!/usr/bin/env python
# this reducer also acts as a combiner
import collections
import sys

def main():
    sales_counter = collections.defaultdict(int)
    sales_sum = collections.defaultdict(float)
    for line in sys.stdin:
        data = line.strip().split("\t")
        if len(data) == 3:  # acting as reducer
            sales_counter[data[0]] = sales_counter[data[0]] + int(data[1])
            sales_sum[data[0]] = sales_sum[data[0]] + float(data[2])
        elif len(data) == 2:  # acting as combiner
            sales_counter[data[0]] = sales_counter[data[0]] + 1
            sales_sum[data[0]] = sales_sum[data[0]] + float(data[1])
        else:
            continue  # invalid line read, ignore
    for key in sorted(sales_sum):
        print key,"\t",sales_counter[key],"\t",sales_sum[key]

main()
The data file looks like this (first 10 lines only; the six fields are tab-separated in the actual file):
2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex
2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Diego Music 66.08 Cash
2012-01-01 09:00 Pittsburgh Pet Supplies 493.51 Discover
2012-01-01 09:00 Omaha Children's Clothing 235.63 MasterCard
2012-01-01 09:00 Stockton Men's Clothing 247.18 MasterCard
2012-01-01 09:00 Austin Cameras 379.6 Visa
2012-01-01 09:00 New York Consumer Electronics 296.8 Cash
2012-01-01 09:00 Corpus Christi Toys 25.38 Discover
2012-01-01 09:00 Fort Worth Toys 213.88 Visa
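To make the mapper's output concrete, here is a minimal sketch of what it would emit for the first sample record. It assumes the fields are separated by tabs, as the mapper's `split("\t")` implies; `map_line` is a hypothetical helper written for this illustration and is not part of the original job.

```python
import datetime

def map_line(line):
    """Return (weekday, amount) for one tab-separated record, or None if invalid."""
    data = line.strip().split("\t")
    if len(data) != 6:
        return None
    try:
        float(data[4])  # the amount must parse as a number
    except ValueError:
        return None
    year, month, day = (int(part) for part in data[0].split("-"))
    # weekday(): Monday is 0, Sunday is 6
    return datetime.date(year, month, day).weekday(), data[4]

# First sample record, with tabs between the six fields (assumed layout).
sample = "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex"
print(map_line(sample))  # 2012-01-01 was a Sunday, so the key is 6
```

So every record from that day lands under key 6, and the value is the sale amount as a string.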
The results are as follows:
0 34034 8529272.78
0 567400 141834839.29
1 22715 5660345.68
1 566889 141586312.46
2 22611 5625669.74
2 555219 138745830.2
3 22666 5633975.27
3 567051 141719805.3
4 25365 6363847.75
4 563769 141051081.75
5 34131 8560716.09
5 555310 138849461.48
6 34071 8503163.7
6 567245 141793631.77
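I expected to see only one entry per key (the first column). Adding up the partial results for each key does give the correct result. But my question is: why are there partial results for each key?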
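A possible explanation, not verified against this job: the reducer's final `print` separates its items with commas, and Python then puts a space between each item, so the emitted key is `"0 "` rather than `"0"`. Assuming this runs as a Hadoop Streaming job, the framework may apply the combiner zero, one, or several times to a given map output, so keys that went through a different number of passes would carry a different number of trailing spaces and would be treated as distinct keys. The sketch below reproduces only the string handling; `emit_with_commas`, `emit_with_join` and `parse_key` are hypothetical helpers written for this illustration.

```python
def emit_with_commas(key, count, total):
    # Mimics the reducer's `print key,"\t",count,"\t",total`:
    # the items are separated by single spaces, so each tab gets padded.
    return " ".join([str(key), "\t", str(count), "\t", str(total)])

def emit_with_join(key, count, total):
    # Possible alternative: join the fields on tabs only, with no padding.
    return "\t".join([str(key), str(count), str(total)])

def parse_key(line):
    # Same parsing as the reducer: strip the line, split on tabs, take field 0.
    return line.strip().split("\t")[0]

# Key after one pass and after two passes through the comma-style emit.
once = parse_key(emit_with_commas("0", 2, 10.5))
twice = parse_key(emit_with_commas(once, 2, 10.5))
print(repr(once), repr(twice), once == twice)

# With tab-only joining, the key is unchanged after two passes.
stable = parse_key(emit_with_join(parse_key(emit_with_join("0", 2, 10.5)), 2, 10.5))
print(repr(stable))
```

If this is indeed the cause, emitting with `"\t".join(...)` or stripping each field after the split should collapse the output to one row per weekday.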