I have the following CSV file:
date,value,id_point,coordinateX,coordinateY,station
{'$date': '2020-03-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance
{'$date': '2020-02-28T04:45:00Z'},45,A122,39.45855,-0.45851,AvSpain
{'$date': '2020-04-28T04:45:00Z'},33,A107,39.45855,-0.45851,AvFrance
{'$date': '2020-05-28T04:45:00Z'},12,A133,39.45855,-0.45851,AvItaly
{'$date': '2020-06-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance
{'$date': '2020-07-28T04:45:00Z'},77,A117,39.45855,-0.45851,AvSpain
{'$date': '2020-08-28T04:45:00Z'},46,A122,39.45855,-0.45851,AvSpain
{'$date': '2020-09-28T04:45:00Z'},51,A198,39.45855,-0.45851,AvItaly
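I need to write a MapReduce program with the mrjob MRJob class that finds the maximum traffic value for each station and shows the date of that value. This is what I tried: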
from mrjob.job import MRJob

class Maxim(MRJob):
    def mapper(self, key, line):
        (date,value,id_point,coordinateX,coordinateY,station) = line.split(',')
        if date !='date':
            yield str(station), int(value)

    def reducer(self, station, value, date):
        yield station, date, max(value)

if __name__ == '__main__':
    Maxim.run()
I can't find the problem. Please help.
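One likely cause, assuming standard mrjob behaviour: a reducer is called as `reducer(self, key, values)` and each `yield` must be a single key/value pair. The extra `date` parameter in the attempt above therefore has nothing to receive, the three-item `yield` is not a valid pair, and the mapper never emits the date in the first place. A minimal sketch of one possible fix is to emit `(value, date)` pairs from the mapper and take `max` over them in the reducer. `map_line` and `reduce_station` are hypothetical helper names, written as plain functions so the logic can be checked without running a job:

```python
def map_line(line):
    """Mapper logic: yield (station, (value, date)) for each data row."""
    # The sample rows have no commas inside the date field, so a plain split is enough.
    date, value, id_point, coord_x, coord_y, station = line.strip().split(',')
    if date != 'date':  # skip the header row
        yield station, (int(value), date)


def reduce_station(station, pairs):
    """Reducer logic: yield the largest value for a station together with its date."""
    # Pairs compare by value first; ties on value fall back to comparing the date strings.
    # mrjob's default JSON protocol delivers the pairs as lists, which compare the same way.
    value, date = max(pairs)
    yield station, (date, value)
```

Inside the `Maxim` class, `mapper(self, key, line)` would then do `yield from map_line(line)` and `reducer(self, station, pairs)` would do `yield from reduce_station(station, pairs)`. For the sample rows this gives 33 for AvFrance, 77 for AvSpain, and 51 for AvItaly, each paired with the date field of that row.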