Finding the maximum value from a CSV with mrjob

ubby3x7f  posted on 2021-05-27  in  Hadoop

I have a CSV file like the following:

date,value,id_point,coordinateX,coordinateY,station   
{'$date': '2020-03-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance  
{'$date': '2020-02-28T04:45:00Z'},45,A122,39.45855,-0.45851,AvSpain   
{'$date': '2020-04-28T04:45:00Z'},33,A107,39.45855,-0.45851,AvFrance   
{'$date': '2020-05-28T04:45:00Z'},12,A133,39.45855,-0.45851,AvItaly   
{'$date': '2020-06-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance   
{'$date': '2020-07-28T04:45:00Z'},77,A117,39.45855,-0.45851,AvSpain   
{'$date': '2020-08-28T04:45:00Z'},46,A122,39.45855,-0.45851,AvSpain   
{'$date': '2020-09-28T04:45:00Z'},51,A198,39.45855,-0.45851,AvItaly

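I need to write a MapReduce program using the MRJob class from mrjob that finds the maximum traffic value for each station and shows the date of that value. This is what I tried: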

from mrjob.job import MRJob   

class Maxim(MRJob):

    def mapper(self, key, line):
        (date,value,id_point,coordinateX,coordinateY,station) = line.split(',')
        if date !='date':
            yield str(station), int(value)

    def reducer(self, station, value, date):
        yield station, date, max(value)

if __name__ == '__main__':
    Maxim.run()

I can't find the problem. Please help.
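
One likely cause: mrjob calls a reducer with two arguments, the key and an iterator over the values yielded for that key, so the three-parameter `reducer(self, station, value, date)` above cannot be called that way, and the mapper never emits the date in the first place. A minimal sketch of one way around this is to emit `(value, date)` as the mapper's value and take `max` over those pairs. The functions below mirror the shape of mrjob's `mapper` and `reducer` methods without `self`; `run_locally` and `SAMPLE` are hypothetical helpers, added only so the logic can be tried without Hadoop or mrjob installed.

```python
from collections import defaultdict


def mapper(_, line):
    # Same six columns as the CSV in the question.
    date, value, id_point, coord_x, coord_y, station = line.split(',')
    if date != 'date':  # skip the header row
        # Emit the date together with the value so the reducer sees both.
        # strip() drops the trailing spaces present in the sample rows.
        yield station.strip(), (int(value), date)


def reducer(station, readings):
    # Pairs compare element by element, so max() picks the largest value;
    # on a tie the pair with the later date string wins.
    yield station, max(readings)


def run_locally(lines):
    # Hypothetical stand-in for the shuffle step: group mapper output by key.
    grouped = defaultdict(list)
    for line in lines:
        for station, reading in mapper(None, line):
            grouped[station].append(reading)
    results = {}
    for station, readings in grouped.items():
        for key, best in reducer(station, readings):
            results[key] = best
    return results


# A few rows copied from the question, header included.
SAMPLE = [
    "date,value,id_point,coordinateX,coordinateY,station   ",
    "{'$date': '2020-03-28T04:45:00Z'},0,A107,39.45855,-0.45851,AvFrance  ",
    "{'$date': '2020-02-28T04:45:00Z'},45,A122,39.45855,-0.45851,AvSpain   ",
    "{'$date': '2020-04-28T04:45:00Z'},33,A107,39.45855,-0.45851,AvFrance   ",
    "{'$date': '2020-07-28T04:45:00Z'},77,A117,39.45855,-0.45851,AvSpain   ",
]

if __name__ == '__main__':
    for station, (value, date) in sorted(run_locally(SAMPLE).items()):
        print(station, value, date)
```

If these bodies are moved into the `Maxim` class (with `self` added back), mrjob's default JSON protocol would pass the pair along as a list, which should still compare the same way under `max`. The date is kept as the raw field text from the file; extracting just the timestamp is left out of this sketch.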
