Apache Pig: calculating the number of days between a date and the current date

von4xj4u  posted on 2021-05-29  in  Hadoop

I have a list of movies in the following format (#, title, year, rating, duration):

1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
7,Muriel's Wedding,1994,3.5,6323
8,Mother's Boys,1994,3.4,5733
9,Nosferatu: Original Version,1929,3.5,5651
10,Nick of Time,1995,3.4,5333
...

Each tuple contains a year, which I need to treat as 1st Jan of that year.
I need to calculate the number of days between that date and the current date.
My approach:

movies = LOAD 'movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
daysbetween_data = foreach movies generate DaysBetween(ToDate(year,'<WHAT FORMAT TO GIVE HERE>'), ToDate(<CURRENT DATE HERE>));

Any idea how to do this?

r1zk6ea1


Load the year into a chararray field, use CONCAT to prepend '01-01-' to it so that the resulting string matches the format 'MM-dd-yyyy', then use ToDate and DaysBetween.

movies = LOAD 'movies_data.csv' USING PigStorage(',') as (id:int,name:chararray,year:chararray,rating:double,duration:int);
-- DaysBetween returns (first - second), so the later datetime (now) goes first to get a positive count
daysbetween_data = foreach movies generate DaysBetween(CurrentTime(),ToDate(CONCAT('01-01-',year),'MM-dd-yyyy'));
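A slightly fuller sketch of the same approach, in case it helps to keep the movie id and title next to the day count. The relation names with_date and days_since and the aliases release_date and days_since_jan1 are made up for illustration and are not part of the original answer. Note that DaysBetween subtracts its second argument from its first, so the current time is passed first here to get a positive number of days.

```pig
-- Load the year as chararray so it can be concatenated into a date string
movies = LOAD 'movies_data.csv' USING PigStorage(',')
    AS (id:int, name:chararray, year:chararray, rating:double, duration:int);

-- Build "01-01-<year>" and parse it as 1 January of that year
with_date = FOREACH movies GENERATE
    id,
    name,
    ToDate(CONCAT('01-01-', year), 'MM-dd-yyyy') AS release_date;

-- Later datetime first: CurrentTime() minus release_date, in whole days
days_since = FOREACH with_date GENERATE
    id,
    name,
    DaysBetween(CurrentTime(), release_date) AS days_since_jan1;

DUMP days_since;
```

Splitting the parsing and the subtraction into two FOREACH steps is only for readability; a single FOREACH as in the answer above should behave the same way.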
