Storing date and time in Pig

6ljaweal  posted on 2021-06-21  in  Pig

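I am trying to store a txt file that has two columns, a date and a time. It looks something like this: 1999-01-01 12:08:56
Now I want to perform some date operations with Pig, but I would like to store the date and time together, like 1999-01-01T12:08:56 (I checked this link): http://docs.oracle.com/javase/6/docs/api/java/text/simpledateformat.html
What I want to know is which format I can use so that the date and time are in a single column that I can feed to Pig, and how to then load that value into Pig. I know we convert it to datetime, but that shows an error. Can someone tell me how to load date and time data together? An example would be very helpful.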

kb5ga3dv  #1

Let me know if this works for you.

input.txt  
1999-01-01 12:08:56  
1999-01-02 12:08:57  
1999-01-03 12:08:58  
1999-01-04 12:08:59  

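Pig script:  
-- Split each line on the space into a date field and a time field  
A = LOAD 'input.txt' USING PigStorage(' ') AS (date:chararray, time:chararray);  
-- Build an ISO 8601 string; if your Pig release rejects three-argument CONCAT, nest two CONCAT calls  
B = FOREACH A GENERATE CONCAT(date,'T',time) AS myDateString;  
-- ToDate parses the ISO 8601 string into a datetime value  
C = FOREACH B GENERATE ToDate(myDateString);  
DUMP C;  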

Output (the +05:30 offset reflects the default time zone of the machine that ran the script):  
(1999-01-01T12:08:56.000+05:30)  
(1999-01-02T12:08:57.000+05:30)  
(1999-01-03T12:08:58.000+05:30)  
(1999-01-04T12:08:59.000+05:30)  
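
If you would rather keep the date and time in one column, as the question asks, a minimal sketch is to read each line whole and hand ToDate an explicit pattern. This assumes the same input.txt; the aliases `raw` and `parsed` and the field names `ts` and `dt` are made up for illustration.

```pig
-- TextLoader reads every line as a single chararray field
raw = LOAD 'input.txt' USING TextLoader() AS (ts:chararray);
-- The second argument is a date pattern that matches "1999-01-01 12:08:56"
parsed = FOREACH raw GENERATE ToDate(ts, 'yyyy-MM-dd HH:mm:ss') AS dt;
-- DESCRIBE should report dt as a datetime field
DESCRIBE parsed;
DUMP parsed;
```

With an explicit pattern there is no need to insert the 'T' separator before parsing.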

Now the value in C is a datetime object, so you can process it with all the built-in date functions.
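
As a rough illustration of those built-in functions, the sketch below starts from relation B in the script above. The aliases `C2` and `D2` and the output field names are hypothetical, chosen only for this example.

```pig
-- Same conversion as in C, but with a named datetime field
C2 = FOREACH B GENERATE ToDate(myDateString) AS dt;
-- Extract parts of the timestamp, add one day (ISO 8601 duration P1D),
-- and convert to seconds since the Unix epoch
D2 = FOREACH C2 GENERATE
        GetYear(dt) AS yr,
        GetMonth(dt) AS mon,
        GetHour(dt) AS hr,
        AddDuration(dt, 'P1D') AS next_day,
        ToUnixTime(dt) AS epoch_seconds;
DUMP D2;
```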

In case you want to store the output in this format:  
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)

you can use REGEX_EXTRACT to keep each value up to the ".", something like this:  

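-- Convert the datetime back to its ISO 8601 string form
D = FOREACH C GENERATE ToString($0) AS temp;
-- Capture group 1 is everything before the last '.', which drops the milliseconds and the offset
E = FOREACH D GENERATE REGEX_EXTRACT(temp, '(.*)\\.(.*)', 1);
DUMP E;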

Output:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)
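
Two possible alternatives to the regular expression, sketched against relation C from the script above. The aliases `F` and `G` and the field name `formatted` are illustrative. The first option assumes your Pig version accepts `\'` as an escaped quote inside a string literal.

```pig
-- Option 1: pass a pattern to ToString; \'T\' keeps the letter T as literal text
F = FOREACH C GENERATE ToString($0, 'yyyy-MM-dd\'T\'HH:mm:ss') AS formatted;
-- Option 2: keep only the first 19 characters (yyyy-MM-ddTHH:mm:ss) of the ISO string
G = FOREACH C GENERATE SUBSTRING(ToString($0), 0, 19) AS formatted;
DUMP F;
DUMP G;
```

Option 2 relies on the year having four digits, which holds for the sample data.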
