I have a dataset like the following.
ravi,savings,avinash,2,char,33,F,22,44,12,13,33,44,22,11,10,22,2006-01-23
avinash,current,sandeep,3,char,44,M,33,11,10,12,33,22,39,12,23,19,2001-02-12
supreeth,savings,prabhash,4,char,55,F,22,12,23,12,44,56,7,88,34,23,1995-03-11
lavi,current,nirmesh,5,char,33,M,11,10,33,34,56,78,54,23,445,66,1999-06-15
Venkat,savings,bunny,6,char,11,F,99,12,34,55,33,23,45,66,23,23,2016-05-18
The last column (example: 2006-01-23) is a date. I am loading the data above with Pig. Below is the code I use to load the file.
file = LOAD 'FI_USER_CREDS_TBL_T.txt'
USING PigStorage(',') AS (USER_ID:chararray,
ROLE_ID:chararray,
USER_PW:chararray,
NUM_PWD_HISTORY:int,
PWD_HISTORY:chararray,
PWD_LAST_MOD_TIME:int,
NUM_PWD_ATTEMPTS:int,
NEW_USER_FLG:chararray,
LOGIN_TIME_LOW:int,
LOGIN_TIME_HIGH:int,
DISABLED_FROM_DATE:int,
DISABLED_UPTO_DATE:int,
PW_EXPY_DATE:int,
ACCT_EXPY_DATE:int,
ACCT_INACTIVE_DAYS:int,
LAST_ACCESS_TIME:int,
TS_CNT:int,
DTL__CAPXTIMESTAMP:int,
ETL_INSERT_DATE:datetime);
But it does not read the date column. After running the dump file command, it gives the following output.
(ravi,savings,avinash,2,char,33,,22,44,12,13,33,44,22,11,10,22,,)
(avinash,current,sandeep,3,char,44,,33,11,10,12,33,22,39,12,23,19,,)
(supreeth,savings,prabhash,4,char,55,,22,12,23,12,44,56,7,88,34,23,,)
(lavi,current,nirmesh,5,char,33,,11,10,33,34,56,78,54,23,445,66,,)
(Venkat,savings,bunny,6,char,11,,99,12,34,55,33,23,45,66,23,23,,)
How can I get the date column to show up?
Please help me with this.
Thank you.
2 Answers
ru9i0ody1#
Load the date column as chararray.
file = LOAD 'FI_USER_CREDS_TBL_T.txt' USING PigStorage(',') AS (USER_ID:chararray, ROLE_ID:chararray, USER_PW:chararray, NUM_PWD_HISTORY:int, PWD_HISTORY:chararray, PWD_LAST_MOD_TIME:int, NUM_PWD_ATTEMPTS:int, NEW_USER_FLG:chararray, LOGIN_TIME_LOW:int, LOGIN_TIME_HIGH:int, DISABLED_FROM_DATE:int, DISABLED_UPTO_DATE:int, PW_EXPY_DATE:int, ACCT_EXPY_DATE:int, ACCT_INACTIVE_DAYS:int, LAST_ACCESS_TIME:int, TS_CNT:int, DTL__CAPXTIMESTAMP:int, ETL_INSERT_DATE:chararray);
-- The ToDate built-in function converts a chararray to datetime; the format must be specified
file2 = FOREACH file GENERATE USER_ID, ROLE_ID, USER_PW, NUM_PWD_HISTORY, PWD_HISTORY, PWD_LAST_MOD_TIME, NUM_PWD_ATTEMPTS, NEW_USER_FLG, LOGIN_TIME_LOW, LOGIN_TIME_HIGH, DISABLED_FROM_DATE, DISABLED_UPTO_DATE, PW_EXPY_DATE, ACCT_EXPY_DATE, ACCT_INACTIVE_DAYS, LAST_ACCESS_TIME, TS_CNT, DTL__CAPXTIMESTAMP, ToDate(ETL_INSERT_DATE, 'yyyy-MM-dd') AS ETL_INSERT_DATE;
DESCRIBE file2; DUMP file2;
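If the date still comes back empty, it may be worth checking the field count. In the sample rows as posted, each line has 18 comma-separated fields and the date is the 18th, while the schema declares 19 names with ETL_INSERT_DATE last. If the real file matches the sample (an assumption, since the sample may be abbreviated), the date would land in DTL__CAPXTIMESTAMP:int, fail the int cast, and the 19th field would simply be missing. A minimal sketch for checking this by position, without a schema:

```pig
-- Load without a schema so no field is cast (and possibly nulled) at load time.
raw = LOAD 'FI_USER_CREDS_TBL_T.txt' USING PigStorage(',');

-- Positions are zero-based: $17 is the 18th field, where the date sits in the sample rows.
-- Assumption: the real file has the same layout as the sample shown in the question.
dated = FOREACH raw GENERATE
    (chararray)$0 AS USER_ID,
    ToDate((chararray)$17, 'yyyy-MM-dd') AS ETL_INSERT_DATE;

DESCRIBE dated;
DUMP dated;
```

If the dates appear here, the schema in the LOAD statement probably has one name too many (or the file one column too few), and the names need to be realigned with the data.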
kxeu7u2r2#
Load the date as chararray and then convert it to datetime.
For example:
file2 = FOREACH file GENERATE ToDate(date, 'dd/MM/yyyy') AS date, ....
Try these links for reference: http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/ToDate.html or http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
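The pattern letters in these format strings are case-sensitive: uppercase MM means month, while lowercase mm means minute. A small sketch of the round trip, assuming a hypothetical one-column file dates.txt that holds values such as 2006-01-23:

```pig
-- Hypothetical input: dates.txt, one date per line in yyyy-MM-dd form.
dates = LOAD 'dates.txt' AS (d:chararray);

-- Uppercase MM is month-of-year; lowercase mm would be read as minute-of-hour.
parsed = FOREACH dates GENERATE ToDate(d, 'yyyy-MM-dd') AS dt;

-- Render the datetime in another layout and pull out individual parts.
shown = FOREACH parsed GENERATE
    ToString(dt, 'dd/MM/yyyy') AS formatted,
    GetYear(dt) AS yr,
    GetMonth(dt) AS mon;

DUMP shown;
```

The pattern passed to ToDate has to match the layout of the text in the file; the pattern passed to ToString only controls how the value is displayed.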