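I'm new to Pig programming, and I have a relation with several fields (I'm simplifying the schema in the example below). I do several calculations and at the end I try to join the results, but the join returns nothing, even though DESCRIBE shows what looks like the correct schema. Also, when checking the syntax, the only thing that caught my attention was this warning: WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY.
Input: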
(123,1,-52.39,-1,2006-05-15)
(123,1,-52.39,-1,2007-04-04)
(123,2,-55.15,-1,2006-05-15)
(123,3,-49.64,-1,2006-05-15)
(123,4,52.39,1,2006-05-15)
(123,4,-52.39,-1,2007-04-04)
(123,4,52.39,1,2007-04-04)
(123,4,-52.39,-1,2007-04-09)
(123,5,86.86,1,2007-04-04)
(123,5,-86.86,-1,2007-04-09)
Expected output:
(123,1,-104.78,-2,2007-04-04)
(123,2,-55.15,-1,2006-05-15)
(123,3,-49.64,-1,2006-05-15)
(123,4,0,0,2007-04-09)
(123,5,0,0,2007-04-09)
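To pin down the intended semantics, here is a minimal plain-Python sketch (Python, not Pig, purely as an illustration) that produces the expected output from the input above. It groups by (ID, LN), sums PAY_AMT and UNIT_QTY, keeps the latest PD_DT per key, and inner-joins the two per-key results on (ID, LN), mirroring the structure of the Pig script below. The function name `aggregate`, keeping ID/LN as strings, and comparing PD_DT as plain YYYY-MM-DD strings are assumptions made for this sketch.

```python
from collections import defaultdict

# Input rows from the question: (ID, LN, PAY_AMT, UNIT_QTY, PD_DT).
# ID and LN are kept as strings (assumption), roughly like untyped Pig fields.
ROWS = [
    ("123", "1", -52.39, -1, "2006-05-15"),
    ("123", "1", -52.39, -1, "2007-04-04"),
    ("123", "2", -55.15, -1, "2006-05-15"),
    ("123", "3", -49.64, -1, "2006-05-15"),
    ("123", "4", 52.39, 1, "2006-05-15"),
    ("123", "4", -52.39, -1, "2007-04-04"),
    ("123", "4", 52.39, 1, "2007-04-04"),
    ("123", "4", -52.39, -1, "2007-04-09"),
    ("123", "5", 86.86, 1, "2007-04-04"),
    ("123", "5", -86.86, -1, "2007-04-09"),
]


def aggregate(rows):
    """Hypothetical helper: per (ID, LN), sum PAY_AMT/UNIT_QTY and keep the latest PD_DT."""
    sums = defaultdict(lambda: [0.0, 0])  # per-key sums, analogous to c3agg
    latest = {}                           # per-key latest date, analogous to c3dt
    for id_, ln, amt, qty, dt in rows:
        key = (id_, ln)
        sums[key][0] += amt
        sums[key][1] += qty
        # YYYY-MM-DD strings compare in chronological order.
        if key not in latest or dt > latest[key]:
            latest[key] = dt
    # Inner join of the two per-key results on (ID, LN), analogous to cj.
    return [
        (id_, ln, round(amt, 2), qty, latest[(id_, ln)])
        for (id_, ln), (amt, qty) in sorted(sums.items())
        if (id_, ln) in latest
    ]


for row in aggregate(ROWS):
    print(row)
```

Every (ID, LN) key appears in both per-key results, so the inner join yields one row per key (five rows here) rather than an empty result.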
c1 = load 'file.csv' using PigStorage(',') as (ID, LN, PAY_AMT:double,UNIT_QTY:int, PD_DT);
c2 = FOREACH c1 GENERATE ID, LN, PAY_AMT, UNIT_QTY;
c3 = group c2 by (ID, LN);
c3agg = FOREACH c3 GENERATE FLATTEN(group) as (ID,LN),
SUM(c2.PAY_AMT) as PdAmt, SUM(c2.UNIT_QTY) as Unit_qty;
describe c3agg;
c3agg: {ID: bytearray,LN: bytearray,PdAmt: double,Unit_qty: long}
Next I try to get MAX(PD_DT). Using the built-in MAX function directly didn't work (or at least I couldn't get it to work without the code below):
c4 = foreach c1 generate ID, LN, PD_DT;
c5 = group c4 by (ID, LN);
c3dt = FOREACH c5 { -- get MAX(PD_DT),
c5ord = ORDER c4 by PD_DT DESC;
c5lmt = LIMIT c5ord 1;
GENERATE FLATTEN(c5lmt);};
describe c3dt;
c3dt: {c5lmt::ID: bytearray,c5lmt::LN: bytearray,c5lmt::PD_DT: bytearray}
Then I try the join, but it returns nothing:
cj = JOIN c3agg BY (ID, LN), c3dt BY (ID, LN);
dump cj;
I also tried using field positions, with the same result:
cj = JOIN c3agg BY ($0, $1), c3dt BY ($0, $1);
describe cj;
cj: {c3agg::ID: bytearray,c3agg::LN: bytearray,c3agg::PdAmt: double,c3agg::Unit_qty: long,c3dt::c5lmt::ID: bytearray,c3dt::c5lmt::LN: bytearray,c3dt::c5lmt::PD_DT: bytearray}
I also tried declaring the field types, e.g. ID:chararray and LN:int, but still got no results. I really don't understand what I'm doing wrong.
Thanks!