I'm trying out some Pig functions on the Titanic data. At one point I narrowed it down to passenger class (Pclass) and fare (Fare).
Here is the code:
sh echo "1. create FarePclass with two fields"
FarePclass = FOREACH train GENERATE Pclass,Fare ;
DUMP FarePclass;
sh echo "2. create FareByClass grouping by Pclass"
FareByPclass = GROUP FarePclass BY Pclass ;
--FareByPclass = GROUP FarePclass ALL;
--DUMP FareByPclass;
DESCRIBE FareByPclass;
sh echo "3. get average"
AvgFareByPclass = FOREACH FareByPclass GENERATE (float) SUM(FarePclass.Fare);
Here are some sample rows from the DUMP statement in step #1, followed by the rest of the output:
(2,10.5)
(3,7.05)
(3,29.125)
(2,13)
(1,30)
(3,23.45)
(1,30)
(3,7.75)
2. create FareByClass grouping by Pclass
FareByPclass: {group: chararray,FarePclass: {(Pclass: chararray,Fare: chararray)}}
3. get average
2014-08-28 20:56:23,288 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1045:
<file titanic_dypler_datafu.pig, line 36, column 56> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
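I have this script, and I'm trying to get the last line to run: AvgFareByPclass = FOREACH FareByPclass GENERATE (float) SUM(FarePclass.Fare);
When I try to run it, I get this error: Cannot cast bag with schema :bag{:tuple(Fare:chararray)} to float.
Can you suggest how to cast the fare? Am I missing something conceptually about how to handle this?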
1 answer
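Casting the chararray fares to float after you have already tried to sum them is too late; they have to be numeric before they can be summed. Probably the most sensible place for the cast is in the first projection into FarePclass: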
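A minimal sketch of that change, assuming `train` is the relation loaded earlier in the script (its LOAD statement isn't shown) and that Fare arrives as chararray, as the DESCRIBE output indicates. The `AvgFare` alias is hypothetical. AVG stands in for SUM only because step 3 is labelled "get average"; SUM resolves the same way once Fare is numeric.

```pig
-- Cast Fare to float in the first projection so every later step sees a
-- numeric field. Assumes `train` was loaded earlier with Fare as chararray.
FarePclass = FOREACH train GENERATE Pclass, (float)Fare AS Fare;

-- Group as before; each group's bag now has schema {(Pclass, Fare: float)}.
FareByPclass = GROUP FarePclass BY Pclass;

-- FarePclass.Fare is now a bag of floats, so Pig can pick a matching
-- AVG (or SUM) implementation without an explicit cast on the bag.
AvgFareByPclass = FOREACH FareByPclass GENERATE group AS Pclass, AVG(FarePclass.Fare) AS AvgFare;

DUMP AvgFareByPclass;
```

Any Fare string that cannot be parsed as a number becomes null after the cast. Pig's AVG and SUM skip nulls, so bad values drop out of the aggregate instead of failing the job.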