Unable to display data using Pig FOREACH

kmbjn2e3  posted on 2021-06-04 in Hadoop

I have a sample dataset in a txt file (Format: Firstname,Lastname,age,sex):

(Eric,Ack,27,M)
(Jenny,Dicken,27,F)
(Angs,Dicken,28,M)
(Mahima,Mohanty,29,F)

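I want to display the age and firstname of employees whose age is greater than 27. After working on this for a while and looking for pointers, I am stuck.
I am loading the dataset like this: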

tuple_record = LOAD '~/Documents/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));

DESCRIBE gives me this schema:

describe tuple_record
tuple_record: {details: (firstname: chararray,lastname: chararray,age: int,sex: chararray)}

Then I flatten the records with this:

flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);

DESCRIBE on the flattened relation gives me this:

describe flatten_tuple_record
flatten_tuple_record: {details::firstname: chararray,details::lastname: chararray,details::age: int,details::sex: chararray}

Now I want to filter by age:

filter_by_age = FILTER flatten_tuple_record BY age > 27;

Then I group by age:

group_by_age = GROUP filter_by_age BY age;

Now, to display the first name and age, I tried this, but it did not work:

display_details = FOREACH group_by_age GENERATE group,firstname;

Here is the error message:

2015-02-01 08:39:37,752 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: 
<line 5, column 54> Invalid field projection. Projected field [firstname] does not exist in schema: group:int,filter_by_age:bag{:tuple(details::firstname:chararray,details::lastname:chararray,details::age:int,details::sex:chararray)}

Please advise.

qnzebej0  #1

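Your Pig statements look fine, but once the data has been filtered by age you can project the first name and age directly, so the GROUP step is not needed. Follow these steps: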

tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));

describe tuple_record;

flatten_tuple_record = FOREACH tuple_record GENERATE FLATTEN(details);

describe flatten_tuple_record;

filter_by_age = FILTER flatten_tuple_record BY age > 27;

details = FOREACH filter_by_age GENERATE firstname, age;

dump details;

Update:

We can even skip the FLATTEN statement here:

tuple_record = LOAD '/user/cloudera/Pig_Tuple.txt' AS (details:tuple(firstname:chararray,lastname:chararray,age:int,sex:chararray));

describe tuple_record;

filter_by_age = FILTER tuple_record BY details.age > 27;

details = FOREACH filter_by_age GENERATE details.firstname, details.age;

dump details;

In both cases, the result will be:

(Angs,28)
(Mahima,29)
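
As for why the original attempt failed: the schema printed in the error message shows that after GROUP each record has only two fields, group (the age) and a bag named filter_by_age that holds the original tuples. firstname is no longer a top-level field, so it has to be projected out of the bag. If grouping by age is actually wanted, a minimal sketch could look like the following. It assumes the filter_by_age relation from the first version above, and that the short name firstname resolves inside the bag; if it does not, the fully qualified name details::firstname can be tried instead.

```pig
-- Sketch only: continues from the flattened filter_by_age relation defined above.
group_by_age = GROUP filter_by_age BY age;

-- Each grouped record is (group, filter_by_age), where filter_by_age is a bag
-- of the filtered tuples, so firstname is reached through the bag's name.
display_details = FOREACH group_by_age GENERATE group, filter_by_age.firstname;

DUMP display_details;
```

This should yield one record per age, each with a bag of the first names that share that age, rather than one flat (firstname, age) row per employee.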
