Pig: FILTER, or how to get inside a bag or tuple

Posted by 8aqjt8rx on 2021-05-29 in Hadoop


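As you can see, we can apply the first filter because we can use aggregates on the temperature. Now, how can we apply the second filter, on the string?
We just want to filter e down to the cases where the weather is clear or partly cloudy.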

Weather = LOAD 'hdfs:/home/hduser/final/Weather.csv' USING PigStorage(',');
A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$4 AS temp, $14 AS cond, (double)$5 as dewpoint , (double)$10 as wind;

group_by_day = GROUP A BY (year,month,day);

Schema:

{day: (year: int, month: int, day: int), temperature: {(temp: int)},
 condition: {(cond: bytearray)}, dewPoint: {(dewpoint: double)},
 windSpeed: {(wind: double)}}

hadoop apache-pig

Source: https://stackoverflow.com/questions/36902661/pig-filter-or-how-to-get-in-side-of-a-bag-or-tuple

1 Answer


juud5qan1#

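You have to cast cond to chararray in the statement below. Since you did not specify data types in the LOAD statement, all fields are loaded as bytearray, which is the default data type PigStorage uses.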

A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$4 AS temp, (chararray)$14 AS cond, (double)$5 as dewpoint , (double)$10 as wind;

EDIT
I was able to get the result with the BagToString function, so you can do the filtering in a single step.

D = FILTER C BY (MIN(temperature) >= 60 AND MAX(temperature) <= 79) AND (BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy');

Or, in your case:

f = FILTER e BY BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy';
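One caveat with the BagToString approach: as far as I know, BagToString joins every element of the bag into one string (with '_' as the default delimiter), so the comparison with 'clear' only matches when the bag holds a single reading. If a day can have several readings, a nested FOREACH is one possible alternative. The following is only a sketch, assuming cond has been cast to chararray as above and that "clear or partly cloudy" means every reading of that day has one of those two values. The names day_stats, ok, good_days, min_temp, max_temp, n_all and n_ok are made up for illustration.

```pig
-- Sketch only: works on group_by_day = GROUP A BY (year, month, day) from the question.
day_stats = FOREACH group_by_day {
    -- keep only the readings of this day whose condition is one of the two values
    ok = FILTER A BY cond == 'clear' OR cond == 'partly cloudy';
    GENERATE group AS day,
             MIN(A.temp) AS min_temp,
             MAX(A.temp) AS max_temp,
             COUNT_STAR(A) AS n_all,   -- all readings of the day
             COUNT_STAR(ok) AS n_ok;   -- readings that passed the nested filter
};

-- a day qualifies when its temperature range fits and every reading passed the condition test
good_days = FILTER day_stats BY min_temp >= 60 AND max_temp <= 79 AND n_all == n_ok;
```

If you instead want days with at least one clear or partly cloudy reading, replace n_all == n_ok with n_ok > 0.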

2021-05-30
