Pig: FILTER, or how to access a bag or tuple

8aqjt8rx, posted on 2021-05-29 in Hadoop

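As you can see, we can apply the first filter because aggregate functions work on temperature. How can we apply the second filter, which is on a string?
We just want to filter e down to the days where the condition is clear or partly cloudy.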

Weather = LOAD 'hdfs:/home/hduser/final/Weather.csv' USING PigStorage(',');
A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$4 AS temp, $14 AS cond, (double)$5 as dewpoint , (double)$10 as wind;

group_by_day = GROUP A BY (year,month,day);

Schema:

{day: (year: int, month: int, day: int), temperature: {(temp: int)}, condition: {(cond: bytearray)}, dewPoint: {(dewpoint: double)}, windSpeed: {(wind: double)}}
juud5qan1#

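You have to cast cond to chararray, as in the statement below. Because you did not specify data types in the LOAD statement, every field is loaded as bytearray, which is the default data type PigStorage uses.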

A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$4 AS temp, (chararray)$14 AS cond, (double)$5 as dewpoint , (double)$10 as wind;
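
One way to confirm that the cast took effect is to inspect the schemas with DESCRIBE. This is a minimal sketch that assumes the relation names A and group_by_day from the question:

```pig
-- Print the schema of A; cond should now be reported as chararray
DESCRIBE A;

-- Print the schema of the grouped relation; the cond field inside the
-- bag should be chararray as well, because the group is built from A
DESCRIBE group_by_day;
```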

Edit
I was able to get the result with the BagToString function, so you can filter in a single step.

D = FILTER C BY (MIN(temperature) >= 60 AND MAX(temperature) <= 79) AND (BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy');

Or, in your case:

f = FILTER e BY BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy';
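
Note that BagToString joins every element of the bag into one string, separated by an underscore by default. The comparison above therefore matches only when the bag for a day holds a single condition value. If a day can carry several readings, a nested FOREACH is one possible alternative. The following is a sketch, not the answerer's method. It assumes e has the schema shown in the question and that cond has been cast to chararray. The names e_marked, wanted and n_wanted are made up for illustration.

```pig
-- Sketch only: e is assumed to have the fields
-- day, temperature, condition, dewPoint, windSpeed.
e_marked = FOREACH e {
    -- keep only the readings whose condition is one of the two wanted values
    wanted = FILTER condition BY cond == 'clear' OR cond == 'partly cloudy';
    GENERATE day, temperature, condition, dewPoint, windSpeed,
             COUNT(wanted) AS n_wanted;
};

-- keep the days that have at least one matching reading
f = FILTER e_marked BY n_wanted > 0;
```

This keeps a day as soon as one of its readings matches. To require that every reading matches, n_wanted could be compared with the size of the whole condition bag instead.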
