Blank output when using REGEX_EXTRACT in Pig

svujldwt, posted on 2021-06-25 in Pig

I want to split a string so that I can convert an area value. My data looks like this.

(149Sq.Yards)
(151Sq.Yards)
(190Sq.Yards)
(190Sq.Yards)

I want to split the data above like this.

149  sq.yards
151  sq.yards

I tried the following code.

a = LOAD '/user/ahmedabad/Makkan_PropertyDetails_Apartment_Ahmedabad.csv' using PigStorage('\t') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray,PriceperArea:chararray,NoOfBedRooms:int,NoOfBathRooms:int,FloorNoOfProperty:chararray,TotalFloors:int,Possession:chararray,BuiltUpArea:chararray,Furnished:chararray,Ownership:chararray,NewResale:chararray,Facing:chararray,title:chararray,PropertyAddress:chararray,NearByFacilities:chararray,PropertyFeatures:chararray,Sellerinfo:chararray,Description:chararray);
b = FOREACH a GENERATE BuiltUpArea; 
c = FILTER b BY (BuiltUpArea matches '.*Sq.Yards.*');
d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(.*)', 1) * 9;

When I dump d, the output is empty.

xlpyo6sf #1

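The regex you used, (.*), matches every character, so capture group 1 is the whole string and Pig tries to compute (149Sq.Yards * 9). That string cannot be cast to a number, which is why the output is null.
The regex below extracts only the leading digits from the input, so the multiplication becomes (149 * 9).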

d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(^[0-9]+)', 1) * 9;
dump d;
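
To make the difference between the two patterns concrete, here is a minimal sketch in Python rather than Pig. It assumes that Python's `re.search` behaves like Pig's REGEX_EXTRACT for these simple patterns (Pig uses Java's regex engine), and that a failed cast yields null, as described above. The helper names `first_group`, `to_decimal`, and `split_area` are hypothetical and exist only for illustration. The factor 9 converts square yards to square feet.

```python
import re
from decimal import Decimal, InvalidOperation

SQFT_PER_SQYD = 9  # 1 square yard = 3 ft x 3 ft = 9 square feet


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


def to_decimal(text):
    """Stand-in for a cast that gives None (null) on non-numeric input."""
    try:
        return Decimal(text)
    except (InvalidOperation, TypeError):
        return None


def split_area(text):
    """Split e.g. '149Sq.Yards' into a number and a lower-cased unit."""
    m = re.search(r'^([0-9]+)\s*(.*)$', text)
    if m is None:
        return None  # no leading digits, nothing to convert
    return Decimal(m.group(1)), m.group(2).lower()


value = "149Sq.Yards"

# '(.*)' captures the whole string, which is not a valid number.
greedy = first_group(r'(.*)', value)
print(greedy, to_decimal(greedy))

# '(^[0-9]+)' captures only the leading digits, so the arithmetic works.
digits = first_group(r'(^[0-9]+)', value)
print(digits, to_decimal(digits) * SQFT_PER_SQYD)

# Number and unit separately, as asked for in the question.
print(split_area(value))
```

The same two-group idea could be carried over to Pig by calling REGEX_EXTRACT twice, once per group index, though that variant is not tested here.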
