=== input3.xml =====
<document>
<url>htp://www.abc.com/</url>
<category>Sports</category>
<usercount>120</usercount>
<reviews>
<review>good site</review>
<review>This is Avg site</review>
<review>Bad site</review>
</reviews>
</document>
A = LOAD'input3.xml' using
org.apache.pig.piggybank.storage.XMLLoader('document').HBaseStorage as
(data:chararray);
B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(data,'(?s)<document>.*?<url>
([^>]*?)</url>.*?<category>([^>]*?)</category>.*?<usercount>([^>]*?)</usercount>.*?
<reviews>.*?<review>\\s*([^>]*?)\\s*</review>.*?</reviews>.*?</document>')) as
(url:chararray,catergory:chararray,usercount:int,review:chararray);
1条答案
按热度按时间xzv2uavs1#
我给出了使用input3.xml文件将pig加载到hbase的示例。