Loading data with a multi-character delimiter

0x6upsns posted on 2021-06-21 in Pig

Hi everyone, I'm having trouble loading data with Apache Pig. The file looks like this:

"1","2","xx,yy","a,sd","3"

So I want to load it using the multi-character delimiter "," (a comma between two double quotes), like this:

A = LOAD 'file.csv' USING PigStorage('","') AS (f1,f2,f3,f4,f5);

But PigStorage doesn't accept the multi-character delimiter ",". How can I do this? Thanks a lot!

daolsyd0

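PigStorage only accepts a single character as its delimiter. Use the CSVLoader from Piggybank instead: download piggybank.jar, save it in the same folder as your Pig script, and register the jar in the script.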

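REGISTER piggybank.jar;

-- CSVLoader always splits on commas, but keeps commas that sit inside double-quoted fields
-- and strips the surrounding quotes; its constructor takes no delimiter argument
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

A = LOAD 'test1.txt' USING CSVLoader() AS (f1:int,f2:int,f3:chararray,f4:chararray,f5:int);
B = FOREACH A GENERATE f1,f2,f3,f4,f5;
DUMP B;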

Another option is to load each line as a single field and then split it with STRSPLIT:

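A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);
-- drop the opening quote of the first field and the closing quote of the last one
B = FOREACH A GENERATE REPLACE(line, '^"|"$', '') AS line;
-- STRSPLIT takes a regex; '","' has no metacharacters, so it matches the literal sequence ","
C = FOREACH B GENERATE FLATTEN(STRSPLIT(line, '","')) AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray, f5:chararray);
DUMP C;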
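To see what each approach does to the sample record, here is a small Python sketch. Python is used only because it runs without a Pig installation, and none of this is Pig's actual implementation: `str.split` with the literal `","` stands in for `STRSPLIT(line, '","')` (the pattern has no regex metacharacters, so both split on the same sequence), `re.sub` mirrors a `REPLACE(line, '^"|"$', '')` quote-stripping step, and `csv.reader` stands in for a quote-aware loader such as CSVLoader. The function names are hypothetical.

```python
import csv
import re
from typing import List

# Sample record from the question: five quoted fields, two of which contain commas.
LINE = '"1","2","xx,yy","a,sd","3"'


def split_on_comma(line: str) -> List[str]:
    # Single-character split on a comma: also splits inside the quoted fields.
    return line.split(",")


def split_on_quote_comma_quote(line: str) -> List[str]:
    # Split on the literal three-character sequence: quote, comma, quote.
    # The first and last fields keep their outer quote.
    return line.split('","')


def strip_then_split(line: str) -> List[str]:
    # Remove the leading and trailing quote first, then split on the sequence.
    return re.sub(r'^"|"$', "", line).split('","')


def parse_csv(line: str) -> List[str]:
    # Quote-aware CSV parsing: commas inside quotes are kept, quotes are removed.
    return next(csv.reader([line]))


if __name__ == "__main__":
    for fn in (split_on_comma, split_on_quote_comma_quote, strip_then_split, parse_csv):
        print(f"{fn.__name__}: {fn(LINE)}")
```

Only `strip_then_split` and `parse_csv` return the five fields the question expects; the plain comma split yields seven pieces, and the bare `","` split leaves `"1` and `3"` with stray quotes. Splitting on `","` also assumes no field itself contains that sequence or escaped quotes, which a real CSV parser handles.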
