Column-to-row conversion using Pig Latin

ekqde3dh, posted on 2021-05-29 in Hadoop
A = load 'input.txt'; 
dump A;
"0,1, 2,3,4 
5, 6,7, 8,9
B = foreach A generate FLATTEN(TOBAG(*));
dump B;
("0)
(1)
( 2)
(3)
(4)
(5)
( 6)
(7)
( 8)
(9)

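I want to apply some replace and trim operations to each of the fields above. How do I then convert the result back to the original format?
Expected output: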

0,1,2,3,4

5,6,7,8,9
neekobn8 #1

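Yes, this is a good question to experiment with: row-to-column and column-to-row conversion.
With a little help from the RANK operator, I think we can do this.
I tried the script below with the following input.
Input: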

0,1,2,3,4 
 5,6,7,8,9

The Pig script below contains two DUMP statements:

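-- Load each input line as one chararray (the default PigStorage delimiter is tab, so the commas stay inside the field).
numbers = LOAD '/home/inputfiles/col_to_row.txt' USING PigStorage() AS (line:chararray);

-- RANK prepends a sequential row number, used later as the key for regrouping.
numbers_rank = RANK numbers;

-- TOKENIZE splits on its default delimiters (these include comma and space); FLATTEN emits one row per token.
numbers_each = FOREACH numbers_rank GENERATE $0 AS rank_key, FLATTEN(TOKENIZE(line)) AS each_number;

rows_to_columns = FOREACH numbers_each GENERATE each_number;

DUMP rows_to_columns; -- gives each number in a separate row

numbers_grp = GROUP numbers_each BY rank_key;

-- BagToTuple turns each group's bag back into a single tuple; Pig does not guarantee the order of tuples inside a bag.
columns_to_rows = FOREACH numbers_grp GENERATE FLATTEN(BagToTuple(numbers_each.each_number));

DUMP columns_to_rows; -- gives the data as in the original input data set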

Output:

dump rows_to_columns;

         (0)
         (1)
         (2)
         (3)
         (4)
         (5)
         (6)
         (7)
         (8)
         (9)

dump columns_to_rows;

         (0,1,2,3,4)
         (5,6,7,8,9)
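
To cover the replace-and-trim step from the question, the same RANK and GROUP idea can be sketched with a cleanup in the middle. This is only a sketch under assumptions: it assumes a Pig release that provides RANK, the two-argument form of TOKENIZE, and BagToString; the relative path 'input.txt' comes from the question, while the relation names (lines, ranked, tokens, cleaned, grouped, rejoined) are made up for illustration. Splitting on the comma alone keeps the stray spaces and the leading quote inside each token, so TRIM and REPLACE have something to clean.

```pig
-- Sketch only: relation names are hypothetical; 'input.txt' is the file from the question.
lines    = LOAD 'input.txt' USING PigStorage() AS (line:chararray);

-- Prepend a sequential row number so the tokens can be regrouped by source line.
ranked   = RANK lines;

-- Split on commas only, producing one row per field.
tokens   = FOREACH ranked GENERATE $0 AS rank_key, FLATTEN(TOKENIZE(line, ',')) AS token;

-- Per-field cleanup: drop double quotes, then trim surrounding whitespace.
cleaned  = FOREACH tokens GENERATE rank_key, TRIM(REPLACE(token, '"', '')) AS token;

-- Regroup by source line and join the fields back into one comma-separated string.
grouped  = GROUP cleaned BY rank_key;
rejoined = FOREACH grouped GENERATE BagToString(cleaned.token, ',') AS line;

DUMP rejoined;
```

Because Pig does not guarantee the order of tuples inside a bag, the fields in each rejoined line may not come back in their original order. If the order matters, a per-field position would have to be carried along and sorted on before joining.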
0qx6xfy6 #2

You can do a simple replacement with a regex. Since the REPLACE function calls Java's String.replaceAll(), you can use any Java-compatible regex. A demonstration follows:

grunt> A = load 'input.txt' as (f1:chararray);
grunt> DUMP A;
("0,1, 2,3,4 )
(5, 6,7, 8,9)
grunt> B = foreach A generate FLATTEN(TOBAG(*));
grunt> DUMP B;
("0,1, 2,3,4 )
(5, 6,7, 8,9)
grunt> X = FOREACH B GENERATE REPLACE($0, '[^0-9,]', '');
grunt> DUMP X;
(0,1,2,3,4)
(5,6,7,8,9)
grunt> Y = FOREACH X GENERATE FLATTEN(STRSPLIT($0, ','));
grunt> DUMP Y;
(0,1,2,3,4)
(5,6,7,8,9)
grunt> Z = FOREACH Y GENERATE $0;
grunt> DUMP Z;
(0)
(5)
