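Goal: use each group's unique key as a folder name, and write the contents of the group's bag as the records in that folder.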
File: employee.txt
#JoiningDate Employee Id Employee Name
20140302 1 A
20140302 2 B
20140302 3 C
20140303 4 D
20140303 5 E
20140303 6 F
Pig script:
X = load 'employee.txt' using PigStorage('\t') as (joining_date:chararray, employee_id:long, employee_name:chararray);
Y = group X by joining_date;
The output of this (Y) would be:
(20140302, {(20140302,1,A), (20140302,2,B), (20140302,3,C)})
(20140303, {(20140303,4,D), (20140303,5,E), (20140303,6,F)})
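To see why a storage function writes each whole bag on one line, it can help to model what group produces. The following is a minimal Python simulation, not Pig itself; the names load_rows and group_by_first_field are hypothetical. It assumes the file is tab-separated, as PigStorage('\t') implies, and it skips the "#" header line for simplicity (Pig's load would not skip it).

```python
# Sample contents of employee.txt (tab-separated), including the header line.
EMPLOYEE_TXT = [
    "#JoiningDate\tEmployee Id\tEmployee Name",
    "20140302\t1\tA",
    "20140302\t2\tB",
    "20140302\t3\tC",
    "20140303\t4\tD",
    "20140303\t5\tE",
    "20140303\t6\tF",
]


def load_rows(lines):
    """Parse lines into (joining_date, employee_id, employee_name) tuples.

    The '#' header line is skipped here for simplicity; Pig's LOAD would not skip it.
    """
    rows = []
    for line in lines:
        if not line or line.startswith("#"):
            continue
        joining_date, employee_id, employee_name = line.split("\t")
        rows.append((joining_date, int(employee_id), employee_name))
    return rows


def group_by_first_field(rows):
    """Model GROUP X BY joining_date: one (key, bag) pair per distinct key, in first-seen order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return list(groups.items())


Y = group_by_first_field(load_rows(EMPLOYEE_TXT))
for key, bag in Y:
    # Each output record is a single tuple: the key plus the whole bag.
    print((key, bag))
```

Each element of Y is one (key, bag) tuple, so a storage function that writes one line per tuple writes the entire bag on a single line, which matches the result reported for the attempt.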
The goal is to have two folders in the output path:
1. outputfolder/20140302, containing three records:
20140302,1,A
20140302,2,B
20140302,3,C
2. outputfolder/20140303, containing three records:
20140303,4,D
20140303,5,E
20140303,6,F
Tried:
store Y into 'outputfolder' using org.apache.pig.piggybank.storage.MultiStorage('outputfolder', '0', 'none', ',');
The result was:
1. outputfolder/20140302/20140302-0
(20140302, {(20140302,1,A), (20140302,2,B), (20140302,3,C)})
2. outputfolder/20140303/20140303-0
(20140303, {(20140303,4,D), (20140303,5,E), (20140303,6,F)})
1 answer
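One way is to flatten the bag before the store command. The output will then be stored in the outputfolder/20140302 folder, and the file names will start like 20140302-0,000.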
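The effect of flattening before storing can be modeled outside Pig. The following Python sketch is a simulation under stated assumptions, not Piggybank's implementation: flatten_groups mimics FOREACH Y GENERATE FLATTEN(X), and the hypothetical store_by_field mimics splitting on field 0 with a "," delimiter. It names each file <key>-0 for simplicity; real MultiStorage part-file names may differ (the answer reports names like 20140302-0,000).

```python
import os
import tempfile

# The grouped relation Y from the question: (joining_date, bag of employee tuples).
Y = [
    ("20140302", [("20140302", 1, "A"), ("20140302", 2, "B"), ("20140302", 3, "C")]),
    ("20140303", [("20140303", 4, "D"), ("20140303", 5, "E"), ("20140303", 6, "F")]),
]


def flatten_groups(groups):
    """Model FOREACH Y GENERATE FLATTEN(X): one output tuple per bag element."""
    return [row for _key, bag in groups for row in bag]


def store_by_field(rows, root, split_index=0, delimiter=","):
    """Append each row to root/<field value>/<field value>-0, fields joined by delimiter."""
    for row in rows:
        key = str(row[split_index])
        folder = os.path.join(root, key)
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, key + "-0"), "a") as out:
            out.write(delimiter.join(str(field) for field in row) + "\n")


root = tempfile.mkdtemp()  # stands in for 'outputfolder'
store_by_field(flatten_groups(Y), root)

# Print each folder name followed by the records written into it.
for key in sorted(os.listdir(root)):
    with open(os.path.join(root, key, key + "-0")) as f:
        print(key)
        print(f.read(), end="")
```

In this model, flatten_groups(Y) simply returns the loaded rows again, which suggests the group step may not be needed at all when the split field is the grouping key.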