PigStorage: reading compressed files in a Pig script

cfh9epnr  posted on 2021-06-21  in Pig

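I have a program that compresses tab-separated data files and dumps them to S3.
I also have a Pig script that loads the data from the S3 bucket. I gave the file names a .zip extension so that Pig knows which compression is used.
The Pig script runs and dumps data back to S3.
The log shows that records are being processed, but the dumped files are all empty.
Here is an excerpt from the log: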

    Input(s):
    Successfully read 375 records (435 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename1.zip"
    Successfully read 444 records (442 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename2.zip"
    Output(s):
    Successfully stored 375 records (1605 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output1-folder"
    Successfully stored 444 records (1814 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output2-folder"
    Successfully stored 0 records in: "s3://<bucket-name>/<job-id>/test-folder/output/output3-folder"

The code that loads and stores the data is:

    data1 = load '$input1'
        using PigStorage('\t') as
        (field1:long,
         field2:long,
         field3:double
        );
    data2 = load '$input2'
        using PigStorage('\t') as
        (field1:long,
         field2:long,
         field3:double
        );
    store output1 into '$output1-folder'
        using PigStorage('\t', '-schema');
    store output2 into '$output2-folder'
        using PigStorage('\t', '-schema');
    store output3 into '$output3-folder'
        using PigStorage('\t', '-schema');

The code that compresses the files:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public static void compressFile(String originalArchive, String zipArchive) throws IOException {
        // try-with-resources closes both streams, so no explicit close() calls are needed
        try (
            ZipOutputStream archive = new ZipOutputStream(new FileOutputStream(zipArchive));
            FileInputStream file = new FileInputStream(originalArchive)
        ) {
            final int bufferSize = 100 * 1024;
            byte[] buffer = new byte[bufferSize];
            // The single entry is named after zipArchive, i.e. the archive path itself
            archive.putNextEntry(new ZipEntry(zipArchive));
            int count;
            // Copy the input file into the zip entry in 100 KiB chunks
            while ((count = file.read(buffer)) != -1) {
                archive.write(buffer, 0, count);
            }
            archive.closeEntry();
        }
    }
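
For comparison, here is a minimal sketch of a gzip variant of the helper above. It rests on an assumption that is not confirmed in this thread: as far as I can tell, the Pig documentation describes PigStorage as handling gzip and bzip2 input based on the .gz and .bz2 file extensions, and .zip is not among them. Whether that explains the empty output here has not been verified. The class name `GzipCompress` and method name `compressFileGzip` are hypothetical, chosen only for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipCompress {
    // Hypothetical gzip counterpart of compressFile.
    // Assumption: gzipArchive ends in ".gz" so that the reader can pick the codec by extension.
    public static void compressFileGzip(String originalFile, String gzipArchive) throws IOException {
        try (
            FileInputStream file = new FileInputStream(originalFile);
            GZIPOutputStream archive = new GZIPOutputStream(new FileOutputStream(gzipArchive))
        ) {
            byte[] buffer = new byte[100 * 1024];
            int count;
            // Copy the input file into the gzip stream in 100 KiB chunks
            while ((count = file.read(buffer)) != -1) {
                archive.write(buffer, 0, count);
            }
        } // closing the GZIPOutputStream writes the gzip trailer
    }
}
```

Unlike a zip archive, a gzip stream has no entries or entry names; it wraps exactly one file, which is why there is no putNextEntry call in this version.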

Any help is appreciated. Thanks!
