PigStorage: reading compressed files in a Pig script

Posted by cfh9epnr on 2021-06-21 in Pig

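I have a program that dumps tab-separated data files, compressed, to S3.
I also have a Pig script that loads the data from the S3 bucket. I gave the file names a .zip extension so that Pig knows which compression is used.
The Pig script runs and dumps data back to S3.
The logs show that it is processing records, but all of the dumped files are empty.
Here is an excerpt from the logs: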

Input(s):
Successfully read 375 records (435 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename1.zip"
Successfully read 444 records (442 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename2.zip"

Output(s):
Successfully stored 375 records (1605 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output1-folder"
Successfully stored 444 records (1814 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output2-folder"
Successfully stored 0 records in: "s3://<bucket-name>/<job-id>/test-folder/output/output3-folder"

The code that loads and stores the data is:

data1 = load '$input1'
    using PigStorage('\t') as
    (field1:long,
     field2:long,
     field3:double
);

data2 = load '$input2'
    using PigStorage('\t') as
    (field1:long,
     field2:long,
     field3:double
);

store output1 into '$output1-folder'
    using PigStorage('\t', '-schema');

store output2 into '$output2-folder'
    using PigStorage('\t', '-schema');

store output3 into '$output3-folder'
    using PigStorage('\t', '-schema');

The code that compresses the files:

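import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public static void compressFile(String originalArchive, String zipArchive) throws IOException {
    // try-with-resources closes both streams, so no explicit close() calls are needed
    try (
            ZipOutputStream archive = new ZipOutputStream(new FileOutputStream(zipArchive));
            FileInputStream file    = new FileInputStream(originalArchive)
    ) {
        final int bufferSize = 100 * 1024;
        byte[] buffer = new byte[bufferSize];

        // Note: the single entry is named after the zip archive's own path
        archive.putNextEntry(new ZipEntry(zipArchive));

        // Copy the source file into the zip entry one buffer at a time
        int count;
        while ((count = file.read(buffer)) != -1) {
            archive.write(buffer, 0, count);
        }
        archive.closeEntry();
    }
}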
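
One variation worth trying, offered as an assumption rather than a confirmed fix: PigStorage supports gzip and bzip2 compression and picks the codec from the .gz or .bz2 file extension, whereas zip is an archive container that is not among the codecs Hadoop handles by default. The sketch below writes the same data as gzip instead. The class name GzipCompressor and the method name gzipFile are hypothetical, and the target file is assumed to be given a .gz extension before upload.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original program
public class GzipCompressor {

    // Compresses originalFile into gzipFile (expected to end in ".gz")
    public static void gzipFile(String originalFile, String gzipFile) throws IOException {
        try (
                FileInputStream in    = new FileInputStream(originalFile);
                GZIPOutputStream out  = new GZIPOutputStream(new FileOutputStream(gzipFile))
        ) {
            byte[] buffer = new byte[100 * 1024];

            // Copy the source file into the gzip stream one buffer at a time
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
            // Closing the GZIPOutputStream (done by try-with-resources) writes the gzip trailer
        }
    }
}
```

With this variant the load statements would stay the same, and only the paths passed as $input1 and $input2 would point at the .gz files.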

Any help is appreciated.
Thanks!
