Hadoop: how can I unzip files stored in HDFS using Java, without first copying them to the local file system?

kupeojn6 posted on 2021-06-02 in Hadoop

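We store zip files containing XML files in HDFS. We need to unzip them programmatically in Java and stream out the XML files they contain. FileSystem.open returns an FSDataInputStream, but the ZipFile constructors only accept a File or a String. I would really rather not use FileSystem.copyToLocalFile.
Is it possible to stream the contents of a zip file stored in HDFS without first copying the zip file to the local file system? If so, how?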

yh2wf1be #1

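Hi, here is some sample code. ZipInputStream accepts any InputStream, so it can wrap the FSDataInputStream returned by FileSystem.open directly: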

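import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The class name is illustrative; the two methods can live in any class.
public class HdfsZipReader {

    // Reads every file entry of a zip stored in HDFS into memory, keyed by entry name.
    public static Map<String, byte[]> loadZipFileData(String hdfsFilePath) throws IOException {
        // Assumes the Hadoop configuration on the classpath points at the target cluster.
        FileSystem fileSystem = FileSystem.get(new Configuration());
        Map<String, byte[]> listOfFiles = new LinkedHashMap<>();
        byte[] buf = new byte[1024];
        // try-with-resources closes the zip stream and the underlying HDFS stream, even on error.
        try (ZipInputStream zipInputStream = readZipFileFromHDFS(fileSystem, new Path(hdfsFilePath))) {
            ZipEntry zipEntry;
            while ((zipEntry = zipInputStream.getNextEntry()) != null) {
                if (!zipEntry.isDirectory()) {
                    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                    int bytesRead;
                    // read() returns -1 at the end of the current entry, not the end of the archive.
                    while ((bytesRead = zipInputStream.read(buf, 0, buf.length)) > -1) {
                        outputStream.write(buf, 0, bytesRead);
                    }
                    listOfFiles.put(zipEntry.getName(), outputStream.toByteArray());
                }
                zipInputStream.closeEntry();
            }
        }
        return listOfFiles;
    }

    // Opens the file in HDFS and wraps the stream in a ZipInputStream; nothing is copied locally.
    protected static ZipInputStream readZipFileFromHDFS(FileSystem fileSystem, Path path) throws IOException {
        if (!fileSystem.exists(path)) {
            throw new IllegalArgumentException(path.getName() + " does not exist");
        }
        FSDataInputStream fsInputStream = fileSystem.open(path);
        return new ZipInputStream(fsInputStream);
    }
}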
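
The question asks for the XML to be streamed out, while the code above buffers every entry in memory. If the entries are large, one possible variation is to hand each entry to a callback as a stream instead of collecting byte arrays. The sketch below is only an illustration, not part of the original answer: the class ZipStreamSketch, the EntryHandler interface and the helper names are all made up for this example, and a ByteArrayInputStream stands in for the FSDataInputStream that fileSystem.open(path) would return, so that it runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipStreamSketch {

    /** Hypothetical callback: receives one entry's name and its uncompressed data as a stream. */
    interface EntryHandler {
        void handle(String entryName, InputStream entryData) throws IOException;
    }

    /**
     * Walks a zip archive read from any InputStream and passes each file entry to the handler.
     * On Hadoop, source would be the FSDataInputStream returned by fileSystem.open(path).
     */
    static void forEachEntry(InputStream source, EntryHandler handler) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // The zip stream reports end-of-stream at the end of the current entry,
                    // so the handler can read until -1. The handler must not close the stream.
                    handler.handle(entry.getName(), zip);
                }
                zip.closeEntry();
            }
        }
    }

    /** Builds a small zip in memory so the sketch can run without HDFS. */
    static byte[] buildSampleZip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Example handler use: counts each entry's uncompressed bytes, one buffer at a time. */
    static Map<String, Long> entrySizes(InputStream source) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        byte[] buf = new byte[8192];
        forEachEntry(source, (name, data) -> {
            long total = 0;
            int n;
            while ((n = data.read(buf)) != -1) {
                total += n;
            }
            sizes.put(name, total);
        });
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.xml", "<a/>");
        files.put("dir/b.xml", "<b>text</b>");
        byte[] zipBytes = buildSampleZip(files);
        // prints {a.xml=4, dir/b.xml=11}
        System.out.println(entrySizes(new ByteArrayInputStream(zipBytes)));
    }
}
```

In a real job the handler would presumably pass entryData to an XML parser or copy it to an output stream rather than count bytes. Note that ZipInputStream reads the archive sequentially and does not use the central directory, so entries arrive in the order they were written.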
