Rust: How do I iterate over the lines of a (huge) gzipped file?

Posted by gcmastyq on 2024-01-08 in Other


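I'm trying to perform a line-oriented operation on a gz-compressed file that is larger than the available RAM, so reading it into a string first is ruled out. The question is, how do I do that in Rust (short of gunzip file.gz | ./my-rust-program)?
My current solution is based on flate2 and a bunch of buffered readers: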

use flate2::bufread::GzDecoder as BufGzDecoder;
use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;

fn main() {
    let fname = "path_to_a_big_file.gz";
    let f = File::open(fname).expect("Ooops.");
    let bf = BufReader::new(f); // Here's the first reader so I can plug data into BufGzDecoder.
    let br = BufGzDecoder::new(bf); // Yep, here. But, oops, BufGzDecoder has no lines method,
                                    // so try to stick it into a std BufReader.
    let bf2 = BufReader::new(br); // What!? This works!? Yes it does.
    // After a long time ...
    eprintln!("count: {}", bf2.lines().count());
    // ... the line count is here.
}

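To put the above in words: I noticed that I cannot plug a File directly into flate2::bufread::GzDecoder, so I first created a std::io::BufReader instance, which is compatible with the former's constructor. But I did not see any useful iterator associated with flate2::bufread::GzDecoder, so I built another std::io::BufReader on top of it. Surprisingly, this worked. I got my Lines iterator, and it read the whole file in just over a minute on my machine, but it feels overly verbose, inelegant, and possibly inefficient (this last part worries me most).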

rust

Source: https://stackoverflow.com/questions/65777925/how-do-i-iterate-over-the-lines-of-a-huge-gzipped-file

2 Answers


xxhby3vn1#

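Each of the "buffer-inducing" steps described in the question is necessary here.

1. The implementation of the GZip decoder requires a buffered reader as part of the decoding procedure. That buffer holds compressed data, in which line breaks cannot be identified because of how GZip works.
2. The second BufReader is then used to identify line separators in the decompressed output and accurately return full lines of text.

There is a shortcut for the first one, however: the flate2 crate provides read::GzDecoder, which takes a regular reader and automatically employs buffered reading on top of it.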

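use flate2::read::GzDecoder;
use std::fs::File;
use std::io::BufReader;

let file = File::open("path_to_a_big_file.gz").expect("Ooops.");
let reader = BufReader::new(GzDecoder::new(file)); // read::GzDecoder buffers the file internally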

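With that done, the recommended ways to improve efficiency are to make sure the program is built with the right profile (release mode), and to reduce the number of memory allocations by reusing the same String value for each line, using read_line instead of the lines() iterator.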
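As a rough illustration of the read_line suggestion, here is a minimal sketch that reuses one String for every line. The function name line_stats and the in-memory sample input are made up for this example; the function accepts any BufRead, so a BufReader wrapped around a GzDecoder should work in place of the Cursor used here.

```rust
use std::io::{self, BufRead};

/// Count the lines and find the length of the longest one,
/// reusing a single `String` buffer (hypothetical helper).
fn line_stats<R: BufRead>(mut reader: R) -> io::Result<(usize, usize)> {
    let mut line = String::new(); // allocated once, reused for every line
    let mut count = 0;
    let mut longest = 0;
    loop {
        line.clear(); // read_line appends, so empty the buffer first
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        count += 1;
        // read_line keeps the line terminator; strip it before measuring
        let text = line.trim_end_matches(&['\r', '\n'][..]);
        longest = longest.max(text.len());
    }
    Ok((count, longest))
}

fn main() -> io::Result<()> {
    // Stand-in for BufReader::new(GzDecoder::new(file)); any BufRead works.
    let input = io::Cursor::new("alpha\nbe\ngamma!\n");
    let (count, longest) = line_stats(input)?;
    println!("{count} lines, longest is {longest} bytes");
    Ok(())
}
```

Unlike lines(), read_line leaves the trailing newline in the buffer, which is why the sketch trims it before measuring. Whether this makes a noticeable difference depends on the workload, so it is worth timing both variants on a release build.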

Posted 2024-01-08

kgqe7b3p2#

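Some gzip files may have multiple members; see the flate2 documentation. In that case, the lines iterator will only go through the first member and then stop unexpectedly. You may need to use MultiGzDecoder to avoid the problem: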

use flate2::read::MultiGzDecoder;
use std::fs::File;
use std::io::{BufRead, BufReader};

let file = File::open("path_to_a_big_file.gz").expect("Ooops.");
let reader = BufReader::new(MultiGzDecoder::new(file));

for line in reader.lines() {
    let line = line.expect("failed to read a line"); // each item is an io::Result<String>
    // do something with `line`
}


Posted 2024-01-08
