C++: How to handle the remaining bits in file decompression

Posted by uz75evzq on 2024-01-09 in Other


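I am building a file compressor and decompressor, and I don't know how to handle the leftover bits when I decompress.
For example, bits.length() is 63. Since a byte is 8 bits, bits.length() % 8 = 7, so 7 bits are still left over.
Here is my compression code: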

void compressFile(string inputFile) {
    huffmanTree();
    system("cls");
    cout << "\n\n\t\t\t\tProcessing...";
    Sleep(5000);

    ifstream inputedFile(inputFile, ios::binary); // Open the file in binary mode
    ofstream compressedFile("compressed.huff", ios::binary); // Binary mode: no newline translation of the packed bytes

    if (!inputedFile.is_open() || !compressedFile.is_open()) {
        cout << "\t\t\t\tError: Unable to open file for compression." << endl;
        return;
    }

    string bits; // Use a string to accumulate bits for each character
    char ch;
    while (inputedFile.get(ch)) {
        bits += treeCode[(int)ch];
        if(bits.length() >= 8){
            // Process complete groups of 8 bits
            for (int i = 0; i + 8 <= bits.length(); i += 8) {
                compressedFile.put((char)stoi(bits.substr(i, 8), NULL, 2));
            }
            bits = bits.substr(bits.length() - bits.length() % 8);
        }
    }
    if (!bits.empty()) {
        compressedFile.put((char)stoi(bits, NULL, 2));
    }

    system("cls");
    cout << "\t\t\t\t---------------------------------------------" << endl;
    cout << "\n\n\t\t\t\tSuccessful: File has been compressed." << endl;
    cout << "\n\n\t\t\t\tThe file name is compressed.huff." << endl;
    cout << "\t\t\t\t---------------------------------------------" << endl;
    cout << "\t\t\t\t";
    system("pause");
    inputedFile.close();
    compressedFile.close();
}

Here is my decompression code:

void decompressFile(string compressedFile) {

    system("cls");
    cout << "\n\n\t\t\t\tProcessing...";
    Sleep(5000);

    ifstream compressedFileStream(compressedFile, ios::binary); // Open the file in binary mode
    ofstream decompressedFile("decompressed.txt");

    if (!compressedFileStream.is_open() || !decompressedFile.is_open()) {
        cout << "\n\t\t\t\tError: Unable to open file for decompression." << endl;
        return;
    }

    huffmanTree();

    Node* root = head->node; // Save the root of the Huffman tree
    Node* current = root;    // Initialize the current node

    char byte; // Read bytes for decompression
    while (compressedFileStream.get(byte)) {
        for (int i = 7; i >= 0; i--) {

            // Traverse the tree based on each bit in the byte
            char bit = (byte & (1 << i)) ? '1' : '0';

            if (bit == '0') {
                current = current->left;
            }
            else if (bit == '1') {
                current = current->right;
            }

            if (current->left == NULL && current->right == NULL) {
                decompressedFile << current->character;
                cout << "decompressed" <<current->character;
                current = root; // Reset current to the root for the next character
            }
        }
    }
    system("pause");
    system("cls");
    cout << "\t\t\t\t---------------------------------------------" << endl;
    cout << "\n\n\t\t\t\tSuccessful: File has been decompressed." << endl;
    cout << "\n\n\t\t\t\tThe file name is decompressed.txt." << endl;
    cout << "\t\t\t\t---------------------------------------------" << endl;
    cout << "\t\t\t\t";
    system("pause");
    compressedFileStream.close();
    decompressedFile.close();
}

What should I do to restore my compressed file without losing characters?

c++

Source: https://stackoverflow.com/questions/77772567/how-to-handle-the-remaining-bits-in-file-decompression

2 Answers

Answer 1, by bvjveswy

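The compressor must somehow "flush" its output bitstream. That is, if the last compressed byte holds fewer than 8 bits, it must still output that byte. Pad the remaining bits with any value: zeros, ones, or garbage.
This leads to the next problem: how does the decompressor know where to stop? If it decodes extra bits past the intended end, the decompressed output will have garbage at the end.
1. The simplest way is to pass the "real" length of the bitstream (the number of bits) from the compressor to the decompressor, separately from the bitstream itself.
2. The compressor can have a dedicated "end of message" token; it should encode that token at the end and fill the unused bits of the last byte with padding (it does not matter with what).
3. The compressor can place a dedicated "flush" bit sequence at the end of the bitstream. The best flush is "100...": emit one bit equal to 1, then bits equal to 0 until the byte is full.
The last approach is the cleanest, but the hardest to implement in the decompressor. The decompressor has to read one byte ahead and use the end-of-file indication for that next byte to locate the end of the stream inside the current byte.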

Node* root = head->node; // Save the root of the Huffman tree
    Node* current = root;    // Initialize the current node

    char byte; // Read bytes for decompression
    char next_byte;
    compressedFileStream.get(byte);
    while (true) {
        compressedFileStream.get(next_byte);
        int num_bits = 8;
        if (compressedFileStream.eof()) {
            // current byte is the last one!
            // find the last bit equal to 1 in it.
            // All bits preceding it are data bits.
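            if (byte == 0)
            {
                // A valid stream always ends with the 1 marker.
                // Decode nothing from this byte so the loop terminates.
                fprintf(stderr, "Error!\n");
                num_bits = 0;
            }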
            for (int i = 0; i < 8; i++)
            {
                if (byte & (1 << i))
                {
                    num_bits = 7 - i;
                    break;
                }
            }
        }
        for (int i = 7; i >= 8 - num_bits; i--) {
            // Traverse the tree based on each bit in the byte
            char bit = (byte & (1 << i)) ? '1' : '0';
            ...
        }
        if (num_bits < 8)
            break;
        byte = next_byte;
    }

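The code above is untested; I hope it has no bugs. Bit-level alignment based on this idea requires precise coding, but once you get it right, the problem is solved.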
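For the compressor side of the "100..." flush, a minimal sketch might look like the following. It is not the asker's code: the function names `packWithTerminator` and `unpackWithTerminator` are hypothetical, and the sketch assumes the Huffman codes have already been concatenated into a string of '0'/'1' characters and that bits are packed most significant bit first, as the decompressor in the question reads them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs a string of '0'/'1' characters into bytes, most significant bit
// first, then ends the stream with a single 1 bit followed by 0 bits.
// The terminator is always written, so an input whose length is a
// multiple of 8 gains one extra byte (0x80).
std::vector<std::uint8_t> packWithTerminator(const std::string& bits) {
    std::vector<std::uint8_t> out;
    std::uint8_t current = 0;
    int used = 0; // number of bits already placed in `current`
    auto pushBit = [&](bool bit) {
        current = static_cast<std::uint8_t>((current << 1) | (bit ? 1 : 0));
        if (++used == 8) {
            out.push_back(current);
            current = 0;
            used = 0;
        }
    };
    for (char c : bits) {
        assert(c == '0' || c == '1');
        pushBit(c == '1');
    }
    pushBit(true);                    // the terminating 1
    while (used != 0) pushBit(false); // zero-fill the rest of the byte
    return out;
}

// Inverse operation: recovers the bit string without terminator or padding.
// Returns an empty string for malformed input (no bytes, or a final byte
// of 0); a real implementation would report that case separately.
std::string unpackWithTerminator(const std::vector<std::uint8_t>& bytes) {
    if (bytes.empty() || bytes.back() == 0) return {};
    int low = 0; // position of the lowest set bit in the last byte
    while (((bytes.back() >> low) & 1) == 0) ++low;
    const std::size_t totalBits =
        bytes.size() * 8 - static_cast<std::size_t>(low) - 1;
    std::string bits;
    bits.reserve(totalBits);
    for (std::size_t i = 0; i < totalBits; ++i) {
        const std::uint8_t byte = bytes[i / 8];
        bits += ((byte >> (7 - i % 8)) & 1) ? '1' : '0';
    }
    return bits;
}
```

With the 63-bit example from the question, the terminator lands in the last free bit of the eighth byte, so the output is exactly 8 bytes and the decoder recovers all 63 data bits.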

2024-01-09

Answer 2, by ffvjumwh

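Your code currently does the following:
1. Reads one byte (this gives you a value to decompress and advances the cursor in the file).
2. Processes the bits one at a time, first checking that you have not reached the end of the file.
I see a few problems with that:

Placing if (compressedFileStream.eof()) inside the for loop looks as if you expect the file to end in the middle of a byte, but that does not happen.

Also, once you have done the test, it only makes sense to do it again after reading from the file: if you have not read anything, the value it returns will not change.

As I said above, the first thing you do after reading a valid byte is test whether you have reached the end of the file.

If you managed to read a byte, you know it is valid; so your test would make more sense done before reading the next byte.
With that in mind, it seems all you have to do is move the eof test to before the read from the file: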

Node* root = head->node; // Save the root of the Huffman tree
Node* current = root;    // Initialize the current node

char byte; // Read bytes for decompression
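while (!compressedFileStream.eof()) {
    if (!compressedFileStream.get(byte)) {
        break; // eof() only turns true after a read has failed
    }
    for (int i = 7; i >= 0; i--) {
        if (!((byte >> i) & 1)) { // parenthesized: ! binds tighter than &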
            current = current->left;
        }
        else {
            current = current->right;
        }

        if (!current->left && !current->right) {
            decompressedFile << current->character;
            current = root; // Reset current to the root for the next character
        }
    }
} 
// Close the file / free resources here.

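P.S.:

Depending on what type compressedFileStream is (you did not mention it in your question, and I don't want to make any assumptions there), it may be enough to test the value returned by get to know when you have tried to read past the end of the file.
While I was at it, I corrected the part where you did char bit = (byte & (1 << i)) ? '1' : '0'
NULL is a C-style macro; in C++, use nullptr from now on (or test pointers the way I did).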
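To illustrate the point about testing the value returned by get, here is a small sketch that assumes the stream is a standard `std::istream` (which `std::ifstream` is). Both function names are hypothetical. The first loop stops as soon as a read fails; the second tests only `eof()` before each read and therefore runs one extra pass, because `eof()` becomes true only after a read has already failed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Counts bytes by testing the result of get() itself.
std::size_t countBytesCheckingGet(std::istream& in) {
    std::size_t n = 0;
    char byte;
    while (in.get(byte)) {
        ++n;
    }
    return n;
}

// Counts loop passes of the eof()-first pattern, for comparison.
std::size_t countPassesCheckingEof(std::istream& in) {
    std::size_t n = 0;
    char byte = 0;
    while (!in.eof()) {
        in.get(byte); // on the final pass this read fails and `byte` is stale
        ++n;
    }
    return n;
}
```

On a three-byte stream the first function returns 3 and the second returns 4; in a decoder, that fourth pass would decode the previous byte a second time.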

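Note: during compression, you must make sure the final bits of the last byte cannot be interpreted as a character.

If you choose to pad the last byte with zeros, you must make sure that no character is represented by a sequence of 7 or fewer consecutive zeros.
The opposite approach is to compute your compression normally and, while doing so, detect a sequence that can safely fill the last byte without being interpreted as a character. For example:
If a somewhat rare character is represented by 8 or more bits, then decoding its first 7 bits will not yield a character.
If you cannot find such a case, you can create one: during compression, pick the rarest character and append a 0 to its sequence. The sequence that starts the same way but has a 1 in place of that 0, followed by enough 0s to fill a byte, will then do the job.
You can also save the number of characters in the original file, which makes it easy to know when you have reached the end (this is what I would probably do, at least for a v1 of the process).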
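The character-count idea can be sketched as follows. Everything here is an assumption for illustration: the 8-byte little-endian header layout, the helper names, and `DemoNode`, which only stands in for the question's `Node` (whose full definition is not shown). The decoder stops after the stored number of characters, so padding bits in the last byte are never decoded.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical header: the number of original characters, stored as
// 8 bytes, least significant byte first, ahead of the compressed bits.
void writeSymbolCount(std::ostream& out, std::uint64_t count) {
    for (int i = 0; i < 8; ++i) {
        out.put(static_cast<char>((count >> (8 * i)) & 0xFF));
    }
}

// Reads the header back; returns false if the stream is too short.
bool readSymbolCount(std::istream& in, std::uint64_t& count) {
    count = 0;
    for (int i = 0; i < 8; ++i) {
        char c;
        if (!in.get(c)) return false;
        count |= static_cast<std::uint64_t>(static_cast<unsigned char>(c))
                 << (8 * i);
    }
    return true;
}

// Stand-in for the question's Node type (assumed fields only).
struct DemoNode {
    const DemoNode* left;
    const DemoNode* right;
    char character;
};

// Decodes exactly as many characters as the header announces and
// ignores whatever padding bits follow the last one.
std::string decodeCounted(std::istream& in, const DemoNode* root) {
    std::string out;
    std::uint64_t remaining = 0;
    if (!readSymbolCount(in, remaining)) return out;
    const DemoNode* current = root;
    char byte;
    while (remaining > 0 && in.get(byte)) {
        const unsigned char bits = static_cast<unsigned char>(byte);
        for (int i = 7; i >= 0 && remaining > 0; --i) {
            current = ((bits >> i) & 1) ? current->right : current->left;
            if (!current->left && !current->right) {
                out += current->character;
                current = root; // start the next character at the root
                --remaining;
            }
        }
    }
    return out;
}
```

For example, with the codes a = 0, b = 10, c = 11, the text "abca" packs into the single byte 01011000. Without the count, the two trailing zero bits would decode as two extra 'a' characters; with a stored count of 4, decoding stops after the fourth character.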

2024-01-09
