C++: reading a binary file into a uint8 array to return a decimal int gives the wrong result

Posted by plicqrtu 11 months ago in Other


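I'm trying to parse a binary file and extract different data structures from it. One of them can be uint8 or int8 (or also uint16, int16).
To keep the approach as generic as possible, I read the data from the given file pointer and store it in a uint8 array (buffer).
For my test, I assume that a file containing 40 (hex) should give the integer 64, which is why my test asserts that the value is equal to 64. **Unfortunately, the content of the uint8 array always results in the decimal integer 52.** I have no idea why. I've tried various other ways of reading a specific number of bytes and assigning them to an integer variable.
Thanks if anyone can help :)
My read_int method: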

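#include <cassert>
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <vector>

// Minimal stand-in; the original definition of ReadOpFailed is not shown.
struct ReadOpFailed : std::runtime_error {
  ReadOpFailed() : std::runtime_error("fread() failed") {}
};

// Reads n bytes from file, treating the first byte as the most significant.
// Note: is_signed is not used yet.
// No throw() specification: it would call std::terminate when the exception below escapes.
int read_int(FILE * file, int n, bool is_signed) {
  assert(n > 0);
  std::vector<uint8_t> n_chars(n);  // variable-length arrays are not standard C++
  int result = 0;                   // must be initialized before *= and +=
  for (int i = 0; i < n; i++)
  {
    if (fread(&n_chars[i], sizeof(n_chars[i]), 1, file) != 1) {
      std::cerr << "fread() failed!\n";
      throw ReadOpFailed();         // throw by value, not with new
    }
    result *= 256;                  // shift by one byte: a byte has 256 values, not 255
    result += n_chars[i];
  }
  std::cout << "int read: " << result << "\n";
  return result;

//-------------Some ideas that didn't work out either------------------
    // std::stringstream ss;
    // ss << std::hex << static_cast<int>(static_cast<unsigned char>(n_chars)); // Convert byte to hexadecimal string
    // int result;
    // ss >> result; // Parse the hexadecimal string to integer
    // std::cout << "result" << result<<"\n";
}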
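read_int above treats the first byte read as the most significant one and never looks at is_signed. As a sketch only, assuming the bytes have already been read into a buffer, a hypothetical helper decode_int (not part of the original code) could handle both byte orders and sign extension for 1- to 8-byte values:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): combines n bytes
// (1 <= n <= 8) into an integer. little_endian selects the byte order;
// is_signed sign-extends from the top bit of the n-byte value.
int64_t decode_int(const uint8_t* bytes, std::size_t n, bool is_signed,
                   bool little_endian) {
  assert(bytes != nullptr && n >= 1 && n <= 8);
  uint64_t value = 0;
  for (std::size_t i = 0; i < n; ++i) {
    // Take the bytes from most significant to least significant.
    const uint8_t b = little_endian ? bytes[n - 1 - i] : bytes[i];
    value = (value << 8) | b;  // shift by one byte, i.e. multiply by 256
  }
  if (is_signed && n < 8) {
    const uint64_t sign_bit = uint64_t{1} << (8 * n - 1);
    if (value & sign_bit) {
      value |= ~uint64_t{0} << (8 * n);  // set every bit above the n-byte value
    }
  }
  // Converting an out-of-range uint64_t to int64_t is modular since C++20;
  // before that it is implementation-defined (two's complement on common compilers).
  return static_cast<int64_t>(value);
}
```

For example, decode_int on the single byte 0x40 returns 64, while on the byte '4' (0x34) it returns 52, which is exactly the symptom described in the question.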

Here is a small test that fails badly... The endianness-detection part prints little endian (not sure whether that is part of the problem).

struct TestContext{
    FILE * create_test_file_hex(const char * input_hex, const char * rel_file_path = "test.gguf") {
        std::ofstream MyFile(rel_file_path, std::ios::binary);

        // Write to the file
        MyFile << input_hex;

        // Close the file
        MyFile.close();

        // std::fstream outfile (rel_file_path,std::ios::trunc);
        // char str[20] = 
        // outfile.write(str, 20);
        // outfile.close();

        FILE *file = fopen(rel_file_path, "rb");
        // assert() does not throw, so a try/catch around it can never catch anything
        if (file == nullptr) {
            std::cout << "file couldn't be opened\n";
            ADD_FAILURE();
        }
        // Remove the file while it is still open: on POSIX systems the data stays
        // readable through `file` until it is closed (on Windows the removal typically fails).
        std::remove(rel_file_path);
        return file;
    }
};

TEST(test_tool_functions, test_read_int){
    int n = 1;
    // little endian if true
    if(*(char *)&n == 1) {std::cout<<"Little Endian Detected!!!\n";}
    else{std::cout<<"Big Endian Detected!!!\n";}
    std::string file_hex_content = "400A0E00080000016";
    
    uint64_t should;
    std::istringstream("40") >> std::hex >> should;
    ASSERT_EQ(should, 64u);
    
    uint64_t result = read_int(TestContext().create_test_file_hex(file_hex_content.data()),1,false);
    ASSERT_EQ(result,should);
}


c++

Source: https://stackoverflow.com/questions/77454585/c-read-binary-file-into-uint8-array-to-return-decimal-int-gives-wrong-result

1 answer


Answer #1 by 6ie5vjzr

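The root cause of the problem is that your file_hex_content consists of ASCII character bytes (forming a human-readable string of hexadecimal digits), not of the bytes that make up a binary integer representation. So instead of starting with the single byte 0x40 (i.e. 64), the file starts with the byte '4' (ASCII value 52) followed by the byte '0' (ASCII value 48). The single byte 64 (0x40) corresponds to the ASCII character '@', not to the two characters '4' and '0'.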
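To get a test file that really starts with the byte 0x40, the hex digits have to be converted to bytes before writing them. A minimal sketch, assuming a hypothetical helper hex_to_bytes and a hypothetical write_bytes used in place of the test's `MyFile << input_hex`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a string of hex digit pairs such as "400A0E"
// into the raw bytes it denotes, {0x40, 0x0A, 0x0E}.
std::vector<uint8_t> hex_to_bytes(const std::string& hex) {
  if (hex.size() % 2 != 0) {
    throw std::invalid_argument("hex string must have an even number of digits");
  }
  std::vector<uint8_t> bytes;
  bytes.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    // Parse one two-digit pair in base 16, e.g. "40" -> 64.
    bytes.push_back(static_cast<uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  }
  return bytes;
}

// Hypothetical helper: writes the bytes unchanged (unformatted output),
// whereas `MyFile << input_hex` writes the characters '4', '0', ...
void write_bytes(const std::string& path, const std::vector<uint8_t>& bytes) {
  std::ofstream out(path, std::ios::binary);
  assert(out.is_open());
  out.write(reinterpret_cast<const char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
}
```

Note that the test's string "400A0E00080000016" has 17 digits, so it would need one more digit before it can be split into whole bytes; this sketch rejects it rather than guessing.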
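Below is a small serialization example. As long as you serialize and deserialize on the same architecture and portability is not a concern, endianness is not an issue either.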

#include <cstdint>
#include <ios>
#include <iostream>
#include <sstream>

int main() {
  std::stringstream encoded;

  const uint64_t source{0xabcd1234deadbeefULL};
  encoded.write(reinterpret_cast<const char*>(&source), sizeof(source));

  uint64_t target;
  encoded.read(reinterpret_cast<char*>(&target), sizeof(target));

  std::cout << "source == target: " << std::hex << source << " == " << target
            << "\nserialized bytes:";
  for (const uint8_t byte : encoded.str())
    std::cout << ' ' << static_cast<uint32_t>(byte);
  std::cout << std::endl;
}

When executed on my little-endian machine, the program above prints:

source == target: abcd1234deadbeef == abcd1234deadbeef
serialized bytes: ef be ad de 34 12 cd ab

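As expected, the serialized string starts with the least significant byte 0xef and ends with the most significant byte 0xab. On a big-endian platform, the second line would be ordered from the most significant to the least significant byte, i.e. ab cd 12 34 de ad be ef.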
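If the bytes do have to be portable between machines of different endianness, a common alternative to copying the object representation is to fix the byte order in the format and build it with shifts. A minimal sketch, assuming a little-endian format and hypothetical helper names put_u64_le and get_u64_le:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends v to out, least significant byte first,
// regardless of the host's byte order.
void put_u64_le(std::vector<uint8_t>& out, uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    out.push_back(static_cast<uint8_t>(v >> (8 * i)));  // byte i of v
  }
}

// Hypothetical helper: reassembles 8 bytes stored least significant first.
uint64_t get_u64_le(const uint8_t* in) {
  assert(in != nullptr);
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) {
    v |= static_cast<uint64_t>(in[i]) << (8 * i);
  }
  return v;
}
```

Serializing 0xabcd1234deadbeef this way yields ef be ad de 34 12 cd ab on any host, because the order comes from the shifts rather than from the memory layout.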

Answered 11 months ago
