R: Is there a way to swap bytes to read the binary DEC format?

Posted by bvhaajcl on 2023-05-04 in Other

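I have old binary files written in what is called the 'DEC' format. To get the correct values of 4-byte floats from this format, I can do the following:
1. Read the bytes
2. Swap the last two bytes with the first two bytes (swap word 1 and word 2)
3. Convert the bytes to a number with readBin()
4. Divide the value by 4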
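For comparison with the R workaround, the four steps above can be sketched in C. This is a minimal sketch, not the asker's code. It assumes the host `float` is IEEE-754 binary32, and the function name `dec_to_float_div4` is hypothetical. Step 1 (reading 4 bytes, e.g. with `fread`) is left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one 4-byte DEC float by swapping words,
 * reinterpreting as IEEE-754 binary32, and dividing by 4.
 * The 32-bit pattern is built with shifts, so host byte order does not
 * affect the integer; memcpy then reinterprets it as a float. */
float dec_to_float_div4(const unsigned char b[4])
{
    /* Step 2: swap the two 16-bit words. Each word is stored low byte first,
     * and the word holding the sign and exponent (b[0], b[1]) comes first. */
    uint32_t bits = ((uint32_t)b[1] << 24) | ((uint32_t)b[0] << 16) |
                    ((uint32_t)b[3] << 8)  |  (uint32_t)b[2];
    float f;
    /* Step 3: reinterpret the bits as an IEEE float (what readBin() does). */
    memcpy(&f, &bits, sizeof f);
    /* Step 4: the DEC and IEEE exponent conventions differ by 2, so divide by 4. */
    return f / 4.0f;
}
```

For example, the DEC bytes `80 40 00 00` encode 1.0. After the word swap they read as IEEE 4.0, and dividing by 4 gives 1.0.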
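I thought the endian option of readBin() [c('little','big','swap')] would handle this, but that does not seem to be the case. 'swap' reverses all four bytes (4 3 2 1) rather than swapping the two 16-bit words (3 4 1 2). Below is an example and some code showing my current workaround.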
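No `endian` setting helps because the two reorderings are different permutations of a 4-byte group. The minimal C sketch below uses a hypothetical helper, `permute4`, to make the difference concrete. A full endian swap, which is what `readBin(..., size = 4, endian = "swap")` does to each element, turns bytes `1 2 3 4` into `4 3 2 1`. The DEC word swap needs `3 4 1 2`.

```c
#include <stddef.h>

/* Hypothetical helper: reorder every 4-byte group of src into dst,
 * with dst[i] = src[perm[i]] inside each group. */
void permute4(const unsigned char *src, unsigned char *dst, size_t n_groups,
              const int perm[4])
{
    for (size_t g = 0; g < n_groups; g++)
        for (int i = 0; i < 4; i++)
            dst[4 * g + i] = src[4 * g + perm[i]];
}

/* Full byte reversal, as endian = "swap" performs for size = 4. */
static const int FULL_SWAP[4] = {3, 2, 1, 0};
/* DEC word swap: R's 1-based c(3, 4, 1, 2) written 0-based. */
static const int WORD_SWAP[4] = {2, 3, 0, 1};
```

Applying the word swap twice restores the original order. This is why the answer's index vector can be used both to shuffle and to unshuffle.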

# Start with actual value from sample file:
# 4 bytes representing target value of 1.290
# in practice dec_bytes is read in by readBin(con, raw(), n=4)
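# NOTE: writeBin() writes IEEE (PC) bytes, not DEC bytes, so this line is only
# a stand-in for the file read and will not by itself reproduce the output below
dec_bytes <- writeBin(1.290, raw(), size=4)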
# Now rearrange bytes swapping words
pc_bytes <- c(dec_bytes[3], dec_bytes[4], dec_bytes[1], dec_bytes[2])
# Now use readBin to give numeric value of bytes
pc_float <- readBin(pc_bytes, numeric(), n=1, size=4)
pc_float 
# [1] 0.5161456
# Now divide by 4 to get the correct answer
pc_float <- pc_float / 4
pc_float 
#[1] 0.1290364

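Obviously I could write a function that does this, as above, but the real question is: is there a simpler, more efficient way? In some C code that I wrote or found about 30 years ago, I used the following function, which I can only assume actually worked: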

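#include <string.h>

// Assumes a little-endian host whose float is IEEE-754 binary32.
float ConvertDecToFloat(const unsigned char bytes[4])
{
    unsigned char p[4];
    float f;
    p[0] = bytes[2];     // swap the two 16-bit words
    p[1] = bytes[3];
    p[2] = bytes[0];
    p[3] = bytes[1];
    if (p[0] || p[1] || p[2] || p[3])
        --p[3];          // adjust exponent: top byte - 1 = exponent - 2, i.e. divide by 4
    memcpy(&f, p, sizeof f);  // reinterpret without the undefined *(float*)p cast
    return f;
}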

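So after the rearrangement, --p[3] subtracts 1 from the last byte, which gives the correct answer without dividing by 4. That byte holds the sign bit and the upper seven exponent bits, so subtracting 1 from it lowers the exponent by 2, which is the same as dividing by 4. I'm not sure whether this can be done in R without converting to integer and back to bytes.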
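The exponent trick can also be written on a 32-bit integer instead of individual bytes. The minimal sketch below assumes IEEE-754 binary32 floats and uses a hypothetical function name, `dec_to_float_exp`. Subtracting `1 << 24` from the assembled word decrements the top byte exactly as `--p[3]` does (the lower 24 bits are untouched), so no division is needed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: word swap plus exponent adjustment, no division.
 * Correct while the DEC exponent field is 3 or more; smaller exponents
 * wrap or land in the IEEE subnormal range. */
float dec_to_float_exp(const unsigned char b[4])
{
    /* Same word swap as p[] = {b[2], b[3], b[0], b[1]} on a little-endian host. */
    uint32_t bits = ((uint32_t)b[1] << 24) | ((uint32_t)b[0] << 16) |
                    ((uint32_t)b[3] << 8)  |  (uint32_t)b[2];
    float f;
    if (bits != 0)                    /* leave an all-zero pattern (0.0) alone */
        bits -= UINT32_C(1) << 24;    /* same as --p[3]: exponent field minus 2 */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```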

k4emjkb1


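Answered by a colleague (thanks to Michael Schwartz). The simple vectorized solution is to build a vector of indices that reorders the raw byte vector. I have two working solutions: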

# Test on a 24-byte vector: 6 single-precision (4-byte) floats
values <- c(1, 12, 123, 1234, 12345, 123456)
pc_bytes0 <- writeBin(values, raw(), size = 4)

# Need to shuffle the byte order to reproduce DEC order
# using same procedure we will use to unshuffle

# Swapping needed to convert from PC to DEC byte order
# DEC byte 1 -> 3, 2 -> 4, 3 -> 1, 4 -> 2
byte_adjust <- rep(c(2, 2, -2, -2), 6)
# Original index order
pc_byte_index <- seq_len(24) # original byte order
# New index order for DEC data storage, add adjustment vector
dec_byte_index <- pc_byte_index + byte_adjust
# Now reshuffle the original data using the index to get the DEC order
dec_bytes <- pc_bytes0[dec_byte_index]
# This what readBin(raw()) will return from DEC file, 
# so actual process starts here.
# Note: To get a true DEC byte array we would also have to add 1
# to the 2nd byte in each 4-byte sequence (raising the exponent by 2)

# Approach 1, make a long vector of original byte order and another of offsets
# and add together
# Data is in DEC sequence, so make vector of original order
dec_byte_index <- seq_len(24) # original byte index order
# These are the index offsets needed
byte_adjust <- rep(c(2, 2, -2, -2), 6)
# Offset original order by adding 
pc_byte_index <- dec_byte_index + byte_adjust
# Apply PC byte order to data
pc_bytes <- dec_bytes[pc_byte_index]
# Now the data can be read in the correct order and the correction applied
pc_float <- readBin(pc_bytes, double(), n=6, size=4)
pc_float 
#> pc_float 
#[1]      1     12    123   1234  12345 123456

# Approach 2, use single index, reshape to matrix and apply 
# index representing desired order of 4 original bytes
byte_index <- c(3, 4, 1, 2)
# Convert data to matrix 
dec_byte_matrix <- matrix(dec_bytes, nrow=4, ncol=6)
# Use the indices to swap
pc_bytes <- dec_byte_matrix[byte_index, ]
# Now compute floats
pc_float <- readBin(pc_bytes, double(), n=6, size=4)
#> pc_float 
#[1]      1     12    123   1234  12345 123456
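The same index idea carries over to a whole buffer in one pass. The minimal C sketch below assumes IEEE-754 binary32 floats and uses a hypothetical function, `dec_buffer_to_floats`. It reorders each 4-byte group with the 0-based form of `c(3, 4, 1, 2)` and applies the division by 4 that real DEC data needs. The answer's test data skips that step because it was built from PC bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert n DEC floats (4 * n bytes, e.g. from fread)
 * into out[0..n-1]. */
void dec_buffer_to_floats(const unsigned char *in, float *out, size_t n)
{
    /* idx[i] = position in the DEC group of IEEE byte i (least significant
     * first), i.e. R's c(3, 4, 1, 2) shifted to 0-based. */
    static const int idx[4] = {2, 3, 0, 1};
    for (size_t k = 0; k < n; k++) {
        const unsigned char *g = in + 4 * k;
        uint32_t bits = 0;
        for (int i = 0; i < 4; i++)
            bits |= (uint32_t)g[idx[i]] << (8 * i);
        float f;
        memcpy(&f, &bits, sizeof f);   /* reinterpret as IEEE binary32 */
        out[k] = f / 4.0f;             /* DEC exponent correction */
    }
}
```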

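I benchmarked both approaches with microbenchmark and found no noticeable difference in processing time. Note that for original DEC data, pc_float still has to be divided by 4 to get the correct answer, unless the byte adjustment is applied.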
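Both shortcuts borrow the IEEE layout and break at the edges of the DEC range. Dividing by 4 turns a DEC exponent of 255 into IEEE Inf or NaN and rounds the very smallest values. `--p[3]` gives wrong results for DEC exponents of 2 or less. The sketch below avoids both problems by decoding the DEC F_floating fields directly: a sign bit, an 8-bit exponent with bias 128, and a 23-bit fraction with a hidden leading bit, so value = (-1)^s * 0.1fff (binary) * 2^(e-128). The function name is hypothetical. Mapping the reserved operand (sign 1, exponent 0) to NaN is our own choice.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: decode DEC F_floating from its bit fields. */
double dec_f_to_double(const unsigned char b[4])
{
    /* Word swap: sign/exponent word first, each word low byte first. */
    uint32_t bits = ((uint32_t)b[1] << 24) | ((uint32_t)b[0] << 16) |
                    ((uint32_t)b[3] << 8)  |  (uint32_t)b[2];
    unsigned s = bits >> 31;
    int e = (int)((bits >> 23) & 0xFF);
    uint32_t f = bits & 0x7FFFFF;
    if (e == 0)
        return s ? NAN : 0.0;   /* true zero, or reserved operand */
    /* 0.1fff * 2^(e-128) == (2^23 + f) * 2^(e - 128 - 24) */
    double v = ldexp((double)(f | 0x800000u), e - 152);
    return s ? -v : v;
}
```

Returning `double` matches what `readBin()` hands back in R anyway. The largest DEC value (bytes `FF 7F FF FF`, about 1.7e38) decodes correctly here, whereas the divide-by-4 route would yield NaN.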
