c++ Is this the fastest way to reformat a buffer of 64b values to 16b?

Posted by 62lalag4 on 2023-07-01 in Other

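I have a data stream that writes physically 64-bit values to a buffer. When the buffer reaches a certain fill level, it needs to be reformatted into contiguous 16-bit values. The real data never occupies more than 24 of the 64 bits of each value the stream produces, so this amounts to truncating each 24b value to 16b and rearranging the buffer so that the values are contiguous. I believe I have found the fastest way to do this, but I'm not sure whether there are optimizations I'm missing, or faster methods provided by the C++ standard utilities. Below is an MRE showing my reformatting function, along with a test harness that generates data like what I encounter and times the reformat.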

#include <chrono>
#include <cstdint>   // uint8_t, uint16_t, uint64_t
#include <cstdlib>   // malloc, rand
#include <iostream>

int num_samples = 160000;

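// Allocates room for num_samples 64-bit values and writes random bytes into
// the low 3 bytes (24 bits) of each value it visits.
void fill_buffer(uint8_t** buffer){
  *buffer = (uint8_t*)malloc(num_samples * sizeof(uint64_t));
  // i is a byte offset, but the bound is num_samples, so only the first
  // num_samples / 8 values are written; the rest of the buffer stays uninitialized.
  for (int i = 0; i < num_samples; i += 8){
    (*buffer)[i] = rand() % 0xFF;      // % 0xFF yields 0..254
    (*buffer)[i + 1] = rand() % 0xFF;
    (*buffer)[i + 2] = rand() % 0xFF;
  }
}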

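// In-place reformat: value i (8 bytes wide) is narrowed to 16 bits and stored
// in slot i of a uint16_t view of the same buffer.
void reformat_1(uint8_t* buf){
  uint64_t* p_8byte = (uint64_t*)buf;
  uint16_t* p_2byte = (uint16_t*)buf;

  for (int i = 0; i < num_samples; i++){
    p_2byte[i] = p_8byte[i] >> 8;  // drop the low 8 bits, keep bits 8..23
  }
}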

int main(int argc, char const* argv[]){
  uint8_t* buffer = NULL;

  fill_buffer(&buffer);
  auto start = std::chrono::high_resolution_clock::now();
  reformat_1(buffer);
  auto stop = std::chrono::high_resolution_clock::now();
  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
  std::cout << "Time taken by function one: " << duration.count() << " microseconds" << std::endl;

  free(buffer);  // release the buffer allocated in fill_buffer
  return 0;
}
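
To make the transformation concrete, here is a minimal out-of-place sketch of the same truncation. The name `truncate_samples` is hypothetical and not part of the original post; it only illustrates which bits survive (bits 8..23 of each 64-bit value), not how the in-place version should be written.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): out-of-place version of
// the truncation, keeping bits 8..23 of each 64-bit sample.
std::vector<std::uint16_t> truncate_samples(const std::vector<std::uint64_t>& in) {
    std::vector<std::uint16_t> out;
    out.reserve(in.size());
    for (std::uint64_t v : in) {
        // Shift away the low 8 bits, then narrow to 16 bits.
        out.push_back(static_cast<std::uint16_t>(v >> 8));
    }
    return out;
}
```

For example, a 24-bit sample 0xABCDEF becomes 0xABCD, and 0x0000FF becomes 0x0000, since its only set bits are in the discarded low byte.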

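I'd also be happy to hear feedback on my benchmarking setup. I found it interesting that with -O3 I get about 130 µs on real sample data read from a file, whereas with the randomly generated data I see closer to 1800 µs, so this is clearly not a perfectly representative example.
One more thing I'll note, which I would expect to work against my real timing (compared with the synthetic one) but apparently doesn't: while num_samples is a magic number here, in practice it is computed. It is usually constant (not always), but it is nothing the compiler could replace with a constant in order to unroll the loop and so on (I think).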
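
One possible way to make the synthetic timing more comparable is sketched below. It is an assumption-laden illustration, not the asker's harness: the names `make_samples`, `reformat_in_place` and `best_time_us` are hypothetical, the data is generated with `std::mt19937` instead of `rand()`, every 64-bit value is written before timing so that no untouched memory is measured, and the fastest of several runs is reported. The reformat uses `std::memcpy` loads and stores to sidestep pointer-aliasing questions.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Hypothetical helper: num_samples 64-bit values with random low 24 bits.
// Every element is written, so all of the memory is touched before timing.
std::vector<std::uint64_t> make_samples(std::size_t num_samples, std::uint32_t seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::uint32_t> dist(0, 0xFFFFFF);
    std::vector<std::uint64_t> samples(num_samples);
    for (auto& s : samples) s = dist(gen);
    return samples;
}

// Same in-place truncation as reformat_1, written with memcpy loads/stores.
void reformat_in_place(unsigned char* buf, std::size_t num_samples) {
    for (std::size_t i = 0; i < num_samples; ++i) {
        std::uint64_t v;
        std::memcpy(&v, buf + i * sizeof v, sizeof v);
        const std::uint16_t out = static_cast<std::uint16_t>(v >> 8);
        std::memcpy(buf + i * sizeof out, &out, sizeof out);
    }
}

// Runs the reformat `runs` times on a fresh copy of the input and returns the
// fastest run in microseconds. The checksum keeps the results observable so
// the compiler cannot discard the work.
double best_time_us(const std::vector<std::uint64_t>& samples, int runs,
                    std::uint64_t& checksum) {
    double best = -1.0;
    std::vector<std::uint64_t> work(samples.size());
    for (int r = 0; r < runs; ++r) {
        work = samples;  // restore the input outside the timed region
        auto* bytes = reinterpret_cast<unsigned char*>(work.data());
        const auto start = std::chrono::steady_clock::now();
        reformat_in_place(bytes, work.size());
        const auto stop = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < work.size(); ++i) {
            std::uint16_t o;
            std::memcpy(&o, bytes + i * sizeof o, sizeof o);
            checksum += o;
        }
        const double us = std::chrono::duration<double, std::micro>(stop - start).count();
        if (best < 0.0 || us < best) best = us;
    }
    return best;
}
```

A caller would build the input once with `make_samples(160000)`, call `best_time_us(samples, 20, checksum)`, and print both the time and the checksum. Whether this closes the gap between the synthetic and the real measurements is not something the sketch can establish; it only removes untouched memory and single-run noise as candidate explanations.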

c++

Source: https://stackoverflow.com/questions/76576499/is-this-the-fastest-way-to-reformat-a-buffer-of-64b-values-to-16b

1 Answer

b1zrtrql1#

This small change improves performance by about 10%:

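void reformat_2(uint8_t* buf){
  // Despite its name, p_8byte reads only 32 bits per value: on a little-endian
  // machine that is the low half of each 64-bit value, which holds the 24 used bits.
  uint32_t* p_8byte = (uint32_t*)buf;
  uint16_t* p_2byte = (uint16_t*)buf;
  uint16_t* p_2end  = p_2byte + num_samples;

  while(p_2byte < p_2end){
    *p_2byte++ = *p_8byte >> 8;  // keep bits 8..23 as the 16-bit result
    p_8byte += 2;                // two 32-bit words = one 64-bit value
  }
}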

To get clearer numbers, I increased the buffer size 100x, to 16M entries.

Answered 2023-07-01
