C++: How do you prevent thrust::reduce_by_key from writing to pageable memory?

Posted by 3lxsmp7m on 2023-05-20 in Other


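I am writing an application that uses multiple concurrent CUDA streams. My other streams block whenever my thrust::reduce_by_key call appears to write to pageable memory. I think the return value is the problem.
How do I prevent the return value from being written to pageable memory?
The code below demonstrates my attempted solution.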

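#include <thrust/system/cuda/vector.h>
#include <thrust/host_vector.h>
#include <thrust/pair.h>
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/functional.h>                  // thrust::equal_to, thrust::plus
#include <thrust/iterator/counting_iterator.h>  // thrust::make_counting_iterator
#include <thrust/reduce.h>
#include <thrust/random.h>
#include <cuda_runtime.h>                       // cudaMallocHost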

int main(void)
{
  int N = 20;
  thrust::default_random_engine rng;
  thrust::uniform_int_distribution<int> dist(10, 99);

  // initialize data
  thrust::device_vector<int> array(N);
  for (size_t i = 0; i < array.size(); i++)
    array[i] = dist(rng);

  // allocate storage for sums and indices
  thrust::device_vector<int> sums(N);
  thrust::device_vector<int> indices(N);

  // make a pinned memory location for the returned pair of iterators
  typedef thrust::device_vector<int>::iterator  dIter;
  thrust::pair<dIter, dIter>*  new_end;

  const unsigned int bytes =  sizeof(thrust::pair<dIter, dIter>);
  cudaMallocHost((void**)&new_end, bytes);

  for(int i = 0 ; i< 20; i++){   // you can see in the profiler each operator writes 4 bytes to pageable memory

        *new_end = thrust::reduce_by_key
            (thrust::make_counting_iterator(0),
             thrust::make_counting_iterator(N),
             array.begin(),
             indices.begin(),
             sums.begin(),
             thrust::equal_to<int>(),
             thrust::plus<int>());
  }
  cudaFreeHost(new_end);   // release the pinned allocation
  std::cout << "done \n";
  return 0;
}

Here is a screenshot from my profiler showing the copy from the device to pageable host memory.

c++

Source: https://stackoverflow.com/questions/62907675/how-do-you-prevent-thrustreduce-by-key-from-writing-to-pageable-memory

1 Answer

t40tm48m1#

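"I am writing an application that uses multiple concurrent CUDA streams. My other streams block whenever my thrust::reduce_by_key call appears to write to pageable memory."
That blocking behaviour isn't caused by "writing to pageable memory". It is caused by an explicit synchronization call. In general, all normal synchronous Thrust algorithms have been blocking since Thrust 1.9.4, which shipped with CUDA 10.1. You can confirm this for yourself by inspecting the API trace in the profiler. You can, however, at least restrict the scope of the blocking by launching the call into a stream, although I haven't tested whether that changes the behaviour of cuda_cub::synchronize in a useful way.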
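As a rough illustration of "launching the call into a stream", the sketch below passes a stream-bound execution policy, thrust::cuda::par.on(stream), as the first argument. The variable names and the choice of a non-blocking stream are assumptions made for this example, and, as noted above, whether this actually stops other streams from stalling is untested.

```cuda
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <cuda_runtime.h>
#include <iostream>

int main()
{
  const int N = 20;

  // Illustrative input values: 10, 11, 12, ...
  thrust::device_vector<int> values(N);
  thrust::sequence(values.begin(), values.end(), 10);

  thrust::device_vector<int> keys_out(N);
  thrust::device_vector<int> sums(N);

  // Assumption: a non-blocking stream, so that it does not implicitly
  // synchronize with the legacy default stream.
  cudaStream_t stream;
  cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

  // The algorithm is still synchronous from the host's point of view;
  // the policy only tells Thrust which stream to issue its work into.
  auto new_end = thrust::reduce_by_key(
      thrust::cuda::par.on(stream),
      thrust::make_counting_iterator(0),
      thrust::make_counting_iterator(N),
      values.begin(),
      keys_out.begin(),
      sums.begin(),
      thrust::equal_to<int>(),
      thrust::plus<int>());

  std::cout << "number of reduced segments: "
            << (new_end.first - keys_out.begin()) << "\n";

  cudaStreamDestroy(stream);
  return 0;
}
```

The returned pair is still an ordinary host value, so this does not change where Thrust copies its internal result; it only narrows which stream the call is tied to.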
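"How do I prevent the return value from being written to pageable memory?"
Not that it has anything to do with your problem, but you can't. It is important to remember that, contrary to what your question assumes, thrust::reduce_by_key isn't a kernel. It is host code that executes a series of operations, including copying the return value from device memory to a host stack variable. The programmer has no control over those internals, and trying to use your own pinned memory to receive a result that is returned by value is nonsensical and has no effect.
As suggested in the comments, if you need the level of control over the internals of the operation that your question implies, then Thrust is the wrong choice. Use cub::device::reduce_by_key (exposed in CUB's public API as cub::DeviceReduce::ReduceByKey). It is the same algorithm that the Thrust implementation uses, but you get explicit control over temporary memory, synchronization, streams, and how the results of the call are accessed. It is not for beginners, however.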
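For reference, here is a minimal sketch of what the CUB route might look like, assuming the usual two-phase cub::DeviceReduce::ReduceByKey call (first query the temporary storage size, then run). The AddInts functor, the buffer names, and the sample keys and values are all hypothetical and exist only for this example. The number of reduced segments stays in device memory until it is explicitly copied into pinned host memory on the chosen stream.

```cuda
#include <cub/device/device_reduce.cuh>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical reduction functor for this sketch: adds two ints.
struct AddInts
{
  __host__ __device__ int operator()(int a, int b) const { return a + b; }
};

int main()
{
  const int N = 20;

  // Illustrative data: runs of four equal keys, arbitrary values.
  std::vector<int> h_keys(N), h_vals(N);
  for (int i = 0; i < N; ++i) { h_keys[i] = i / 4; h_vals[i] = 10 + i; }

  cudaStream_t stream;
  cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

  int *d_keys, *d_vals, *d_unique, *d_sums, *d_num_runs;
  cudaMalloc(&d_keys, N * sizeof(int));
  cudaMalloc(&d_vals, N * sizeof(int));
  cudaMalloc(&d_unique, N * sizeof(int));
  cudaMalloc(&d_sums, N * sizeof(int));
  cudaMalloc(&d_num_runs, sizeof(int));

  // One-time setup copies; these are plain blocking copies on purpose.
  cudaMemcpy(d_keys, h_keys.data(), N * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_vals, h_vals.data(), N * sizeof(int), cudaMemcpyHostToDevice);

  // Phase 1: a null temp-storage pointer only queries the required size.
  void  *d_temp = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceReduce::ReduceByKey(d_temp, temp_bytes, d_keys, d_unique, d_vals,
                                 d_sums, d_num_runs, AddInts(), N, stream);
  cudaMalloc(&d_temp, temp_bytes);

  // Phase 2: run the reduction in the chosen stream.
  cub::DeviceReduce::ReduceByKey(d_temp, temp_bytes, d_keys, d_unique, d_vals,
                                 d_sums, d_num_runs, AddInts(), N, stream);

  // The segment count is copied into pinned memory we allocate ourselves,
  // and the host waits only on this one stream.
  int *h_num_runs;
  cudaMallocHost(&h_num_runs, sizeof(int));
  cudaMemcpyAsync(h_num_runs, d_num_runs, sizeof(int),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);
  std::printf("number of reduced segments: %d\n", *h_num_runs);

  cudaFreeHost(h_num_runs);
  cudaFree(d_temp);
  cudaFree(d_num_runs);
  cudaFree(d_sums);
  cudaFree(d_unique);
  cudaFree(d_vals);
  cudaFree(d_keys);
  cudaStreamDestroy(stream);
  return 0;
}
```

In a real application the temporary storage and pinned buffer would be allocated once and reused across calls, and error codes returned by each call should be checked; both are omitted here to keep the sketch short.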

Answered on 2023-05-20
