c++ Segmentation fault when using cusolverSpScsrlsvchol for a sparse linear problem in CUDA

bhmjp9jg  posted on 2023-06-07  in  Other

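I am trying to port a linear problem to CUDA to speed up the solve time. I have already used cusolverDn successfully for dense problems on the GPU. However, when I try to apply cusolverSpScsrlsvchol to a sparse problem, I always get a segmentation fault.
To debug the problem, I ran the CUDA compute-sanitizer and received the following output: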

$ /c/Programme/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/bin/compute-sanitizer.bat --tool memcheck bin/FEMaster_gpu.exe
========= COMPUTE-SANITIZER

========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 0 errors
Segmentation fault

I narrowed the problem down to the following minimal code snippet:

    cusolverSpHandle_t handle_cusolver_sp;
    cusparseHandle_t   handle_cusparse;

    // loading handles
    cusolverSpCreate(&handle_cusolver_sp);
    cusparseCreate  (&handle_cusparse);

    // get properties
    cudaSetDevice(0);

    // create csr arrays on cpu
    float host_csr_values[4]{1,1,1,1};
    int   host_csr_col_id[4]{0,1,2,3};
    int   host_csr_row_pt[5]{0,1,2,3,4};
    float host_rhs       [4]{0,3,7,1};
    int   host_singular  [1]{0};

    // allocate arrays on the gpu
    float* dev_csr_values;
    int  * dev_csr_col_id;
    int  * dev_csr_row_pt;
    float* dev_rhs;
    int  * dev_singular;

    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_values,4 * sizeof(float)));
    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_col_id,4 * sizeof(int  )));
    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_row_pt,5 * sizeof(int  )));
    runtime_assert_cuda(cudaMalloc((void**) &dev_rhs       ,4 * sizeof(float)));
    runtime_assert_cuda(cudaMalloc((void**) &dev_singular  ,1 * sizeof(int  )));

    // move data to gpu
    runtime_assert_cuda(cudaMemcpy(dev_csr_values, host_csr_values, 4 * sizeof(float), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_csr_col_id, host_csr_col_id, 4 * sizeof(int  ), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_csr_row_pt, host_csr_row_pt, 5 * sizeof(int  ), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_rhs       , host_rhs       , 4 * sizeof(float), cudaMemcpyHostToDevice));

    // create matrix descriptor
    cusparseMatDescr_t descr;
    runtime_assert_cuda(cusparseCreateMatDescr(&descr));
    runtime_assert_cuda(cusparseSetMatType     (descr, CUSPARSE_MATRIX_TYPE_GENERAL));
    runtime_assert_cuda(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO    ));

    runtime_assert_cuda(cusolverSpScsrlsvchol(handle_cusolver_sp,
                                              4,
                                              4,
                                              descr,
                                              dev_csr_values,
                                              dev_csr_row_pt,
                                              dev_csr_col_id,
                                              dev_rhs,
                                              0,    // tolerance
                                              0,    // reorder
                                              dev_rhs,
                                              dev_singular));

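The sparse matrix is simply a diagonal matrix (here, the 4x4 identity).
For simplicity, I have removed the memory deallocation, output retrieval, and similar calls. The code looks simple, yet it causes a segmentation fault. The fault occurs specifically during the call to cusolverSpScsrlsvchol.
I have been stuck on this for more than a day and cannot see why it does not work. Any help would be greatly appreciated!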

uurity8g  #1

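The API documentation states that the singularity argument must reside in host memory, not device memory. In the question's code, dev_singular is allocated with cudaMalloc, so the library ends up writing through a device pointer from host code. That would explain a segmentation fault on the host side, and it is consistent with compute-sanitizer reporting 0 errors, since memcheck tracks memory accesses made by device code. Passing the address of an ordinary host int instead should resolve the crash.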
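As an illustration of the fix described above, here is a minimal sketch of the question's snippet with the singularity output kept on the host. The macros CHECK_CUDA and CHECK_STATUS are hypothetical stand-ins for the question's runtime_assert_cuda, which was not shown. Writing the solution to a separate buffer d_x, rather than reusing the right-hand-side buffer as the question does, is a cautious choice made for this sketch and not something the answer requires.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cusparse.h>
#include <cusolverSp.h>

// Hypothetical helpers (the question's runtime_assert_cuda was not shown).
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at line %d\n",            \
                         cudaGetErrorString(err_), __LINE__);             \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// cuSPARSE and cuSOLVER both use 0 as their success status.
#define CHECK_STATUS(call)                                                \
    do {                                                                  \
        int status_ = static_cast<int>(call);                             \
        if (status_ != 0) {                                               \
            std::fprintf(stderr, "library status %d at line %d\n",        \
                         status_, __LINE__);                              \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

int main() {
    constexpr int n = 4;    // matrix dimension
    constexpr int nnz = 4;  // number of stored entries

    cusolverSpHandle_t solver = nullptr;
    cusparseMatDescr_t descr = nullptr;
    CHECK_STATUS(cusolverSpCreate(&solver));
    CHECK_STATUS(cusparseCreateMatDescr(&descr));
    CHECK_STATUS(cusparseSetMatType(descr, CUSPARSE_MATRIX_TYPE_GENERAL));
    CHECK_STATUS(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO));

    // Same CSR data as in the question: the 4x4 identity matrix.
    float h_val[nnz]   = {1, 1, 1, 1};
    int   h_col[nnz]   = {0, 1, 2, 3};
    int   h_row[n + 1] = {0, 1, 2, 3, 4};
    float h_rhs[n]     = {0, 3, 7, 1};
    float h_x[n]       = {};

    float *d_val = nullptr, *d_rhs = nullptr, *d_x = nullptr;
    int *d_col = nullptr, *d_row = nullptr;
    CHECK_CUDA(cudaMalloc(&d_val, nnz * sizeof(float)));
    CHECK_CUDA(cudaMalloc(&d_col, nnz * sizeof(int)));
    CHECK_CUDA(cudaMalloc(&d_row, (n + 1) * sizeof(int)));
    CHECK_CUDA(cudaMalloc(&d_rhs, n * sizeof(float)));
    CHECK_CUDA(cudaMalloc(&d_x, n * sizeof(float)));

    CHECK_CUDA(cudaMemcpy(d_val, h_val, nnz * sizeof(float), cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(d_col, h_col, nnz * sizeof(int), cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(d_row, h_row, (n + 1) * sizeof(int), cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(d_rhs, h_rhs, n * sizeof(float), cudaMemcpyHostToDevice));

    // The fix: singularity is a plain host variable, not a cudaMalloc'd buffer.
    int singularity = 0;
    CHECK_STATUS(cusolverSpScsrlsvchol(solver, n, nnz, descr,
                                       d_val, d_row, d_col, d_rhs,
                                       0.0f,  // tolerance, as in the question
                                       0,     // reorder, as in the question
                                       d_x, &singularity));

    CHECK_CUDA(cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost));
    std::printf("singularity = %d\n", singularity);
    for (int i = 0; i < n; ++i) std::printf("x[%d] = %f\n", i, h_x[i]);

    cudaFree(d_val); cudaFree(d_col); cudaFree(d_row);
    cudaFree(d_rhs); cudaFree(d_x);
    cusparseDestroyMatDescr(descr);
    cusolverSpDestroy(solver);
    return 0;
}
```

Because the matrix here is the identity, the solution should simply equal the right-hand side, which makes this a convenient smoke test. The sketch would typically be built with something like nvcc and linked against cusolver and cusparse; the exact command depends on the toolkit installation.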
