c++ CUDA kernel template instantiation causing compilation error

Posted by k75qkfdt on 2023-10-20 in Other


I am trying to define a templated CUDA kernel for logical operations on an image. The code looks like this:

#define AND 1
#define OR 2
#define XOR 3
#define SHL  4
#define SHR 5 

template<typename T, int opcode> 
__device__ inline T operation_lb(T a, T b)
{
    switch(opcode)
    {
    case AND:
        return a & b;
    case OR:
        return a | b;
    case XOR:
        return a ^ b;
    case SHL:
        return a << b;
    case SHR:
        return a >> b;
    default:
        return 0;
    }
}

//Logical Operation With A Constant
template<typename T, int channels, int opcode> 
__global__ void kernel_logical_constant(T* src, const T val, T* dst, int width, int height, int pitch)
{
    const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;

    if(xIndex >= width || yIndex >= height) return;

    unsigned int tid = yIndex * pitch + (channels * xIndex);

    #pragma unroll
    for(int i=0; i<channels; i++)
        dst[tid + i] = operation_lb<T,opcode>(src[tid + i],val);
}

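The problem is that when I instantiate the kernel for bit shifting, the following compilation error arises:
Error 1 error : Ptx assembly aborted due to errors
The kernel instantiations look like this: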

template __global__ void kernel_logical_constant<unsigned char,1,SHL>(unsigned char*,unsigned char,unsigned char*,int,int,int);

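There are 19 more instantiations like this, covering unsigned char, unsigned short, 1 and 3 channels, and all the logical operations. Only the bit-shift instantiations, i.e. SHL and SHR, cause the error. When I remove those instantiations, the code compiles and works perfectly. The code also works if I replace the bit shift with any other operation inside the operation_lb device function. I was wondering whether this has anything to do with the amount of PTX code generated by so many different instantiations of the kernel.
I am using CUDA 5.5, Visual Studio 2010, Windows 8 x64, compiling for compute_1x, sm_1x.
Any help would be appreciated.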
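To help separate a logic error from a toolchain problem, the device function from the question can be checked on the host. The sketch below is an illustration, not part of the original post: it assumes that making `__host__` and `__device__` empty macros outside nvcc is acceptable, and it adds `__host__` to the function so that a plain C++ compiler can build it. Because `opcode` is a template parameter, the `switch` is resolved at compile time and each instantiation contains only one operation.

```cpp
// Assumption: outside nvcc, turn the CUDA qualifiers into no-ops so the
// same source compiles with an ordinary C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

#define AND 1
#define OR  2
#define XOR 3
#define SHL 4
#define SHR 5

// Same body as operation_lb in the question; only __host__ was added.
template<typename T, int opcode>
__host__ __device__ inline T operation_lb(T a, T b)
{
    switch (opcode)   // opcode is a compile-time constant
    {
    case AND: return a & b;
    case OR:  return a | b;
    case XOR: return a ^ b;
    case SHL: return a << b;   // operands are promoted to int, result narrowed to T
    case SHR: return a >> b;
    default:  return 0;
    }
}
```

On the host, `operation_lb<unsigned char, SHL>(0x81, 1)` yields `0x02`: the operands are promoted to `int`, the shift gives `0x102`, and the conversion back to `unsigned char` keeps the low byte. If the host results are as expected, the shift logic itself is sound and the failure lies in the device build.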


Source: https://stackoverflow.com/questions/18915718/cuda-kernel-template-instantiation-causing-compilation-error

1 Answer


oiopk7p51#

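The original question stated that the poster was using compute_20, sm_20. With those settings, I was not able to reproduce the error using the code here. However, it was pointed out in the comments that sm_10 was actually being used. When I switched to compiling for sm_10, I was able to reproduce the error.
It *appears* to be a bug in the compiler. I say this only because I do not believe the compiler should generate code that the assembler cannot handle. Beyond that, I do not know the root cause. I have filed a bug report with NVIDIA.
In my limited testing, it only seems to happen with unsigned char, not with int.
As a possible workaround, for cc2.0 and newer devices, specify -arch=sm_20 when compiling.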
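For devices that cannot use sm_20, the observation that int did not trigger the error suggests an experiment: perform the shift explicitly in 32-bit arithmetic and narrow the result afterwards. The sketch below is speculative and untested against the CUDA 5.5 sm_10 toolchain. The helper names `shift_left_promoted` and `shift_right_promoted` are hypothetical and do not appear in the original post. Since C++ already promotes `unsigned char` operands to `int` before shifting, this may well produce the same PTX and make no difference.

```cpp
// Assumption: outside nvcc, turn the CUDA qualifiers into no-ops so the
// helpers can also be compiled and checked on the host.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: shift left in unsigned 32-bit arithmetic, then
// narrow back to T. The shift count must be less than 32.
template<typename T>
__host__ __device__ inline T shift_left_promoted(T a, T b)
{
    const unsigned int wide =
        static_cast<unsigned int>(a) << static_cast<unsigned int>(b);
    return static_cast<T>(wide);   // keeps only the low bits that fit in T
}

// Hypothetical helper: shift right in unsigned 32-bit arithmetic, then
// narrow back to T. The shift count must be less than 32.
template<typename T>
__host__ __device__ inline T shift_right_promoted(T a, T b)
{
    const unsigned int wide =
        static_cast<unsigned int>(a) >> static_cast<unsigned int>(b);
    return static_cast<T>(wide);
}
```

To try it, the `SHL` and `SHR` cases of `operation_lb` would return `shift_left_promoted<T>(a, b)` and `shift_right_promoted<T>(a, b)` in place of `a << b` and `a >> b`. For unsigned types the results are unchanged.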
