gcc c++协程运行avx SIMD代码，但导致SIGSEGV

vxqlmq5t 于 2023-08-06 发布在其他

关注(0)|答案(1)|浏览(106)

c++协程运行avx SIMD代码，但导致AVX和AVX 512的SIGSEGV

HelloCoroutine hello(int& index, int id, int group_size) {
#if 1
    __mmask8 res=0;
    for(auto i= index++; i< 20; i=index++)
    {

#if 0
// error
        std::cout <<"step 1" <<std::endl;
        __m512i v_offset = _mm512_set1_epi64(int64_t (i));
        std::cout <<"step 2" <<std::endl;
        __m512i v_size = _mm512_set1_epi64(int64_t(group_size));
        std::cout <<"step 3" <<std::endl;
        res = _mm512_cmpgt_epi64_mask(v_offset, v_size);
#elif 1 
// error
        std::cout <<"step 1" <<std::endl;
        __m256i v_offset = _mm256_set1_epi32(int32_t (i));
        std::cout <<"step 2" <<std::endl;
        __m256i v_size = _mm256_set1_epi32(int32_t(group_size));
        std::cout <<"step 3" <<std::endl;
        res = _mm256_cmpgt_epi32_mask(v_offset, v_size);

#else
// OK
        std::cout <<"step 1" <<std::endl;
        __m128i v_offset = _mm_set1_epi32(int32_t (i));
        std::cout <<"step 2" <<std::endl;
        __m128i v_size = _mm_set1_epi32(int32_t(group_size));
        std::cout <<"step 3" <<std::endl;
        res = _mm_cmpgt_epi32_mask(v_offset, v_size);
#endif       
#else
    int res=0;
    for(auto i= index++; i< 20; i=index++)
    {
        res = i > group_size;
#endif
        cout <<i << " > " << group_size <<" ? " << (int)res<<endl;
        co_await std::suspend_always();
    }
}

字符串
在https://godbolt.org/z/hcP988z8b上编译
-std=c++20 -fcoroutines -mbmi2 -mavx -mavx512f -mavx512pf -mavx512er -mavx512cd -mavx512vl
但avx和avx 512的结果错误，仅SSE工作正常
返回的程序：139程序终止信号：SIGSEGV步骤1

gcc

来源：https://stackoverflow.com/questions/76727854/c-coroutine-runs-avx-simd-code-but-causes-sigsegv

1条答案

按热度按时间

wbgh16ku1#

这似乎是一个GCC bug，除非协程被记录为不支持带有alignof(T) > alignof(max_align_t)的局部变量（例如__m256i或__m512i）。
您可以将它报告给https://gcc.gnu.org/bugzilla/（最好使用最小的AVX 2测试用例
对于只需要AVX 2而不是AVX-512的版本，我可以在桌面上测试它，并在需要32字节对齐的vmovdqa YMMWORD PTR [rbx+0x40],ymm0上看到它的故障。（存储vpbroadcastd的结果，初始化__m256i v_offset = set1...。）（https://godbolt.org/z/8vfz3v5v1只修复__m256i块，用-std=gnu++20 -fcoroutines -O2 -march=skylake编译）
IDK为什么使用RBX而不是RSP来访问局部变量;我猜这就是协程在hello(hello(int&, int, int)::_Z5helloRiii.Frame*) [clone .actor]:版本的函数中的工作方式。在那个协程版本中，GCC仍然只是将堆栈指针与and rsp, -32/sub rsp, 192对齐，但这对相对于RBX存储的东西没有帮助。
请注意，您的所有3个版本都需要AVX-512，只是矢量宽度不同。像_mm_cmpgt_epi32_mask这样的Compare-into-mask始终需要AVX-512。
如果你想要一个AVX 2或SSE的整数掩码，你需要_mm_cmpgt_epi32和_mm_movemask_epi8（每字节1位）或_mm_movemask_ps( _mm_castsi128_ps(cmp_result) )（每int 32 1位），或_mm256等效值。
使用-march=native或-march=skylake-avx512、-march=znver4或其他。没有真实的的CPU同时支持AVX 512 ER（Xeon Phi）和AVX 512 VL（其他一切）。https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
如果你的CPU不支持AVX-512，你会得到SIGILL（在所有3个），而不是SIGSEGV。

赞(0）回复(0）举报 2023-08-06

我来回答

gcc c++协程运行avx SIMD代码，但导致SIGSEGV

1条答案

相关问题

热门标签

最新问答