gcc: std::complex multiplication is extremely slow

Posted by 5lwkijsr on 2022-11-13 under Other


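I noticed that multiplying two std::complex values with the overloaded * operator is much slower than writing out the arithmetic by hand. I'm seeing a 50x difference, which is absurd. I understand that the operator has to check for NaN because the rules for complex infinities are complicated. Can that really explain a 50x difference in run time?
I'm using GCC 5.4.0 with the flags -O3 -mavx -mavx2 -msse2 -mfma -mbmi.
Here is the test code: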

#include <iostream>
#include <complex>
#include <chrono>
#include <vector>
#include <cstdlib>  // std::rand, RAND_MAX

int main( void ) {
  size_t N = 10000;
  std::vector< std::complex< double >> inbuf( N );
  for( size_t k = 0; k < N; ++k ) {
     inbuf[ k ] = std::complex< double >( std::rand(), std::rand() ) / ( double )RAND_MAX - 0.5;
  }

  std::complex< double > c2 = { 0, 0 };
  auto t0 = std::chrono::steady_clock::now();
  for( size_t i = 0; i < 10000; ++i ) {
     for( size_t j = 0; j < N - 1; ++j ) {
         // (a + bi)(c + di) = (ac - bd) + (ad + bc)i, written out by hand
         double re = inbuf[ j ].real() * inbuf[ j + 1 ].real() - inbuf[ j ].imag() * inbuf[ j + 1 ].imag();
         double im = inbuf[ j ].real() * inbuf[ j + 1 ].imag() + inbuf[ j ].imag() * inbuf[ j + 1 ].real();
        c2.real( c2.real() + re );
        c2.imag( c2.imag() + im );
     }
  }
  auto t1 = std::chrono::steady_clock::now();
  double time = ( std::chrono::duration< float >( t1 - t0 ) ).count();
  std::cout << c2 << " using manual *: " << time << std::endl;

  c2 = { 0, 0 };
  t0 = std::chrono::steady_clock::now();
  for( size_t i = 0; i < 10000; ++i ) {
     for( size_t j = 0; j < N - 1; ++j ) {
        c2 += inbuf[ j ] * inbuf[ j + 1 ];
     }
  }
  t1 = std::chrono::steady_clock::now();
  time = ( std::chrono::duration< float >( t1 - t0 ) ).count();
  std::cout << c2 << " using stdlib *: " << time << std::endl;
  return 0;
}

Here is the output:

(-2.45689e+07,-134386) using manual *: 0.109344
(-2.45689e+07,-134386) using stdlib *: 5.4286

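Edit: Because people in the comments reported different results, I ran more tests with different compiler options. It turns out that the -mfma and -mavx switches are what make the "stdlib" version so slow. The -mfma switch makes the "manual" version about 25% faster, but makes the "stdlib" version about 13 times slower: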

cris@carrier:~/tmp/tests> g++ complex_test.cpp -o complex_test -O3 -std=c++11
cris@carrier:~/tmp/tests> ./complex_test                                     
(-2.45689e+07,-134386) using manual *:0.138276
(-2.45689e+07,-134386) using stdlib *:0.412056
cris@carrier:~/tmp/tests> g++ complex_test.cpp -o complex_test -O3 -mfma -std=c++11 
cris@carrier:~/tmp/tests> ./complex_test                                                  
(-2.45689e+07,-134386) using manual *:0.106551
(-2.45689e+07,-134386) using stdlib *:5.37662

I also tried clang-800 (macOS) and did not see this extreme slowdown. g++-5 on the Mac behaves the same as g++-5 on Linux. Could I have found a compiler bug?


Source: https://stackoverflow.com/questions/42659668/stdcomplex-multiplication-is-extremely-slow

2 answers


qcbq4gxm1#

This worries me a lot! With -O3 plus -ffast-math (or -Ofast), the difference disappears.
See also: gcc differences between -O3 vs -Ofast optimizations
Straight (no optimization flags):
g++ -std=c++11 time_complex.cpp && ./a.out
(-2.50606e+07,-29494.2) using manual *: 5.20456
(-2.50606e+07,-29494.2) using stdlib *: 4.02066
Ofast:
g++ -Ofast -std=c++11 time_complex.cpp && ./a.out
(-2.50606e+07,-29494.2) using manual *: 0.154484
(-2.50606e+07,-29494.2) using stdlib *: 0.155045
O3:
g++ -O3 -std=c++11 time_complex.cpp && ./a.out
(-2.50606e+07,-29494.2) using manual *: 0.193446
(-2.50606e+07,-29494.2) using stdlib *: 0.350336
O3 + fast-math:
g++ -O3 -ffast-math -std=c++11 time_complex.cpp && ./a.out
(-2.50606e+07,-29494.2) using manual *: 0.154603
(-2.50606e+07,-29494.2) using stdlib *: 0.156592
fast-math:
g++ -ffast-math -std=c++11 time_complex.cpp && ./a.out
(-2.50606e+07,-29494.2) using manual *: 5.17364
(-2.50606e+07,-29494.2) using stdlib *: 4.0194