#include <functional>  // std::ref
#include <thread>
#include <vector>

#include <benchmark/benchmark.h>
#include <boost/thread/barrier.hpp>

void work() {
    volatile int sum = 0;
    for (int i = 0; i < 100'000'000; i++) {
        sum += i;
    }
}

static void thread_routine(boost::barrier& barrier, benchmark::State& state, int thread_id) {
    // do setup here, if needed
    barrier.wait();  // wait until each thread is created
    if (thread_id == 0) {
        state.ResumeTiming();
    }
    barrier.wait();  // wait until the timer is started before doing the work
    // do some work
    work();
    barrier.wait();  // wait until each thread completes the work
    if (thread_id == 0) {
        state.PauseTiming();
    }
    barrier.wait();  // wait until the timer is stopped before destructing the thread
    // do teardown here, if needed
}

void f(benchmark::State& state) {
    const int num_threads = 1000;
    boost::barrier barrier(num_threads);
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; i++) {
        threads.emplace_back(thread_routine, std::ref(barrier), std::ref(state), i);
    }
    for (std::thread& thread : threads) {
        thread.join();
    }
}

static void BM_AlreadyMultiThreaded(benchmark::State& state) {
    for (auto _ : state) {
        state.PauseTiming();
        f(state);
        state.ResumeTiming();
    }
}

BENCHMARK(BM_AlreadyMultiThreaded)->Iterations(10)->Unit(benchmark::kMillisecond)->MeasureProcessCPUTime(); // NOLINT(cert-err58-cpp)
BENCHMARK_MAIN();
On my machine, this code prints the following (header omitted):
---------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations
---------------------------------------------------------------------------------------------
BM_AlreadyMultiThreaded/iterations:10/process_time       1604 ms       200309 ms           10

---------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations
---------------------------------------------------------------------------------------------
BM_AlreadyMultiThreaded/iterations:10/process_time       1680 ms       200102 ms           10
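Use a thread barrier synchronization primitive to wait until all threads have been created, have finished their setup, and so on. This solution uses boost::barrier, but std::barrier is also available as of C++20, or you can implement a custom barrier. Be careful if you implement one yourself, because it is easy to get wrong, but this answer appears to be correct. Pass benchmark::State& state to your function and threads so that timing can be paused and resumed when needed.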
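The barrier phases described above can be tried out without Boost or Google Benchmark. The following is only a minimal sketch under stated assumptions: SimpleBarrier, PhaseResult and run_timed_phase are hypothetical names invented for this illustration, the "work" is a stand-in counter increment, and a std::chrono::steady_clock window takes the place of state.ResumeTiming()/state.PauseTiming(). It is one common way to write a reusable barrier (mutex, condition variable, generation counter), not necessarily the implementation the linked answer uses.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier (not taken from any library). The generation
// counter keeps a fast thread that re-enters wait() for the next phase from
// being mistaken for a waiter of the previous phase.
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count)
        : threshold_(count), remaining_(count) {
        assert(count > 0);
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (--remaining_ == 0) {
            // Last thread to arrive: reset for the next phase, release everyone.
            remaining_ = threshold_;
            ++generation_;
            cv_.notify_all();
        } else {
            // The predicate also guards against spurious wakeups.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t remaining_;
    std::size_t generation_ = 0;
};

// Hypothetical result type for this sketch.
struct PhaseResult {
    std::size_t threads_finished = 0;             // threads that completed the work phase
    std::chrono::steady_clock::duration timed{};  // length of the timed window
};

// Thread 0 starts and stops the clock between barrier phases, so only the
// work phase falls inside the timed window.
PhaseResult run_timed_phase(std::size_t num_threads) {
    SimpleBarrier barrier(num_threads);
    std::mutex result_mutex;
    PhaseResult result;
    std::chrono::steady_clock::time_point start;

    auto routine = [&](std::size_t thread_id) {
        barrier.wait();  // every thread exists and has finished its setup
        if (thread_id == 0) start = std::chrono::steady_clock::now();
        barrier.wait();  // the clock is running before anyone starts working
        {
            std::lock_guard<std::mutex> lock(result_mutex);
            ++result.threads_finished;  // stand-in for the real work
        }
        barrier.wait();  // every thread has finished the work
        if (thread_id == 0) result.timed = std::chrono::steady_clock::now() - start;
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) threads.emplace_back(routine, i);
    for (std::thread& t : threads) t.join();  // join makes result safe to read
    return result;
}
```

Calling run_timed_phase(8) should report eight finished threads and a non-negative timed window. With a C++20 toolchain, std::barrier and its arrive_and_wait() member can take the place of SimpleBarrier, which removes the need to maintain the hand-written synchronization.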
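If I comment out all of the state.PauseTiming()/state.ResumeTiming() calls, I get the second result shown above. I believe the differences of about 80 ms in real time (1604 ms vs. 1680 ms) and about 200 ms in CPU time (200309 ms vs. 200102 ms) are statistically significant rather than noise, which supports the hypothesis that this example works correctly.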