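This is a follow-up to this question: Why O_DIRECT is slower than normal read?
I followed the advice in the answer there and implemented read-ahead in a separate thread, but the O_DIRECT version is still slower than the non-O_DIRECT version. Here is my code: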
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <malloc.h>
#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>
#include <condition_variable>
#define BUFSIZE 134217728 // 128 MiB per buffer
// globals
std::mutex mut;
unsigned char* buffers[12]; // global array of pointers to buffers where file will be read
int bytes_read[12] = {0};
std::condition_variable cv;
// write_head is the shared variable associated with cv
int write_head = 0; // index of buffer currently being written to
void producer_thread()
{
    const char* fname = "1GB.txt";
    int fd = open(fname, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        fprintf(stderr, "cannot open %s\n", fname);
        exit(2);
    }
    for (int i = 0; i < 12; ++i) {
        unsigned char* buf = buffers[i];
        ssize_t n = read(fd, buf, BUFSIZE);
        if (n < 0) {
            perror("read");
            n = 0; // treat a read error like end of file so the consumer stops
        }
        bytes_read[i] = static_cast<int>(n);
        // publish the filled buffer and wake up the consumer thread
        {
            std::lock_guard<std::mutex> lk(mut);
            write_head = i + 1;
        }
        cv.notify_all();
        if (n == 0) { // we have reached end of file
            std::cout << "Read to end of file" << std::endl;
            std::cout << "Buffers used: " << i << std::endl;
            break;
        }
    }
    close(fd);
}
void consumer_thread()
{
    unsigned long result = 0;
    for (int i = 0; i < 12; ++i) {
        // wait for buffer i to become available for reading
        {
            std::unique_lock<std::mutex> lk(mut);
            cv.wait(lk, [&]() { return i < write_head; });
        }
        int n = bytes_read[i];
        if (n == 0)
            break; // producer reached end of file
        // now process the data: sum every byte in the buffer
        unsigned char* buf = buffers[i];
        for (int j = 0; j < n; ++j)
            result += buf[j];
    }
    // print the result even if all 12 buffers were filled
    std::cout << "Result: " << result << std::endl;
}
int main()
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration;
    puts("Allocating buffers");
    auto start = high_resolution_clock::now();
    int alignment = 4096;
    // allocate one aligned buffer per slot in the global buffers array
    // (the producer and consumer loops iterate over all 12 slots)
    for (int i = 0; i < 12; ++i) {
        unsigned char* buf = (unsigned char*) memalign(alignment, BUFSIZE);
        if (buf == nullptr) {
            fprintf(stderr, "cannot allocate buffer %d\n", i);
            return 1;
        }
        buffers[i] = buf;
    }
    auto end = high_resolution_clock::now();
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = end - start;
    puts("finished allocating buffers");
    std::cout << "time taken: " << ms_double.count() << "ms\n";
    // start producer and consumer threads
    std::thread t1(producer_thread), t2(consumer_thread);
    t1.join();
    t2.join();
    return 0;
}
Here are the commands I used:
g++ fsum.cpp -O3
free && sync && echo 3 > /proc/sys/vm/drop_caches && free
time ./a.out
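For the non-O_DIRECT version, I simply removed O_DIRECT from the source code above and recompiled.
As shown above, every measurement was run after flushing the page cache.
Here are my results: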
O_DIRECT: 0.810s, 0.811s, 0.722s, 0.818s, 0.669s
non-O_DIRECT: 0.666s, 0.754s, 0.615s, 0.634s, 0.634s
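The non-O_DIRECT version appears to be consistently about 0.1 to 0.2 seconds faster than the O_DIRECT version. The only difference between the two is literally that the file is opened with O_DIRECT in one and without it in the other; everything else is identical.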
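Is read-ahead still the problem? Perhaps Linux's read-ahead is more efficient than the read-ahead I implemented?
Update: I have attached the iostat logs below: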
root@x:~/test# g++ fsum2.cc -O3
root@x:~/test# iostat
Linux 6.1.0-9-amd64 (x) 01/07/23 _x86_64_ (16 CPU)
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
dm-0 0.93 183.61 4.90 457.03 190635797 5089164 474519964
dm-1 0.93 183.61 4.90 457.03 190632469 5089164 474519964
dm-2 0.00 0.00 0.00 0.00 2296 0 0
nvme0n1 1.30 183.62 4.90 457.93 190649314 5089166 475449288
root@x:~/test# time ./a.out
Allocating buffers
time taken0.077616ms
finished allocating buffers
Read to end of file
Buffers used: 8Result: 0
real 0m0.737s
user 0m0.100s
sys 0m0.141s
root@x:~/test# iostat
Linux 6.1.0-9-amd64 (x) 01/07/23 _x86_64_ (16 CPU)
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
dm-0 0.93 184.55 4.90 457.03 191613521 5089196 474519964
dm-1 0.93 184.55 4.90 457.03 191610193 5089196 474519964
dm-2 0.00 0.00 0.00 0.00 2296 0 0
nvme0n1 1.31 184.56 4.90 457.92 191627038 5089198 475449288
root@x:~/test# nano fsum2.cc
root@x:~/test# g++ fsum2.cc -O3
root@x:~/test# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
root@x:~/test# iostat
Linux 6.1.0-9-amd64 (x) 01/07/23 _x86_64_ (16 CPU)
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
dm-0 0.93 184.57 4.90 456.96 191660405 5090008 474519964
dm-1 0.93 184.57 4.90 456.96 191657077 5090008 474519964
dm-2 0.00 0.00 0.00 0.00 2296 0 0
nvme0n1 1.31 184.58 4.90 457.86 191673922 5090010 475449288
root@x:~/test# time ./a.out
Allocating buffers
time taken0.027392ms
finished allocating buffers
Read to end of file
Buffers used: 8Result: 0
real 0m0.614s
user 0m0.089s
sys 0m0.246s
root@x:~/test# iostat
Linux 6.1.0-9-amd64 (x) 01/07/23 _x86_64_ (16 CPU)
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
dm-0 0.94 185.51 4.90 456.96 192639133 5090024 474519964
dm-1 0.93 185.51 4.90 456.96 192635805 5090024 474519964
dm-2 0.00 0.00 0.00 0.00 2296 0 0
nvme0n1 1.31 185.52 4.90 457.85 192652650 5090026 475449288
1 Answer
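It turns out this was because I was using disk encryption.
On my current system (identical to the previous one except that it has no disk encryption; I simply reinstalled the OS with the same options, minus disk encryption), I get a median of 0.389s for non-O_DIRECT and 0.369s for O_DIRECT. So O_DIRECT makes the program faster on my current system (no disk encryption), but it made the program slower on my previous system (with disk encryption).
I don't know why.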