gcc: How to diagnose a member variable being set to zero randomly in a segment of code?

Posted by 68de4m5k on 2023-05-18 in Other


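I can't show the entire code that causes this problem (and it would be physically impossible for me to do so), and I think actually solving it is probably out of scope here. So what I'm asking is how to diagnose this kind of problem.
Basically, I have a class that looks something like this (note that I *cannot* reproduce the problem in an MVCE, so I'm only showing roughly what I'm doing in order to get help with the *tools* I need for debugging):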

#include <array>
#include <atomic>
#include <cstdint>
#include <memory>
#include <semaphore>
#include <thread>
#include <sys/socket.h> // socket, recvmmsg, mmsghdr, iovec

struct SharedDataStructure{
    SharedDataStructure(){
         for(auto& value : semaphore_array){
             value = std::make_unique<std::counting_semaphore<2>>(2); 
         }
    }
    std::uint32_t get_latest_index(){ ... /* calculates some index */ }
    std::array<std::atomic<std::uint32_t>,16> atomics_member; 
    std::array<std::unique_ptr<std::counting_semaphore<2>>, 16> semaphore_array; 
    std::uint32_t dummy0;

};

struct ThreadClass{
    ThreadClass(std::atomic<bool>& stop_flag, SharedDataStructure& shared_data_structure){
        auto thread_function = [&shared_data_structure, &stop_flag](){
            ...

            ... create socket
            
            std::array<mmsghdr, 1024> msgvec; 
            std::array< iovec, 1024> iovecs; 
            auto thread_socket = socket(...);
            ... //initialize mmsghdr and iovecs up here
            ... //set thread_socket options, bind to (0.0.0.0) and some port here etc.. 
            timespec timeout = {}; 
            timeout.tv_sec = 1; 
            while(!stop_flag){
                // up until this point, shared_data_structure.semaphore_array is filled with what appear to be actual normal pointers. 
                auto packet_count = recvmmsg(thread_socket, msgvec.data(), msgvec.size(), 0, &timeout); 
                // after this point, shared_data_structure.semaphore_array is filled with nullptrs, as if something zeroed all the values out. 
                for(int i = 0; i < packet_count; ++i){
                    auto index = shared_data_structure.get_latest_index(); // in debugger this is 1, so no out of bounds
                    //causes segmentation fault, because for some reason now all the pointers in semaphore_array are nullptr...
                    shared_data_structure.semaphore_array[index]->acquire(); 
                    //do something here. 
                    shared_data_structure.semaphore_array[index]->release();
                }
            }
        };
        m_thread = std::thread(thread_function); 
    }
    ~ThreadClass(){
        m_thread.join(); 
    }
    std::thread m_thread; 

};

void create_thread_class(std::atomic<bool>& stop_flag){
    SharedDataStructure shared_data_structure;
    ThreadClass thread_class_0(stop_flag, shared_data_structure); 
    //ThreadClass thread_class_1(stop_flag, shared_data_structure); happens whether this is commented out or not. 
    while(!stop_flag.load()){
        //... at this point this just became an empty loop for debugging. 
    }
}
//invoke create_thread_class in a thread of its own later.

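Before this line of code:

auto packet_count = recvmmsg(thread_socket, msgvec.data(), msgvec.size(), 0, &timeout);

SharedDataStructure::semaphore_array contains a bunch of "real" pointers, exactly as I initialized them. Afterwards, all of the pointers are nullptr. Note that none of the other values in the structure appear to be affected.
I don't even know how to debug this. Clearly, recvmmsg(...) should have *zero* effect on a member of a class it never uses. I assume I've invoked some kind of undefined behavior, but I don't even know how to find it. The result looks like a buffer overflow, but I don't see how that would affect stack variables (I don't think recvmmsg does much on the stack?).

How do I diagnose this kind of problem?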

gcc

Source: https://stackoverflow.com/questions/76214146/how-to-diagnose-a-member-variable-being-set-to-zero-randomly-in-segment-of-code

1 Answer


tkclm6bt1#

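Okay, as I said in the comments, I found the problem. First, I use CLion and tried to run GDB through it. It sort of worked, but it got stuck. I'll explain what I did.
So I used the command line instead.
First I set a breakpoint: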

(gdb) break my_file.h:[[line where the for loop over packet_count is]]

Then I set a watchpoint:

(gdb) watch shared_data_structure.semaphore_array[0]

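I ran the program and waited for the watchpoint. At first I got some strange results. When I stepped over the recvmmsg call, GDB jumped to the function parameters of the thread_function lambda and then back to where thread_socket is. Then, after I stepped again, everything in GDB froze for about 30 seconds (this is probably where I gave up on CLion's GDB interface). I think this is related to my not having debug info for glibc (GDB warned me about that earlier), but I'm not able to install the debug-info packages on my system, so while that might have helped, it wasn't an option.
After it froze, it threw some Python errors about not finding a debug representation for the constructor of std::array<std::unique_ptr<std::counting_semaphore<2>, std::default_deleter>, 16>, but after that it showed that the value had been set at this point. The watchpoint worked.
The catch is that when I ran a stack trace with bt, it showed only recvmmsg and the rest of my program, nothing deeper.
Basically: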

recvmmsg()
some_std_invoke_stuff()
some_std_thread_stuff()
... all irrelevant out of scope

Again, debug info might have helped, but at least at this point I knew the problem was happening 100% inside recvmmsg.
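If attaching a debugger is awkward, the before/after comparison that a watchpoint gives can be approximated in-process. The following is only a sketch under assumptions: ByteSnapshot, Demo and demo_changed_element are hypothetical names that do not appear in the original code, and the stray write is simulated by a plain assignment. It copies an object's raw bytes before a suspicious call and reports the first byte that differs afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not from the original post: remembers the raw bytes of
// an object so they can be compared again after a suspicious call.
template <typename T>
class ByteSnapshot {
public:
    explicit ByteSnapshot(const T& object) : object_(&object) {
        std::memcpy(bytes_.data(), static_cast<const void*>(object_), sizeof(T));
    }

    // Offset of the first byte that differs now, or sizeof(T) if nothing changed.
    std::size_t first_changed_byte() const {
        const auto* now = reinterpret_cast<const unsigned char*>(object_);
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            if (now[i] != bytes_[i]) return i;
        }
        return sizeof(T);
    }

    bool unchanged() const { return first_changed_byte() == sizeof(T); }

private:
    const T* object_;
    std::array<unsigned char, sizeof(T)> bytes_;
};

// Stand-in for the real structure; the member names are illustrative only.
struct Demo {
    std::array<int*, 4> pointers;
    unsigned dummy;
};

// Simulates "something zeroed one pointer" and reports which element changed.
inline std::size_t demo_changed_element() {
    int target = 0;
    Demo demo{{&target, &target, &target, &target}, 7u};
    ByteSnapshot<Demo> before(demo);
    assert(before.unchanged());   // nothing has touched demo yet
    demo.pointers[2] = nullptr;   // stands in for the stray write
    return before.first_changed_byte() / sizeof(int*);
}
```

In the scenario above, one would presumably take the snapshot of shared_data_structure.semaphore_array right before recvmmsg and check unchanged() right after it. That narrows the corruption to a single call, but unlike a hardware watchpoint it does not show which instruction did the write.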
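After that I ran the code under Valgrind, and after a while I saw a bunch of errors like "mmsghdr[633] accessed uninitialized bytes" (for a run of consecutive indices). So I had to look into the mmsghdr initialization.
I didn't include this above because, again, solving my specific problem wasn't what I was asking for (I had to write the question from memory, and I didn't remember this part), but basically this is what the initialization looked like: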

std::array<mmsghdr, 1024> msgvec; 
std::array<iovec, 1024> iovecs; 
my::aligned_vector<std::byte, buffer_size * n> buffers; 

for(std::size_t i = 0; i < 1024; ++i){
    iovecs[i].iov_base           = (&buffers + (i * buffer_size));
    iovecs[i].iov_len            = buffer_size;
    msgvec[i].msg_hdr.msg_iov    = &iovecs[i];
    msgvec[i].msg_hdr.msg_iovlen = 1;
}
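A Valgrind complaint about uninitialized bytes in mmsghdr is plausible for code like this, because std::array<mmsghdr, 1024> msgvec; leaves every field indeterminate and the loop assigns only msg_iov and msg_iovlen. As a side note, and not the fix the author ended up with, value-initializing the array clears the remaining fields (msg_name, msg_control and so on). A minimal sketch, assuming Linux with glibc, where g++ defines _GNU_SOURCE so that <sys/socket.h> exposes mmsghdr; kBatch and the function names are made up for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sys/socket.h>  // mmsghdr, msghdr (assumes Linux/glibc with _GNU_SOURCE)

constexpr std::size_t kBatch = 8;  // illustrative; the post uses 1024

// The empty braces value-initialize the array, so every field of every
// mmsghdr starts out as zero / nullptr instead of holding stack garbage.
inline std::array<mmsghdr, kBatch> make_zeroed_msgvec() {
    std::array<mmsghdr, kBatch> msgvec{};
    return msgvec;
}

// True if the fields that the loop in the post never assigns are all cleared.
inline bool unassigned_fields_are_clear(const std::array<mmsghdr, kBatch>& msgvec) {
    for (const mmsghdr& m : msgvec) {
        if (m.msg_hdr.msg_name != nullptr || m.msg_hdr.msg_namelen != 0 ||
            m.msg_hdr.msg_control != nullptr || m.msg_hdr.msg_controllen != 0 ||
            m.msg_hdr.msg_flags != 0 || m.msg_len != 0) {
            return false;
        }
    }
    return true;
}

// Small self-check tying the two functions together.
inline void check_zeroed_msgvec() {
    const auto msgvec = make_zeroed_msgvec();
    assert(unassigned_fields_are_clear(msgvec));
}
```

This matters because the kernel treats a non-null msg_name or msg_control as a buffer to write into, so leftover garbage in those fields is worth ruling out even when it is not the root cause.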

At first I thought maybe my i was iterating too far, so I added an assertion against that:

for(std::size_t i = 0; i < 1024; ++i){
    assert(buffers.size() >= (i * buffer_size + buffer_size)); 
    iovecs[i].iov_base           = (&buffers + (i * buffer_size));
    iovecs[i].iov_len            = buffer_size;
    msgvec[i].msg_hdr.msg_iov    = &iovecs[i];
    msgvec[i].msg_hdr.msg_iovlen = 1;
}

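Only then did I realize the real problem: iov_base is a void*, so it accepts any kind of pointer. That means (&buffers + (i * buffer_size)) is a pointer to my::aligned_vector<std::byte, buffer_size * n>, not to a run of bytes, so the arithmetic advances in units of the whole container. I had to change it to (buffers.data() + (i * buffer_size)):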
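To make the difference concrete, the sketch below measures how far one step of pointer arithmetic moves in each case. It uses std::array<std::byte, 4096> as a stand-in, since my::aligned_vector is not shown in the post, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for my::aligned_vector from the post, which is not shown there;
// any container with data() illustrates the same point.
using Buffers = std::array<std::byte, 4096>;

// Bytes skipped by adding 1 to a pointer to the whole container.
inline std::ptrdiff_t container_pointer_stride(Buffers& buffers) {
    Buffers* whole = &buffers;
    return reinterpret_cast<const char*>(whole + 1) -
           reinterpret_cast<const char*>(whole);
}

// Bytes skipped by adding 1 to a pointer to the first element.
inline std::ptrdiff_t element_pointer_stride(Buffers& buffers) {
    std::byte* first = buffers.data();
    return reinterpret_cast<const char*>(first + 1) -
           reinterpret_cast<const char*>(first);
}

// Small self-check: one step is sizeof(Buffers) bytes versus a single byte.
inline void check_strides() {
    Buffers buffers{};
    assert(container_pointer_stride(buffers) ==
           static_cast<std::ptrdiff_t>(sizeof(Buffers)));
    assert(element_pointer_stride(buffers) == 1);
}
```

With the original expression, each increment of i therefore moved iov_base by buffer_size times the size of the container object rather than by buffer_size bytes, so the kernel was handed addresses outside the intended slots. Both forms convert silently to void*, which is why the compiler raised no complaint.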

for(std::size_t i = 0; i < 1024; ++i){
    MY_ASSERT(buffers.size() >= (i * buffer_size + buffer_size)); 
    iovecs[i].iov_base           = static_cast<void*>(buffers.data() + (i * buffer_size));
    iovecs[i].iov_len            = buffer_size;
    msgvec[i].msg_hdr.msg_iov    = &iovecs[i];
    msgvec[i].msg_hdr.msg_iovlen = 1;
}

Then my problem was solved.
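One way to keep this class of mistake from compiling at all is to route the slicing through a function whose parameter is std::byte* rather than void*. The helper below is a hypothetical sketch, not code from the post: slice_into_iovecs and its parameters are invented names, and it assumes a POSIX system for <sys/uio.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>
#include <sys/uio.h>  // iovec (POSIX)

// Hypothetical helper, not from the original post. Taking std::byte* means a
// pointer to the whole container is rejected at compile time, unlike a direct
// assignment to the void* member iov_base.
inline std::vector<iovec> slice_into_iovecs(std::byte* storage,
                                            std::size_t storage_size,
                                            std::size_t chunk_size,
                                            std::size_t chunk_count) {
    assert(storage != nullptr);
    // Refuse to hand out slices that would run past the end of the storage.
    if (chunk_size == 0 || chunk_count > storage_size / chunk_size) {
        throw std::length_error("storage too small for the requested chunks");
    }
    std::vector<iovec> iovecs(chunk_count);  // value-initialized, so zeroed
    for (std::size_t i = 0; i < chunk_count; ++i) {
        iovecs[i].iov_base = storage + i * chunk_size;  // byte-wise arithmetic
        iovecs[i].iov_len  = chunk_size;
    }
    return iovecs;
}
```

Calling slice_into_iovecs(&buffers, ...) is then a compile error, because a pointer to the container does not convert implicitly to std::byte*, whereas buffers.data() does (assuming data() returns std::byte*). The size check also replaces the hand-written assertion with a bound that is enforced in release builds.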

Answered on 2023-05-18
