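The model was trained with paddle fluid 1.5.1, GPU version.
C++ inference depends on this version (CPU inference): CONFIGS("baidu/lib/paddlepaddle@v1.5.2-avx-mkl-map_PD_BL@git_tag")
Online inference scenario:
A predictor is created from AnalysisConfig and used for multithreaded inference. I tried creating each thread's predictor both in the main thread and in the worker thread; creation succeeds either way, but both setups eventually core dump. Each call predicts at most 50 samples (batch_size=50).
Core dump symptoms:
With the same prediction set, not every run core dumps, and the fewer the threads, the lower the probability. I debugged several core files: the sample being predicted at crash time differs each time (so the crash is probably not triggered by a specific input), but the crash location is always the same. The core information is as follows: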
Using host libthread_db library "/opt/compiler/gcc-4.8.2/lib/libthread_db.so.1".
Core was generated by `./bin/sug-as . --flagfile=./conf/gflags.conf'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f584d5f53f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
(gdb) bt
#0 0x00007f584d5f53f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1 0x00007f584d5f67d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2 0x00007f584dee5c65 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f584dee3e06 in __cxxabiv1::__terminate (handler=)
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x00007f584dee2ec9 in __cxa_call_terminate (ue_header=0x7f4b98035fc0) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#5 0x00007f584dee3a7a in __cxxabiv1::__gxx_personality_v0 (version=, actions=,
exception_class=, ue_header=, context=)
at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:670
#6 0x00007f584d97c853 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7f4b98035fc0, context=context@entry=0x7f4bd2bf9900)
at ../../../libgcc/unwind.inc:62
#7 0x00007f584d97cd87 in _Unwind_Resume (exc=0x7f4b98035fc0) at ../../../libgcc/unwind.inc:230
#8 0x00007f5858ecc735 in paddle::memory::detail::BuddyAllocator::Free(void*) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#9 0x00007f5858ec9175 in void paddle::memory::legacy::Free<paddle::platform::CPUPlace>(paddle::platform::CPUPlace const&, void*, unsigned long) () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#10 0x00007f5858ec9f15 in paddle::memory::allocation::LegacyAllocator::FreeImpl(paddle::memory::allocation::Allocation*) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#11 0x00007f585804ebf9 in paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, paddle::framework::proto::VarType_Type, unsigned long) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#12 0x00007f5858278ca4 in paddle::operators::FusionSeqConvEltAddReluKernel::Compute(paddle::framework::ExecutionContext const&) const () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#13 0x00007f5858279dd3 in std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::FusionSeqConvEltAddReluKernel, paddle::operators::FusionSeqConvEltAddReluKernel >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#14 0x00007f5858e721c7 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#15 0x00007f5858e72843 in paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#16 0x00007f5858e6d8d4 in paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#17 0x00007f585807f518 in paddle::framework::NaiveExecutor::Run() () from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#18 0x00007f5857f10718 in paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int) ()
from /home/map/sug-as.new/bin/../lib/libpaddle_fluid.so
#19 0x0000000000512cd7 in map_sug_as::SemanticsManager::batch_predict (this=0x16d31b0 <map_sug_as::g_data+108816>,
data_buf=0x25708e9e8, query=..., poi_name=..., probs=...)
at baidu/mapsearch/sug-as/code/rd/src/Framework/semantics_manager.cpp:267
#20 0x00000000005ee02c in map_sug_as::RankerModel::calc_nn_feature (this=this@entry=0x7f4b9ae77070, cut_num=cut_num@entry=50,
nn_features=...) at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/ranker_model.cpp:321
#21 0x00000000005ee74d in map_sug_as::RankerModel::reranking (this=this@entry=0x7f4b9ae77070)
at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/ranker_model.cpp:385
#22 0x00000000005eeff9 in map_sug_as::RankerModel::calc_weight (this=0x7f4b9ae77070)
at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/ranker_model.cpp:103
#23 0x00000000005ef924 in map_sug_as::RerankerManager::calc_weight (this=0x7f4b9ae76ee0, databuf=0x25708e9e8, sorted_pois=...)
at baidu/mapsearch/sug-as/code/rd/src/Strategy/Reranker/reranker_manager.cpp:30
#24 0x0000000000519374 in map_sug_as::SugAsServer::rerank_queue (this=this@entry=0x7f4b9865c210)
at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_server.cpp:454
#25 0x000000000051e452 in map_sug_as::SugAsServer::search (this=0x7f4b9865c210, databuf=databuf@entry=0x25708e9e8)
at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_server.cpp:525
#26 0x0000000000521fc0 in get_response (databuf=0x25708e9e8) at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_work.cpp:94
#27 map_sug_as::thread_main (arg=) at baidu/mapsearch/sug-as/code/rd/src/Framework/sug_as_work.cpp:333
#28 0x00007f5862ffe1c3 in start_thread () from /opt/compiler/gcc-4.8.2/lib/libpthread.so.0
#29 0x00007f584d6a712d in clone () from /opt/compiler/gcc-4.8.2/lib/libc.so.6

4 answers

js81xvg61#

How is your multithreading set up? For inference, each thread can only run its own single predictor.
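
One way to guarantee "one predictor per thread" is to let each thread lazily create and keep its own clone. The sketch below uses a stand-in `FakePredictor` type, not the Paddle API, and the accessor `ThreadLocalPredictor` is a hypothetical name. It also assumes that cloning may not be safe to do concurrently, so the clone step is guarded by a mutex.

```cpp
#include <memory>
#include <mutex>

// Stand-in for a predictor type that offers Clone(); NOT the Paddle API.
class FakePredictor {
public:
    explicit FakePredictor(int id) : id_(id) {}
    std::unique_ptr<FakePredictor> Clone() const {
        return std::unique_ptr<FakePredictor>(new FakePredictor(id_ + 1));
    }
    int id() const { return id_; }

private:
    int id_;
};

// Hypothetical accessor: returns the calling thread's private clone,
// creating it on first use. Cloning is serialized (an assumption that
// Clone() might not be thread-safe); later calls take no lock.
FakePredictor& ThreadLocalPredictor(const FakePredictor& main_predictor) {
    static std::mutex clone_mutex;
    thread_local std::unique_ptr<FakePredictor> local;
    if (!local) {
        std::lock_guard<std::mutex> lock(clone_mutex);
        local = main_predictor.Clone();
    }
    return *local;
}
```

Each worker would call `ThreadLocalPredictor(main_predictor)` before predicting and never pass the returned reference to another thread.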

2022-04-21

ndh0cuux2#

The main thread calls CreatePaddlePredictor to create main_predictor; the predictors used by the inference (worker) threads are cloned from main_predictor.

int init_thread_databuf(conf_info_t &g_conf, thread_data_buf* data_buf) {
    // Build a CPU-only analysis config with IR optimization enabled.
    paddle::AnalysisConfig config;
    config.SetModel(g_conf.semantics_model_path);
    config.DisableGpu();
    config.SwitchIrOptim();
    auto main_predictor = paddle::CreatePaddlePredictor(config);
    if (main_predictor == nullptr) {
        MAP_LOG_FATAL("create semantic model failure");
        return ERR_RETURN;
    }

    // Give every worker thread its own clone of the main predictor.
    // Clone() already returns a temporary, so no std::move is needed.
    for (int i = 0; i < g_conf.thread_num; i++) {
        (data_buf+i)->semantics_predictor = main_predictor->Clone();
        if ((data_buf+i)->semantics_predictor == nullptr) {
            MAP_LOG_FATAL("create semantic model failure");
            return ERR_RETURN;
        }
    }
    return SUC_RETURN;
}

mbzjlibv3#

"The fewer the threads, the lower the probability": does it also core dump if you run single-threaded inference directly with main_predictor?

flseospp4#

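I did not predict with main_predictor directly. Instead, still in the multithreaded environment, I sent requests serially; each request is randomly assigned to one of the threads for inference, so only one thread should be predicting at any given moment. In that case there was no core dump.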
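
The observation above suggests, though it does not prove, that the crash needs two predictions to overlap in time. A cheap way to check this under real concurrent traffic is to temporarily serialize every predictor call behind one process-wide mutex: if the core dumps disappear, state shared between the clones becomes the main suspect. This is a diagnostic sketch only, with a stand-in `FakePredictor` rather than the Paddle API; `g_run_mutex` and `SerializedRun` are hypothetical names.

```cpp
#include <mutex>

// Stand-in for a per-thread predictor; NOT the Paddle API.
struct FakePredictor {
    int calls = 0;
    // Pretends to predict; accepts batches of 1..50 samples.
    bool Run(int batch_size) {
        ++calls;
        return batch_size > 0 && batch_size <= 50;
    }
};

// Process-wide lock, meant to exist only while diagnosing the crash.
std::mutex g_run_mutex;

// Hypothetical wrapper: every worker thread calls this instead of
// predictor.Run(), so at most one prediction executes at a time.
bool SerializedRun(FakePredictor& predictor, int batch_size) {
    std::lock_guard<std::mutex> lock(g_run_mutex);
    return predictor.Run(batch_size);
}
```

The lock removes all parallelism in inference, so it is useful for narrowing down the cause rather than as a fix.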

Paddle-trained model: the C++ inference API core dumps
