Paddle: exception thrown when GPU memory optimization is enabled

bweufnob  posted on 2021-11-30  in  Java

I am using PaddleServing and turned on GPU memory optimization with EnableMemoryOptim. The following exception is thrown at startup:

terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():  Unknown data type at [/home/zhoushunjie/PaddleSeg-Serving/build/third_party/Paddle/src/extern_paddle/paddle/fluid/inference/analysis/passes/memory_optimize_pass.cc:105]
PaddlePaddle Call Stacks:
0            0x15c1621p void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 1537
1            0x31a8e5ep paddle::inference::analysis::DataTypeToSpace(paddle::framework::proto::VarType_Type) + 190
2            0x31ab879p paddle::inference::analysis::MemoryOptimizePass::CollectVarMemorySize(std::unordered_map<std::string, unsigned long, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, unsigned long> > >*) const + 1705
3            0x31b19e5p paddle::inference::analysis::MemoryOptimizePass::RunImpl(paddle::inference::analysis::Argument*) + 1429
4            0x1c83abbp paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*) + 939
5            0x15e6a91p paddle::AnalysisPredictor::OptimizeInferenceProgram() + 97
6            0x15e811fp paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&) + 319
7            0x15e82a7p paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&) + 343
8            0x15e86f1p std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&) + 977
9            0x15e92f1p std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&) + 17

I then read the implementations of paddle::inference::analysis::DataTypeToSpace and paddle::inference::analysis::MemoryOptimizePass::CollectVarMemorySize. At the end of CollectVarMemorySize, for every node whose type is framework::proto::VarType::Type::VarType_Type_LOD_TENSOR, the function calls DataTypeToSpace to get the size of the type. However, the current implementation of DataTypeToSpace does not account for nodes of type LOD_TENSOR and simply throws an exception for them, so memory optimization cannot be used. My guess is that
node->Var()->GetType()!=framework::proto::VarType::Type::VarType_Type_LOD_TENSOR was mistyped as node->Var()->GetType()==framework::proto::VarType::Type::VarType_Type_LOD_TENSOR.
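
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not Paddle's source code: VarKind, ElemType, VarInfo, ElemTypeToSpace, CollectVarSizes and TryCollect are hypothetical names made up for illustration. It assumes only that the size lookup throws for any type it does not list, and that a filter in the collecting loop decides which variables reach that lookup.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Paddle's types.
enum class VarKind { kLodTensor, kOther };
enum class ElemType { kBool, kFp32, kInt32, kInt64, kUnlisted };

struct VarInfo {
  std::string name;
  VarKind kind;
  ElemType elem;
  std::size_t numel;  // number of elements in the variable
};

// Returns bytes per element; throws for any value without its own case.
std::size_t ElemTypeToSpace(ElemType type) {
  switch (type) {
    case ElemType::kBool:
      return sizeof(bool);
    case ElemType::kFp32:
      return sizeof(float);
    case ElemType::kInt32:
      return sizeof(std::int32_t);
    case ElemType::kInt64:
      return sizeof(std::int64_t);
    default:
      throw std::runtime_error("Unknown data type");
  }
}

// Builds a name -> byte-size table. The filter on `kind` decides which
// variables are passed to ElemTypeToSpace at all.
std::map<std::string, std::size_t> CollectVarSizes(
    const std::vector<VarInfo>& vars) {
  std::map<std::string, std::size_t> table;
  for (const auto& v : vars) {
    if (v.kind != VarKind::kLodTensor) continue;  // skipped, never looked up
    table[v.name] = v.numel * ElemTypeToSpace(v.elem);
  }
  return table;
}

// Returns true if collection succeeds, false if the size lookup throws.
bool TryCollect(const std::vector<VarInfo>& vars) {
  try {
    CollectVarSizes(vars);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

In this model, a single variable that passes the filter but carries a type the lookup does not list is enough to abort the whole collection, while a variable rejected by the filter never reaches the lookup. Which of the two conditions is actually wrong in Paddle would need to be confirmed against memory_optimize_pass.cc.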

bihw5rsg  #1

@joey12300 I checked the code, and the logic does look questionable. Could you tell us the environment in which the error occurs? Thanks!

ajsxfq5m  #2

@Shixiaowei02 I tested on CentOS 6.10 / 7 and Ubuntu 16.07. The GPU libraries are CUDA 9.2, cuDNN 7.1.4 and nccl 2.4.7.
