Paddle PPYOLOv2 + TensorRT C++ inference deployment on Win10: problem with generating the serialized model

pod7payv posted on 2021-11-30 in Java

1) PaddlePaddle version: 2.1
2) GPU: RTX 2080 Super, CUDA 10.2 and cuDNN 7
4) OS: Win10
1) C++ inference library, contents of version.txt:
   GIT COMMIT ID: 4ccd9a0
   WITH_MKL: ON
   WITH_MKLDNN: ON
   WITH_GPU: ON
   CUDA version: 10.2
   CUDNN version: v7.6
   CXX compiler version: 19.16.27045.0
   WITH_TENSORRT: ON
   TensorRT version: v7
4) Inference library source: official download

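Problem: On Win10 I use the official inference library for detection, with TensorRT configured, and generate the project with CMake. Every time the program runs with TensorRT, it regenerates the serialized model (the TensorRT engine). This does not happen on Ubuntu, and on Windows generating the serialized model is also very slow.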
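For context, the expected behavior is a build-once, load-afterwards cache: with `use_static` set to `true` in `EnableTensorRtEngine` (as in the `LoadModel` code posted in this thread), Paddle writes each optimized TensorRT subgraph to a `trt_serialized_*` file in the model's `_opt_cache` directory, as the logs in this thread show, and should load it on later runs. The sketch below shows only that generic pattern in plain C++; `BuildEngineBlob`, `LoadOrBuild`, `CachePath` and `ClearCacheEntry` are hypothetical names, and this is not Paddle's actual implementation.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical stand-in for the slow TensorRT engine build; counts calls so
// the caching behavior can be observed.
int g_build_count = 0;

std::string BuildEngineBlob(const std::string& key) {
  ++g_build_count;
  return "engine-for-" + key;  // placeholder bytes, not a real engine
}

std::string CachePath(const std::string& cache_dir, const std::string& key) {
  return cache_dir + "/trt_serialized_" + key;
}

// Loads the cached blob for `key` if it exists on disk; otherwise builds it
// once and writes it (in binary mode) so later runs can skip the build.
std::string LoadOrBuild(const std::string& cache_dir, const std::string& key) {
  const std::string path = CachePath(cache_dir, key);
  std::ifstream in(path, std::ios::binary);
  if (in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
  }
  std::string blob = BuildEngineBlob(key);
  std::ofstream out(path, std::ios::binary);
  out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
  return blob;
}

// Deletes a cache entry to force a rebuild on the next call.
void ClearCacheEntry(const std::string& cache_dir, const std::string& key) {
  std::remove(CachePath(cache_dir, key).c_str());
}
```

In a scheme like this, every run lands in the build branch if the key changes between runs or the cached file cannot be read back intact, which is the kind of symptom described above.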

zbdgwd5y (#1)

These log files were all copied from the Win10 CMD window.

91zkwejq (#2)

Thanks, after using this method the problem seems to be solved.

xmd2e60i (#3)

The order of the settings should be swapped: DeletePass should come after all the other configuration calls.
config.EnableMemoryOptim();
config.pass_builder()->DeletePass("conv_bn_fuse_pass");
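One plausible reading of this advice, shown as a toy model: if some configuration calls rebuild the pass list from its defaults, a DeletePass issued before such a call is silently undone. That would match the log posted in this thread, where conv_bn_fuse_pass still runs even though DeletePass was called before EnableMemoryOptim. `ToyConfig` is hypothetical and only mimics the shape of the Paddle calls; it is not Paddle's implementation.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical config: EnableMemoryOptim() re-runs Update(), which rebuilds
// the pass list from defaults and thereby discards earlier deletions.
class ToyConfig {
 public:
  ToyConfig() { Update(); }

  void EnableMemoryOptim() { Update(); }  // rebuilds passes_ from scratch

  void DeletePass(const std::string& name) {
    passes_.erase(std::remove(passes_.begin(), passes_.end(), name),
                  passes_.end());
  }

  bool HasPass(const std::string& name) const {
    return std::find(passes_.begin(), passes_.end(), name) != passes_.end();
  }

 private:
  void Update() {
    passes_ = {"conv_bn_fuse_pass", "conv_elementwise_add_fuse_pass",
               "tensorrt_subgraph_pass"};
  }

  std::vector<std::string> passes_;
};
```

With this model, deleting the pass first and then enabling memory optimization leaves conv_bn_fuse_pass in place, while the reverse order removes it for good.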

bfrts1fy (#4)

For version 1.8, the problem is probably here: https://github.com/PaddlePaddle/Paddle/blob/75efc0ace116ca2ad2ed3b5ac5d16bcec01e2004/paddle/fluid/inference/analysis/helper.h#L246. If you build Paddle yourself, you can try changing that line to std::ofstream outfile(trt_serialized_path, std::ios::binary).
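The suggested change matters on Windows because an std::ofstream opened without std::ios::binary is in text mode there, and each '\n' byte written becomes the two bytes "\r\n" (a 0x1A byte can also be treated as end-of-file when such a file is read back in text mode). A serialized TensorRT engine is arbitrary binary data, so the file on disk presumably no longer matches what was built; on Linux, text and binary mode behave identically, which would fit the problem appearing only on Windows. A minimal sketch of a binary round trip; `SaveBlob` and `LoadBlob` are illustrative names, not Paddle functions.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes raw bytes with std::ios::binary so no newline translation happens
// (text mode on Windows would turn each '\n' into "\r\n").
bool SaveBlob(const std::string& path, const std::string& data) {
  std::ofstream outfile(path, std::ios::binary);
  if (!outfile) return false;
  outfile.write(data.data(), static_cast<std::streamsize>(data.size()));
  return static_cast<bool>(outfile);
}

// Reads the whole file back byte for byte, also in binary mode.
std::string LoadBlob(const std::string& path) {
  std::ifstream infile(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(infile),
                     std::istreambuf_iterator<char>());
}
```

Reading and writing must use the same mode; with both in binary mode the bytes come back unchanged on every platform.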

uubf1zoe (#5)

Hi paddle-bot,
Why has nobody replied to this issue?

Thanks!

kg7wmglp (#6)

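Hi paddle-bot,
I have run into the same problem. We urgently need this for a project going into production, and I hope Paddle technical support can resolve it as soon as possible.

Thanks!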

piv4azn7 (#7)

      // set use dynamic shape
      if (this->use_dynamic_shape_) {
        // set DynamicShape for image tensor
        const std::vector<int> min_input_shape = {1, 3, this->trt_min_shape_, this->trt_min_shape_};
        const std::vector<int> max_input_shape = {1, 3, this->trt_max_shape_, this->trt_max_shape_};
        const std::vector<int> opt_input_shape = {1, 3, this->trt_opt_shape_, this->trt_opt_shape_};
        const std::map<std::string, std::vector<int>> map_min_input_shape = {{"image", min_input_shape}};
        const std::map<std::string, std::vector<int>> map_max_input_shape = {{"image", max_input_shape}};
        const std::map<std::string, std::vector<int>> map_opt_input_shape = {{"image", opt_input_shape}};

        config.SetTRTDynamicShapeInfo(map_min_input_shape,
                                      map_max_input_shape,
                                      map_opt_input_shape);
        std::cout << "TensorRT dynamic shape enabled" << std::endl;
      }
    }
  } else {
    config.DisableGpu();
    if (this->use_mkldnn_) {
      config.EnableMKLDNN();
      // cache 10 different shapes for mkldnn to avoid memory leak
      config.SetMkldnnCacheCapacity(10);
    }
    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
  }
  config.SwitchUseFeedFetchOps(false);
  config.SwitchIrOptim(true);
  // config.DisableGlogInfo();
  // Remove conv_bn_fuse_pass from the IR pass pipeline
  config.pass_builder()->DeletePass("conv_bn_fuse_pass");
  // Memory optimization
  config.EnableMemoryOptim();
  predictor_ = std::move(CreatePredictor(config));
}
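The three shape maps passed to SetTRTDynamicShapeInfo must describe a valid TensorRT optimization profile, meaning min <= opt <= max in every dimension. A sketch of building them for the image input with that check up front; `MakeImageShapeMaps` and `ShapeMaps` are hypothetical helpers, and the batch size 1, 3 channels and square sizes follow the code above.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using ShapeMap = std::map<std::string, std::vector<int>>;

struct ShapeMaps {
  ShapeMap min_shape, max_shape, opt_shape;
};

// Builds the min/max/opt maps for a square NCHW "image" input with batch 1
// and 3 channels, rejecting sizes that violate min <= opt <= max.
ShapeMaps MakeImageShapeMaps(int min_size, int opt_size, int max_size) {
  if (!(min_size <= opt_size && opt_size <= max_size)) {
    throw std::invalid_argument("expected min <= opt <= max");
  }
  ShapeMaps maps;
  maps.min_shape = {{"image", {1, 3, min_size, min_size}}};
  maps.max_shape = {{"image", {1, 3, max_size, max_size}}};
  maps.opt_shape = {{"image", {1, 3, opt_size, opt_size}}};
  return maps;
}
```

The result would then be passed in the same order as above: config.SetTRTDynamicShapeInfo(maps.min_shape, maps.max_shape, maps.opt_shape).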


4dc9hkyq (#9)

void ObjectDetector::LoadModel(const std::string& model_dir,
                               const int batch_size,
                               const std::string& run_mode) {
  paddle_infer::Config config;
  std::string prog_file = model_dir + OS_PATH_SEP + "model.pdmodel";
  std::string params_file = model_dir + OS_PATH_SEP + "model.pdiparams";
  config.SetModel(prog_file, params_file);
  if (this->use_gpu_) {
    config.EnableUseGpu(200, this->gpu_id_);
    config.SwitchIrOptim(true);
    // use tensorrt
    if (run_mode != "fluid") {
      auto precision = paddle_infer::Config::Precision::kFloat32;
      if (run_mode == "trt_fp32") {
        precision = paddle_infer::Config::Precision::kFloat32;
      } else if (run_mode == "trt_fp16") {
        precision = paddle_infer::Config::Precision::kHalf;
      } else if (run_mode == "trt_int8") {
        precision = paddle_infer::Config::Precision::kInt8;
      } else {
        printf("run_mode should be 'fluid', 'trt_fp32', 'trt_fp16' or 'trt_int8'\n");
      }
      // set tensorrt
      config.EnableTensorRtEngine(
          1 << 30,                   // workspace size in bytes
          batch_size,                // max batch size
          this->min_subgraph_size_,  // min subgraph size
          precision,
          true,                      // use_static: serialize engines to disk for reuse
          this->trt_calib_mode_);    // use_calib_mode (INT8 calibration)

      // set use dynamic shape
      if (this->use_dynamic_shape_) {
        // set DynamicShape for image tensor
        const std::vector<int> min_input_shape = {1, 3, this->trt_min_shape_, this->trt_min_shape_};
        const std::vector<int> max_input_shape = {1, 3, this->trt_max_shape_, this->trt_max_shape_};
        const std::vector<int> opt_input_shape = {1, 3, this->trt_opt_shape_, this->trt_opt_shape_};
        const std::map<std::string, std::vector<int>> map_min_input_shape = {{"image", min_input_shape}};
        const std::map<std::string, std::vector<int>> map_max_input_shape = {{"image", max_input_shape}};
        const std::map<std::string, std::vector<int>> map_opt_input_shape = {{"image", opt_input_shape}};

        config.SetTRTDynamicShapeInfo(map_min_input_shape,
                                      map_max_input_shape,
                                      map_opt_input_shape);
        std::cout << "TensorRT dynamic shape enabled" << std::endl;
      }
    }
  } else {
    config.DisableGpu();
    if (this->use_mkldnn_) {
      config.EnableMKLDNN();
      // cache 10 different shapes for mkldnn to avoid memory leak
      config.SetMkldnnCacheCapacity(10);
    }
    config.SetCpuMathLibraryNumThreads(this->cpu_math_library_num_threads_);
  }
  config.SwitchUseFeedFetchOps(false);
  config.SwitchIrOptim(true);
  // config.DisableGlogInfo();
  // Remove conv_bn_fuse_pass from the IR pass pipeline
  config.pass_builder()->DeletePass("conv_bn_fuse_pass");
  // Memory optimization
  config.EnableMemoryOptim();
  predictor_ = std::move(CreatePredictor(config));
}
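In the code above, an unrecognized run_mode only prints a message and then continues with the default FP32 precision. A sketch of stricter parsing that lets the caller stop instead; the `Precision` enum here is a hypothetical stand-in that mirrors the names of `paddle_infer::Config::Precision` so the example compiles without Paddle, and `ParseTrtRunMode` is an illustrative name.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for paddle_infer::Config::Precision.
enum class Precision { kFloat32, kHalf, kInt8 };

// Maps a TensorRT run_mode string to a precision; returns std::nullopt for
// unknown modes so the caller can fail loudly instead of silently using FP32.
std::optional<Precision> ParseTrtRunMode(const std::string& run_mode) {
  if (run_mode == "trt_fp32") return Precision::kFloat32;
  if (run_mode == "trt_fp16") return Precision::kHalf;
  if (run_mode == "trt_int8") return Precision::kInt8;
  return std::nullopt;
}
```

"fluid" is deliberately not handled here, since the code above only reaches the TensorRT branch when run_mode is not "fluid". This needs C++17 for std::optional.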

6za6bjd0 (#10)

It looks like this line did not take effect. Could you post the config part of your code so we can take a look?

i7uaboj4 (#11)

I put this line at the very end of the LoadModel() function. The full log is below:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0604 11:01:56.573653 10276 analysis_config.cc:424] use_dlnne_:0
I0604 11:01:56.573653 10276 analysis_config.cc:424] use_dlnne_:0
I0604 11:01:56.574651 10276 analysis_config.cc:424] use_dlnne_:0
I0604 11:01:56.574651 10276 analysis_config.cc:424] use_dlnne_:0
I0604 11:01:57.231948 10276 analysis_config.cc:424] use_dlnne_:0
I0604 11:01:57.231948 10276 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0604 11:01:57.281819 10276 analysis_predictor.cc:508] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0604 11:01:57.682780 10276 graph_pattern_detector.cc:91] --- detected 101 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0604 11:01:57.791498 10276 graph_pattern_detector.cc:91] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0604 11:01:58.033870 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0604 11:01:59.308579 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_5959209702137142755
I0604 11:01:59.309566 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0604 11:01:59.322532 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_75932943689739073
I0604 11:01:59.323530 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 110 nodes
I0604 11:01:59.342481 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:01:59.365422 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:07:04.900202 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_17584665926121430935
I0604 11:07:04.909179 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 7 nodes
I0604 11:07:04.910177 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:07:04.911175 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:07:29.153393 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_16452447319886688765
I0604 11:07:29.154420 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:07:29.155388 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:07:29.156385 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:07:43.924638 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_929599991364810441
I0604 11:07:43.925634 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 10 nodes
I0604 11:07:43.928625 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:07:43.930619 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:08:00.518687 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_3499799567721492121
I0604 11:08:00.520730 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 10 nodes
I0604 11:08:00.521678 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:08:00.524699 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:08:17.202658 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_10649716873406707892
I0604 11:08:17.204654 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0604 11:08:17.256531 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4015369397548113553
I0604 11:08:17.257525 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 7 nodes
I0604 11:08:17.258512 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:08:17.262501 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:08:37.786188 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_5782693991907650800
I0604 11:08:37.787155 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:08:37.788152 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:08:37.789149 10276 engine.cc:86] Run Paddle-TRT FP16 mode
I0604 11:08:51.444615 10276 tensorrt_subgraph_pass.cc:398] Save TRT Optimized Info to ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4644331660945843395
I0604 11:08:51.445595 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0604 11:08:51.449574 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_16018133220312938016
I0604 11:08:51.450572 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 8 nodes
I0604 11:08:51.454561 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_15733399061656221822
I0604 11:08:51.455559 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:08:51.460546 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_14319932622819613100
I0604 11:08:51.460546 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:08:51.475507 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_3710896977018017843
I0604 11:08:51.476516 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:08:51.519410 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_1038259572823255562
I0604 11:08:51.520403 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0604 11:08:51.525375 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_10307500148995920653
I0604 11:08:51.525375 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:08:51.533354 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_7456650529460881910
I0604 11:08:51.533354 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 4 nodes
I0604 11:08:51.537343 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_1594808578423983610
I0604 11:08:51.537343 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 6 nodes
I0604 11:08:51.557291 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_4939479952197085245
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0604 11:08:51.578265 10276 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0604 11:08:51.645059 10276 memory_optimize_pass.cc:199] Cluster name : concat_4.tmp_0 size: 19660800
I0604 11:08:51.645059 10276 memory_optimize_pass.cc:199] Cluster name : im_shape size: 8
I0604 11:08:51.645059 10276 memory_optimize_pass.cc:199] Cluster name : nearest_interp_v2_1.tmp_0 size: 6553600
I0604 11:08:51.645059 10276 memory_optimize_pass.cc:199] Cluster name : tanh_24.tmp_0 size: 3276800
I0604 11:08:51.645059 10276 memory_optimize_pass.cc:199] Cluster name : tanh_20.tmp_0 size: 3276800
I0604 11:08:51.645059 10276 memory_optimize_pass.cc:199] Cluster name : tmp_10 size: 1638400
I0604 11:08:51.646056 10276 memory_optimize_pass.cc:199] Cluster name : scale_factor size: 8
--- Running analysis [ir_graph_to_program_pass]
I0604 11:08:51.802644 10276 analysis_predictor.cc:595] ======= optimize end =======
I0604 11:08:51.802644 10276 naive_executor.cc:98] --- skip [feed], feed -> scale_factor
I0604 11:08:51.802644 10276 naive_executor.cc:98] --- skip [feed], feed -> image
I0604 11:08:51.802644 10276 naive_executor.cc:98] --- skip [feed], feed -> im_shape
I0604 11:08:51.807631 10276 naive_executor.cc:98] --- skip [concat_4.tmp_0], fetch -> fetch
I0604 11:08:51.807631 10276 naive_executor.cc:98] --- skip [nearest_interp_v2_1.tmp_0], fetch -> fetch
Successfully opened the dir !
total images = 9, batch_size = 1, total steps = 9
W0604 11:08:51.819598 10276 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.2
W0604 11:08:51.819598 10276 device_context.cc:422] device: 0, cuDNN Version: 7.6.
W0604 11:08:52.084898 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.091879 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.095870 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.098861 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.101853 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.104846 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.109833 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.120803 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.123795 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.125790 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.129779 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.132771 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.134766 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.139753 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.141748 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.143743 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.145737 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.148730 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
W0604 11:08:52.151722 10276 helper.h:80] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
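The timestamps in this log show where the time goes: the 110-node subgraph starts its engine build at 11:01:59.342 and is saved at 11:07:04.900, about 305.6 seconds, while subgraphs whose cache file is found load in milliseconds. A small sketch that scans glog-style lines like the ones above, pairs each "Prepare TRT engine" line with the next "Save TRT Optimized Info" line, and reports build durations plus the number of cache loads. It assumes the `I0604 HH:MM:SS.micro` prefix shown here and a log that does not cross midnight; `AnalyzeTrtLog` and `GlogSeconds` are hypothetical helpers.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct TrtCacheReport {
  int cache_hits = 0;                 // "Load TRT Optimized Info" lines
  std::vector<double> build_seconds;  // one entry per rebuilt engine
};

// Parses the "HH:MM:SS.micro" field of a glog line into seconds since midnight.
double GlogSeconds(const std::string& line) {
  std::istringstream iss(line);
  std::string level_date, clock;
  iss >> level_date >> clock;  // e.g. "I0604" and "11:01:59.342481"
  const int h = std::stoi(clock.substr(0, 2));
  const int m = std::stoi(clock.substr(3, 2));
  const double s = std::stod(clock.substr(6));
  return h * 3600.0 + m * 60.0 + s;
}

TrtCacheReport AnalyzeTrtLog(const std::vector<std::string>& lines) {
  TrtCacheReport report;
  double build_start = -1.0;  // negative means no build in progress
  for (const std::string& line : lines) {
    if (line.find("Load TRT Optimized Info") != std::string::npos) {
      ++report.cache_hits;
    } else if (line.find("Prepare TRT engine") != std::string::npos) {
      build_start = GlogSeconds(line);
    } else if (line.find("Save TRT Optimized Info") != std::string::npos &&
               build_start >= 0.0) {
      report.build_seconds.push_back(GlogSeconds(line) - build_start);
      build_start = -1.0;
    }
  }
  return report;
}
```

For the full log above, the counts work out to 12 subgraphs loaded from `_opt_cache` and 7 rebuilt, and those 7 builds account for nearly all of the roughly seven minutes spent in tensorrt_subgraph_pass.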

piok6c0g (#12)

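One more thing: when building the project with CMake, I turned on the TensorRT switch and added its path, but after generating the VS2017 project, I could not find the TensorRT path or the related library files in the Visual Studio project properties.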

ygya80vv (#13)

After adding this statement, it gets stuck here:
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0604 11:01:57.682780 10276 graph_pattern_detector.cc:91] --- detected 101 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0604 11:01:57.791498 10276 graph_pattern_detector.cc:91] --- detected 107 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0604 11:01:58.033870 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0604 11:01:59.308579 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_5959209702137142755
I0604 11:01:59.309566 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 5 nodes
I0604 11:01:59.322532 10276 tensorrt_subgraph_pass.cc:367] Load TRT Optimized Info from ./weight/ppyolov2_r50vd_dcn_365e_coco//_opt_cache//trt_serialized_75932943689739073
I0604 11:01:59.323530 10276 tensorrt_subgraph_pass.cc:137] --- detect a sub-graph with 110 nodes
I0604 11:01:59.342481 10276 tensorrt_subgraph_pass.cc:377] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0604 11:01:59.365422 10276 engine.cc:86] Run Paddle-TRT FP16 mode

nukf8bse (#15)

Try adding this line to your config:

config.pass_builder()->DeletePass("conv_bn_fuse_pass");
