Problem with Paddle C++ inference

nmpmafwu · posted 2022-11-19 in Other
Follow (0) | Answers (7) | Views (211)


Hello,
After building the Paddle C++ inference library, I tried to run PaddleOCR and PaddleDetection, but I came across multiple issues:

PaddleOCR:
Running it on CPU does work, but when I try to run it on GPU I get this:
malloc(): invalid size (unsorted) Aborted (core dumped)
I tried to locate the source of the bug, and it seems to come from paddle/fluid/inference/api/details/zero_copy_tensor.cc, this line specifically:
auto *dev_ctx = static_cast<const paddle::platform::CUDADeviceContext *>(pool.Get(gpu_place));

Any idea what might be causing this, and how to solve it?

PaddleDetection:
I get this error (on both CPU and GPU):
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
what():

C++ Traceback (most recent call last):
0   void paddle_infer::Tensor::CopyToCpuImpl(float*, void*, void (*)(void*), void*) const

Error Message Summary:
InvalidArgumentError: The type of data we are trying to retrieve does not match the type of data currently contained in the container.

It comes from Paddle/paddle/phi/core/dense_tensor.cc (discussed in another issue, #4174).
Thanks.

System information

GPU: RTX 3080
OS: Ubuntu 20.04
GIT COMMIT ID: a40ea45
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
WITH_ROCM: OFF
WITH_ASCEND_CL: OFF
WITH_ASCEND_CXX11: OFF
CUDA version: 11.5
CUDNN version: v8.3
CXX compiler version: 7.5.0


wfveoks01#

Hi! We've received your issue; please be patient while we arrange technicians to answer your questions as soon as possible. Please make sure that you have posted enough information to demonstrate your request. You may also check out the API docs, FAQ, historical GitHub issues and the AI community to get an answer. Have a nice day!


wyyhbhjk2#

  1. The error in PaddleDetection is caused by an incorrect input type; you should convert the input to float32.
  2. The PaddleOCR problem looks like the GPU cannot be found. Please confirm that the library was built with GPU support and that CUDA and related environment variables are installed correctly.
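For reference, converting the raw input to float32 before feeding it to the predictor might look like the sketch below. `ToFloat32Input` is a hypothetical helper, not part of the PaddleDetection sources, and the 1/255 normalization scale is only an illustrative value (the real preprocessing uses the mean/scale from the deploy config):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative helper (not from the PaddleDetection sources): cast a raw
// uint8 image buffer to float32, the element type the detector's input
// tensor expects. scale = 1/255 is a common normalization, used here
// purely as an example value.
std::vector<float> ToFloat32Input(const std::vector<uint8_t>& pixels,
                                  float scale = 1.0f / 255.0f) {
  std::vector<float> input(pixels.size());
  for (size_t i = 0; i < pixels.size(); ++i) {
    input[i] = static_cast<float>(pixels[i]) * scale;  // uint8 -> float32
  }
  return input;
}
```

On the predictor side, the float buffer would then be handed over with input_tensor->CopyFromCpu(input.data()), which is the paddle_infer call that requires the element type to match the tensor's dtype.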

nhjlsmyf3#


Hello @wangxinxin08,
Thank you for your response.

  1. The error in PaddleDetection is caused by an incorrect input type; you should convert the input to float32.

I went a little deeper into the code, trying to see which variable wasn't a float, and found that the problem is in this part of the prediction process:
deploy/cpp/src/object_detector.cc

auto inference_start = std::chrono::steady_clock::now();
for (int i = 0; i < repeats; i++) {
  predictor_->Run();
  // Get output tensor
  out_tensor_list.clear();
  output_shape_list.clear();
  auto output_names = predictor_->GetOutputNames();
  for (int j = 0; j < output_names.size(); j++) {
    auto output_tensor = predictor_->GetOutputHandle(output_names[j]);
    std::vector<int> output_shape = output_tensor->shape();
    int out_num = std::accumulate(
        output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
    output_shape_list.push_back(output_shape);
    if (output_tensor->type() == paddle_infer::DataType::INT32) {
      out_bbox_num_data_.resize(out_num);
      output_tensor->CopyToCpu(out_bbox_num_data_.data());
    } else {
      std::vector<float> out_data;
      out_data.resize(out_num);
      output_tensor->CopyToCpu(out_data.data());
      out_tensor_list.push_back(out_data);
    }
  }
}

In the "else" branch, out_data is a vector of floats, and in my case it does contain float values (all 0? Does this point to another problem?). Maybe you could tell me what this variable corresponds to, and why the line output_tensor->CopyToCpu(out_data.data()); throws an error even though out_data is a float vector?
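One possible explanation: the snippet treats every non-INT32 output as FLOAT32, so if one of the exported model's outputs is actually INT64 (common for label or index tensors), CopyToCpu with a float buffer would raise exactly this InvalidArgumentError. Below is a minimal self-contained sketch of a dtype-aware dispatch; the DataType enum mirrors the relevant paddle_infer::DataType values, and MockTensor stands in for a real tensor so the example runs without Paddle:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Mirrors the relevant paddle_infer::DataType values.
enum class DataType { FLOAT32, INT64, INT32 };

// Mock of an output tensor: dtype plus shape, just enough to show the
// dispatch a real paddle_infer::Tensor (type()/shape()) would need.
struct MockTensor {
  DataType dtype;
  std::vector<int> shape;
};

// Size in bytes of the CPU-side buffer CopyToCpu would fill, branching
// on the dtype instead of assuming "not INT32 => float".
size_t BufferBytes(const MockTensor& t) {
  int out_num = std::accumulate(t.shape.begin(), t.shape.end(), 1,
                                std::multiplies<int>());
  switch (t.dtype) {
    case DataType::FLOAT32: return out_num * sizeof(float);    // CopyToCpu(float*)
    case DataType::INT64:   return out_num * sizeof(int64_t);  // CopyToCpu(int64_t*)
    case DataType::INT32:   return out_num * sizeof(int32_t);  // CopyToCpu(int32_t*)
  }
  return 0;
}
```

If the real output tensor reports INT64 here, copying into a std::vector<int64_t> instead of a float vector would avoid the type-mismatch error.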

2. The PaddleOCR problem looks like the GPU cannot be found. Please confirm that the library was built with GPU support and that CUDA and related environment variables are installed correctly.

As mentioned before, the error occurs in Paddle/fluid/inference/api/details/zero_copy_tensor.cc.
This call: pool.Get(gpu_place); is what makes the program crash. I checked CUDA's paths and everything seems to be correct, and I believe it is something specific to PaddleOCR, because this line of code executes correctly when running PaddleDetection (so I doubt it's a global problem).

Thank you.


eivnm1vs4#

Have you verified that the Paddle installation succeeded: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html#old-version-anchor-8-%E4%B8%89%E3%80%81%E9%AA%8C%E8%AF%81%E5%AE%89%E8%A3%85


kmpatx3s5#

@MissPenguin
Have you verified that the Paddle installation succeeded: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html#old-version-anchor-8-%E4%B8%89%E3%80%81%E9%AA%8C%E8%AF%81%E5%AE%89%E8%A3%85

Well, I did successfully build the library with sudo make inference_lib_dist.

Running OCR on CPU gives the correct output, so I think Paddle was built correctly.
Building ppocr (sudo ./tools/build.sh) also doesn't show any errors in the terminal.

I mentioned earlier the exact line that causes the crash; here is the output I get on GPU and CPU.
PaddleOCR:

CPU:
mode: det
total images num: 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Fused 0 subgraphs into layer_norm op.
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- fused 0 pairs of fc gru patterns
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0316 13:17:40.582302 895653 fuse_pass_base.cc:57] --- detected 56 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
I0316 13:17:40.589912 895653 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0316 13:17:40.596658 895653 memory_optimize_pass.cc:216] Cluster name : x size: 12
I0316 13:17:40.596663 895653 memory_optimize_pass.cc:216] Cluster name : relu_6.tmp_0 size: 2048
I0316 13:17:40.596665 895653 memory_optimize_pass.cc:216] Cluster name : conv2d_120.tmp_0 size: 1024
I0316 13:17:40.596666 895653 memory_optimize_pass.cc:216] Cluster name : relu_12.tmp_0 size: 4096
I0316 13:17:40.596668 895653 memory_optimize_pass.cc:216] Cluster name : relu_13.tmp_0 size: 8192
I0316 13:17:40.596668 895653 memory_optimize_pass.cc:216] Cluster name : relu_2.tmp_0 size: 1024
I0316 13:17:40.596670 895653 memory_optimize_pass.cc:216] Cluster name : batch_norm_51.tmp_3 size: 8192
I0316 13:17:40.596671 895653 memory_optimize_pass.cc:216] Cluster name : conv2d_113.tmp_0 size: 8192
--- Running analysis [ir_graph_to_program_pass]
I0316 13:17:40.632854 895653 analysis_predictor.cc:1000] ======= optimize end =======
I0316 13:17:40.634990 895653 naive_executor.cc:101] --- skip [feed], feed -> x
I0316 13:17:40.636521 895653 naive_executor.cc:101] --- skip [sigmoid_0.tmp_0], fetch -> fetch
Detected boxes num: 2
The detection visualized image saved in ./ocr_vis.png

GPU:

mode: det
total images num: 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0316 13:19:03.287709 895718 fuse_pass_base.cc:57] --- detected 56 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0316 13:19:03.320012 895718 fuse_pass_base.cc:57] --- detected 4 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0316 13:19:03.323369 895718 ir_params_sync_among_devices_pass.cc:79] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0316 13:19:03.350996 895718 memory_optimize_pass.cc:216] Cluster name : x size: 12
I0316 13:19:03.351004 895718 memory_optimize_pass.cc:216] Cluster name : relu_2.tmp_0 size: 1024
I0316 13:19:03.351006 895718 memory_optimize_pass.cc:216] Cluster name : batch_norm_50.tmp_4 size: 2048
I0316 13:19:03.351007 895718 memory_optimize_pass.cc:216] Cluster name : relu_6.tmp_0 size: 2048
I0316 13:19:03.351008 895718 memory_optimize_pass.cc:216] Cluster name : conv2d_120.tmp_0 size: 1024
I0316 13:19:03.351009 895718 memory_optimize_pass.cc:216] Cluster name : relu_12.tmp_0 size: 4096
I0316 13:19:03.351011 895718 memory_optimize_pass.cc:216] Cluster name : conv2d_124.tmp_0 size: 256
I0316 13:19:03.351012 895718 memory_optimize_pass.cc:216] Cluster name : batch_norm_48.tmp_3 size: 8192
I0316 13:19:03.351013 895718 memory_optimize_pass.cc:216] Cluster name : relu_13.tmp_0 size: 8192
--- Running analysis [ir_graph_to_program_pass]
I0316 13:19:03.380610 895718 analysis_predictor.cc:1000] ======= optimize end =======
I0316 13:19:03.382716 895718 naive_executor.cc:101] --- skip [feed], feed -> x
I0316 13:19:03.383641 895718 naive_executor.cc:101] --- skip [sigmoid_0.tmp_0], fetch -> fetch
W0316 13:19:03.415621 895718 gpu_context.cc:244] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.5
W0316 13:19:03.417965 895718 gpu_context.cc:272] device: 0, cuDNN Version: 8.3.
malloc(): invalid size (unsorted)
Aborted (core dumped)


1tu0hz3e6#

Following the verification method I provided above:

What does the output of verifying the Paddle installation look like? Could you paste it here?


eiee3dmh7#

Following the verification method I provided above:

What does the output of verifying the Paddle installation look like? Could you paste it here?

Here is the output:

>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0316 15:11:13.183279 897518 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.0
W0316 15:11:13.196525 897518 device_context.cc:422] device: 0, cuDNN Version: 8.3.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Thank you.
@MissPenguin
