Paddle: mask detection model GPU inference speed: the 2080 Ti is nearly twice as slow as the 1660 Ti

Posted by 1qczuiv0 on 2021-11-30 in Java

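Hello. I am using paddlepaddle 1.8.2, paddlehub 1.7.1, and Python 3.6. When running the server version of the mask detection model with use_gpu=True, detection takes 30 ms on a 1660 Ti, but on a 2080 Ti with use_gpu=True it is slower and takes 70-80 ms. The input image size is 1280*720 and the detection parameters are identical on both machines. Both use CUDA 10.0 and cuDNN 7.6.1. I have not been able to find the cause, so I have to ask for your help.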

Source: https://github.com/PaddlePaddle/Paddle/issues/25125

5 answers

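One more detail: the 2080 Ti machine has four cards. The usage documentation says only a single card is supported, so I disabled the other three. Without the GPU, face detection takes about 300 ms, so I am certain the GPU is being used for face detection. I just don't know why the 2080 Ti is nearly twice as slow as the 1660 Ti.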

Posted 2021-11-30

Could you share your inference code?

Posted 2021-11-30

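@NHZlX Sure.
import os
import time

# Make only CUDA device 0 visible (set before the framework is imported)
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import paddlehub as hub

# 1. Paddle face detector
mask_detector = hub.Module(name="pyramidbox_lite_server_mask")
# stack, flg and img_list are defined elsewhere in my program (not shown here)
img_rd = stack.pop()
flg.value = False
img_list.append(img_rd)
x = time.time()
result = mask_detector.face_detection(img_list,use_gpu=True,confs_threshold=0.9)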

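This is my inference code. For my latest test I reinstalled the graphics driver on the 2080 Ti machine. Because it has four 2080 Ti cards, I disabled three of them. nvidia-smi now shows a single GPU, as follows: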
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 446.14       Driver Version: 446.14       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208... WDDM  | 00000000:82:00.0 Off |                  N/A |
| 27%   32C    P8    21W / 250W |    271MiB / 11264MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU       PID   Type   Process name                             GPU Memory |
|                                                                  Usage      |
|=============================================================================|
|    0      2008    C+G   Insufficient Permissions                   N/A      |
|    0      9456    C+G   ...w5n1h2txyewy\SearchUI.exe               N/A      |
|    0     10400    C+G   ...3d8bbwe\MicrosoftEdge.exe               N/A      |
|    0     10664    C+G   ...es.TextInput.InputApp.exe               N/A      |
|    0     13696    C+G   ...y\ShellExperienceHost.exe               N/A      |
|    0     17128    C+G   ...lPanel\SystemSettings.exe               N/A      |
+-----------------------------------------------------------------------------+
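
For a side-by-side comparison of the two machines, the same information can be collected in machine-readable form. The sketch below is only an illustration: it assumes nvidia-smi is on PATH, and the field list and the helper names parse_gpu_csv and query_gpus are choices made here, not something from the original post. The performance state and SM clock are included because they are worth comparing between a busy 1660 Ti and a busy 2080 Ti.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; this selection is an assumption
# made for illustration, adjust it as needed.
FIELDS = ["name", "pstate", "clocks.sm", "utilization.gpu", "memory.used"]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def query_gpus():
    """Run nvidia-smi and return the parsed rows (needs the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader",
    ]
    out = subprocess.run(
        cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return parse_gpu_csv(out.stdout)


# Usage (not executed here, requires a GPU machine):
#   for gpu in query_gpus():
#       print(gpu)
```

Running query_gpus() on both machines while the detector is busy gives two small dictionaries that can be compared field by field.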

When I now run nvcc -v to check the CUDA version, I get the following, because I did not reinstall CUDA 10.0 after reinstalling the GPU driver:
nvcc fatal : No input files specified; use option --help for more information
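
A note on the command itself: `-v` is nvcc's verbose switch and expects input files, which matches the error above. The toolkit version is printed by `nvcc -V` or `nvcc --version`. Separately, the "CUDA Version: 11.0" field in nvidia-smi reports what the driver supports, not which toolkit is installed. The following is a minimal sketch for checking the toolkit release from Python; the helper names parse_nvcc_release and installed_cuda_release are hypothetical, and the regular expression assumes the usual "release X.Y" wording of nvcc's version banner.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the toolkit release (e.g. '10.0') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the nvcc release string, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    out = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_nvcc_release(out.stdout)


# Usage (result depends on the machine):
#   print(installed_cuda_release())
```

A result of None means only that nvcc was not found on PATH; it does not by itself show that the CUDA runtime libraries are missing.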

This is what nvidia-smi shows while my program is running:
| NVIDIA-SMI 446.14       Driver Version: 446.14       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208... WDDM  | 00000000:82:00.0 Off |                  N/A |
| 27%   44C    P2    66W / 250W |   1931MiB / 11264MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU       PID   Type   Process name                             GPU Memory |
|                                                                  Usage      |
|=============================================================================|
|    0      2008    C+G   Insufficient Permissions                   N/A      |
|    0      9092      C   ...ython\Python36\python.exe               N/A      |
|    0      9456    C+G   ...w5n1h2txyewy\SearchUI.exe               N/A      |
|    0     10400    C+G   ...3d8bbwe\MicrosoftEdge.exe               N/A      |
|    0     10664    C+G   ...es.TextInput.InputApp.exe               N/A      |
|    0     10884      C   ...ython\Python36\python.exe               N/A      |
|    0     13328      C   ...ython\Python36\python.exe               N/A      |
|    0     13696    C+G   ...y\ShellExperienceHost.exe               N/A      |
|    0     17128    C+G   ...lPanel\SystemSettings.exe               N/A      |
|    0     17532      C   ...ython\Python36\python.exe               N/A      |
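The Python36 entries show that the GPU really is being used, yet the program reports a face detection time of 0.08497023582458496 s, while on the 1660 Ti the same detection finishes in only 0.036 s.
Question 1: I suspect the CUDA 10.0 environment variables are not set on the 2080 Ti machine, because the output of nvcc -v is wrong. Yet the program clearly does use the 2080 Ti. I don't understand why.
Question 2: Yesterday's and today's test environments are identical; the only thing I did was reinstall the graphics driver. Yesterday nvcc -v showed CUDA version 10.0, and today it shows no version, but in both cases face detection takes 60-80 ms, whereas my 1660 Ti laptop takes about 30 ms. I really don't understand why, so please help me figure this out. I don't know which step I got wrong, or whether the 1660 Ti and the 2080 Ti are invoked differently.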
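
When comparing the two cards, it may help to rule out measurement effects first. A single time.time() reading can include one-time costs such as model loading or CUDA initialization, so the sketch below discards a few warm-up calls and reports statistics over repeated runs. It is a generic timing helper written for this thread, not part of PaddleHub; the names benchmark and summarize and the default counts are assumptions. Whether warm-up explains the gap reported here is not known.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=20):
    """Call fn() `warmup` times untimed, then return per-call latencies in ms."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies (ms) to median, minimum and maximum."""
    return {
        "median_ms": statistics.median(latencies),
        "min_ms": min(latencies),
        "max_ms": max(latencies),
    }


# Usage with the detector from the post (not executed here; it needs
# paddlehub, a GPU, and an img_list holding a fixed set of images):
#   stats = summarize(benchmark(lambda: mask_detector.face_detection(
#       img_list, use_gpu=True, confs_threshold=0.9)))
#   print(stats)
```

Running the same script with the same image on both machines makes the two medians directly comparable.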

Posted 2021-11-30

hub show pyramidbox_lite_server_mask
hub show pyramidbox_lite_server

Use these commands to check which version of the hub module you have. If it is not 1.3.0, please update it.

Posted 2021-11-30

@NHZlX When I run hub show pyramidbox_lite_server_mask, and also when I run my code, I get the message below. I am not sure whether it affects the speed.
The message is: 2020-06-22 17:20:36,918-INFO: Instantiated empty configuration.
HDFS initialization failed, please check if .hdfscli，cfg exists.
hub show pyramidbox_lite_server_mask reports version 1.3.0.
hub show pyramidbox_lite_server reports version 1.2.0.

Posted 2021-11-30
