keras RTX 3080 - Deep learning training problem: training hangs after "Successfully opened dynamic library libcublas.so.10"

q1qsirdb  posted on 2023-03-18 in Other

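When I train a 3D CNN on an NVIDIA GeForce RTX 3080, training hangs after the log line "Successfully opened dynamic library libcublas.so.10". I ran the same model on a GTX 1650 with the same computer configuration, and training completed without any problems. My system specifications are listed below.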

* **Operating system:** Pop!_OS 22.04 LTS with NVIDIA driver, 64-bit
* **System memory:** 32 GB
* **Processor:** AMD® Ryzen 9 5980HS with Radeon Graphics × 16
* **Graphics card:** NVIDIA Corporation GA104M [GeForce RTX 3080 Mobile / Max-Q 8GB/16GB] / NVIDIA GeForce RTX 3080 Laptop GPU/PCIe/SSE2
* **Python version:** Python 3.9.7
* **Keras version:** 2.4.3
* **TensorFlow version:** 2.4.1
* **Driver version:** 510.68.02
* **CUDA version:** 11.6
* **Linux kernel:** 5.15.23-76051523-generic
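
The Python, Keras, and TensorFlow entries above can be read from the active environment instead of being copied by hand, which also shows whether another install has changed them. A minimal sketch using only the standard library; the helper names `installed_version` and `version_report` are illustrative, and it assumes the packages are registered under the distribution names `tensorflow` and `keras`:

```python
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(packages=("tensorflow", "keras")):
    """Collect the interpreter version and the version of each requested package."""
    report = {"python": platform.python_version()}
    for package in packages:
        report[package] = installed_version(package)
    return report


# Print one line per entry; missing packages are reported rather than raising.
for name, version in version_report().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this before and after installing other components makes a silent downgrade of either package visible.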

The training log is shown below.

2022-05-09 10:16:44.283512: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Your tensorflow version is : 2.4
2022-05-09 10:17:02.344274: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-09 10:17:02.345101: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-05-09 10:17:02.383373: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-09 10:17:02.384101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-09 10:17:02.384281: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-05-09 10:17:02.404397: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-05-09 10:17:02.404554: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2022-05-09 10:17:02.416043: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-05-09 10:17:02.420146: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-05-09 10:17:02.439124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-05-09 10:17:02.442663: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2022-05-09 10:17:02.476159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2022-05-09 10:17:02.476525: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-09 10:17:02.476905: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-09 10:17:02.476992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
[INFO] Processing fold #0...
2022-05-09 10:17:02.678017: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-09 10:17:02.679390: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-09 10:17:02.679567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.545GHz coreCount: 48 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-09 10:17:02.679683: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-05-09 10:17:02.679718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-05-09 10:17:02.679764: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2022-05-09 10:17:02.679779: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-05-09 10:17:02.679804: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-05-09 10:17:02.679834: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-05-09 10:17:02.679854: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2022-05-09 10:17:02.679877: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2022-05-09 10:17:02.680047: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-09 10:17:02.680411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-09 10:17:02.680550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-05-09 10:17:02.680953: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1

What should I do to solve this problem?
If you need more details, let me know.
Thank you very much.

ssm49v7z  #1

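I found the causes of this problem myself:

  1. TensorFlow and Keras versions: some components I installed rolled my Keras back to an older version.
  2. The CUDA version did not match, or the target libraries were not included in PATH, or
    1. the device could not be registered.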
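
The CUDA mismatch in point 2 can be checked against a log like the one in the question: each `Successfully opened dynamic library` line names the library version TensorFlow actually loaded. A minimal sketch that extracts those versions and the reported compute capability; the function names are illustrative, and the sample lines are copied from the question's log:

```python
import re

# Matches dso_loader lines such as
# "... Successfully opened dynamic library libcublas.so.10".
_LIB_RE = re.compile(r"Successfully opened dynamic library (lib\w+)\.so\.([\d.]+)")
# Matches the device description, e.g. "computeCapability: 8.6".
_CC_RE = re.compile(r"computeCapability:\s*([\d.]+)")


def loaded_cuda_libraries(log_text):
    """Return {library name: version suffix} for each library the log reports as opened."""
    return {name: version for name, version in _LIB_RE.findall(log_text)}


def compute_capability(log_text):
    """Return the first computeCapability value found in the log, or None."""
    match = _CC_RE.search(log_text)
    return match.group(1) if match else None


sample_log = """\
2022-05-09 10:17:02.384281: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
2022-05-09 10:17:02.404397: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-05-09 10:17:02.476159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
"""

libs = loaded_cuda_libraries(sample_log)
print(libs)
print(compute_capability(sample_log))
```

For the log in the question, this shows CUDA 10.x libraries and cuDNN 7 being loaded on a system that reports CUDA 11.6 and a GPU with compute capability 8.6. To my knowledge, compute capability 8.6 is targeted natively only from CUDA 11.1 onward, so a TensorFlow build linked against CUDA 10.1 is consistent with the mismatch described above.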
