TensorFlow libdevice not found. Why is it not found in the searched path?

Posted by icomxhvb on 2022-12-13 in Other


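Win 10 64-bit 21H1; TF 2.5, CUDA 11 installed in the environment (Python 3.9.5, Xeus)
I am not the only one seeing this error; see also (unanswered) here and here. The issue is obscure, and the proposed resolutions are unclear or don't seem to work (see e.g. here).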

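Issue: Using the TF Linear_Mixed_Effects_Models.ipynb example (downloaded from the TensorFlow GitHub here), execution reaches the point of performing the "warm up stage" and then throws the error: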

InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]

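The console contains the following output, showing that it finds the GPU but that XLA initialisation fails to find the (existing!) libdevice in the specified paths: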

2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

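Of interest is that the searched paths include "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin".
The contents of that folder include all the DLLs (successfully loaded at TF startup), including cudart64_110.dll, cudnn64_8.dll... and of course libdevice.10.bc.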

Question: Since TF says it is searching this location for this file, and the file exists there, what is wrong and how do I fix it?

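(NB: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 does not exist... CUDA is installed in the environment; this path must be a best guess for an OS-level installation.)
Info: I am setting the path with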

import os

aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath  # applies to this Python process only

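but I have also set the OS environment variable XLA_FLAGS to the same string value... I don't know which one is actually taking effect, but the fact that the console output says it searched the intended path is good enough for now.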

tensorflow

Source: https://stackoverflow.com/questions/68614547/tensorflow-libdevice-not-found-why-is-it-not-found-in-the-searched-path

6 Answers


cetgtptt #1

The following worked for me. The error message was:

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

First I searched for the nvvm directory and then verified that the libdevice directory existed:

$ find / -type d -name nvvm 2>/dev/null
/usr/lib/cuda/nvvm
$ cd /usr/lib/cuda/nvvm
/usr/lib/cuda/nvvm$ ls
libdevice
/usr/lib/cuda/nvvm$ cd libdevice
/usr/lib/cuda/nvvm/libdevice$ ls
libdevice.10.bc

Then I exported the environment variable:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda

as shown by @Insectatorious above. This resolved the error and I was able to run the code.
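The same search can be done from Python, which may help on systems without `find`. This is only a sketch: the helper names `find_cuda_data_dirs` and `xla_flag_for` are made up for illustration, and it assumes, as the warning text suggests, that XLA wants a directory containing `nvvm/libdevice`.

```python
import os
from pathlib import Path


def find_cuda_data_dirs(root):
    """Return directories under `root` that contain nvvm/libdevice/libdevice*.bc."""
    hits = []
    for dirpath, dirnames, _ in os.walk(root):
        if "nvvm" in dirnames:
            libdevice_dir = Path(dirpath) / "nvvm" / "libdevice"
            if libdevice_dir.is_dir() and any(libdevice_dir.glob("libdevice*.bc")):
                hits.append(Path(dirpath))
    return hits


def xla_flag_for(cuda_dir):
    """Build the XLA_FLAGS value in the same form as the export above."""
    return f"--xla_gpu_cuda_data_dir={Path(cuda_dir).as_posix()}"
```

A call such as `find_cuda_data_dirs("/usr/lib")` should list the candidate directories; the chosen one can then be assigned with `os.environ["XLA_FLAGS"] = xla_flag_for(candidate)`, to be safe before TensorFlow is imported.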


p3rjfoxz #2

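For Windows users

Step 1

Run (as administrator):

conda install -c anaconda cudatoolkit

You can specify the cudatoolkit version to match your installed cuDNN / supported version, e.g.: conda install -c anaconda cudatoolkit=10.2.89

Step 2

Go to the conda installation folder:
C:\ProgramData\Anaconda3\Library\bin

Step 3

Locate "libdevice.10.bc" and copy the file.

Step 4

Create a folder named "nvvm" inside bin.
Create another folder named "libdevice" inside nvvm.
Paste the "libdevice.10.bc" file into "libdevice".

Step 5

Go to Environment Variables
System variables > New
Variable name:
XLA_FLAGS
Variable value:
--xla_gpu_cuda_data_dir=C:\ProgramData\Anaconda3\Library\bin
(edit to match your directory)

Step 6

Restart cmd / the virtual environment.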
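Steps 3 and 4 (copying the file into a new nvvm\libdevice folder) can also be scripted. The following is a minimal sketch rather than part of the original answer; `mirror_libdevice` is a hypothetical helper name, and the directory you pass in is whatever your own conda install uses.

```python
import shutil
from pathlib import Path


def mirror_libdevice(bin_dir, filename="libdevice.10.bc"):
    """Copy bin_dir/filename to bin_dir/nvvm/libdevice/filename; return the new path."""
    bin_dir = Path(bin_dir)
    source = bin_dir / filename
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; check the conda install")
    target_dir = bin_dir / "nvvm" / "libdevice"
    # Creates both nvvm and nvvm/libdevice; does nothing if they already exist.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / filename))
```

For the folder from step 2 this would be called as `mirror_libdevice(r"C:\ProgramData\Anaconda3\Library\bin")`, possibly from an elevated prompt if the folder is write-protected. The original file is left in place, so the DLL directory itself is unchanged.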


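bqucvtff #3

The diagnostic information is unclear and therefore unhelpful; there is, however, a resolution.

The issue was resolved by providing the file (as a copy) at this path: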

C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice\
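Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin was the path given to XLA_FLAGS, but it seems XLA is not looking for the libdevice file there; it is looking for the \nvvm\libdevice\ path beneath it. This means I can't simply set a different value in XLA_FLAGS to point at the actual location of the libdevice file because, to coin a phrase, it's not (just) the file it's looking for.
The earlier debug info: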

2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .

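That output is incorrect in that there is no "CUDA" in the search path; and FWIW, I think a different error should have been given for the search in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2, since there is no such folder (there is an old V10.0 folder there, but no OS install of CUDA 11).
Unless TensorFlow improves its path handling, this kind of file-structure manipulation will be needed in every new (Anaconda) Python environment.
Full thread in the TensorFlow forum here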
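To tell this situation apart from a file that is missing altogether, a small check along these lines may help. It is a sketch based only on the behaviour described in this answer (XLA wanting `nvvm/libdevice` beneath the configured directory); `diagnose_libdevice` and its return labels are invented for illustration.

```python
from pathlib import Path


def diagnose_libdevice(data_dir, name="libdevice.10.bc"):
    """Classify how libdevice is laid out relative to an xla_gpu_cuda_data_dir value.

    Returns "ok" if data_dir/nvvm/libdevice/name exists, "flat" if the file
    sits directly in data_dir (the case in the question), else "missing".
    """
    data_dir = Path(data_dir)
    if (data_dir / "nvvm" / "libdevice" / name).is_file():
        return "ok"
    if (data_dir / name).is_file():
        return "flat"
    return "missing"
```

A result of "flat" corresponds to the case here: the file is present in the searched directory, yet the warning still appears because the nested folders are absent.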


gab6jxml #4

For Linux users with tensorflow==2.8, add the following environment variable:

XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.4


w8rqjzmb #5

For those using Windows and PowerShell, assuming CUDA is located at C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7,
the environment variable can be set with:

$env:XLA_FLAGS="--xla_gpu_cuda_data_dir='C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7'"

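The nested quotes here (single quotes inside the double quotes) are required!
I think this may be the most lightweight way to deal with this XLA bug.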
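When the variable is set from Python instead of PowerShell, the same quoting can be produced with a small helper. This sketch follows the quoting rule stated in this answer and has not been verified against XLA's flag parser; `xla_cuda_flag` is a hypothetical name.

```python
def xla_cuda_flag(cuda_dir):
    """Build an XLA_FLAGS value, single-quoting the path if it contains whitespace."""
    path = str(cuda_dir)
    if any(ch.isspace() for ch in path):
        # Assumption taken from the answer above: paths with spaces need inner quotes.
        path = f"'{path}'"
    return f"--xla_gpu_cuda_data_dir={path}"
```

The result would be assigned with `os.environ["XLA_FLAGS"] = xla_cuda_flag(...)`. For the v11.7 path above it yields the same string as the PowerShell command, while a path without spaces is passed through unquoted.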


djmepvbi #6

For those using miniconda, just copy the file libdevice.10.bc into the root folder of your Python application or notebook.
It works here with python=3.9, cudatoolkit=11.2, cudnn=8.1.0 and tensorflow=2.9.

