I have a Docker container running a Flask application, and I use both TensorFlow and PyTorch in it. In torch I can use the GPU, but not in TensorFlow. nvidia-smi
output:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0              29W /  70W |   1146MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
I don't understand why nvidia-smi reports a CUDA version when nvcc doesn't work, and I can't install the CUDA toolkit with apt in the python:3.9-slim Docker image. nvcc --version
output:
bash: nvcc: command not found
import tensorflow output:
2023-07-01 21:12:51.765379: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-01 21:12:51.814111: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-01 21:12:51.814886: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-01 21:12:53.284879: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
torch output:
>>> import torch
>>> torch.cuda.is_available()
True
>>>
Dockerfile:
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim as base
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
RUN apt-get update && apt-get install -y ffmpeg
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=requirements.txt \
    python -m pip install -r requirements.txt
COPY . .
EXPOSE 80
CMD gunicorn 'main:app' --bind=0.0.0.0:80 --timeout=36000000 --workers=1 --threads=8
compose.yaml:
services:
  server:
    build:
      context: .
    ports:
      - 80:80
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Can you help me solve this problem?
1 Answer
TensorFlow's installation page states that you need NVIDIA® GPU drivers version 450.80.02 or higher, CUDA® Toolkit 11.8, and cuDNN SDK 8.6.0. Installing TensorRT is optional, but it can improve latency and throughput. Your image contains none of these, because your Dockerfile doesn't perform any of the steps mentioned on the TensorFlow installation page. You also haven't posted your requirements.txt. Consider using the TensorFlow (Dockerhub) / TensorFlow (NVIDIA) Docker images directly.
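As a minimal sketch of that last suggestion, the existing Dockerfile could be rebased onto the official TensorFlow GPU image, which already ships CUDA 11.8 and cuDNN 8.6. The tensorflow/tensorflow:2.12.0-gpu tag and the requirements handling are assumptions (the requirements.txt wasn't posted), so pick the tag matching the TensorFlow version your app pins. Note that the PyTorch pip wheels bundle their own CUDA runtime libraries, which is why torch.cuda.is_available() returns True even on python:3.9-slim, while TensorFlow expects CUDA/cuDNN to be present in the image:

# Sketch: replace the slim Python base with the official TensorFlow GPU image
# (tag is an assumption; match it to the TensorFlow version in requirements.txt).
FROM tensorflow/tensorflow:2.12.0-gpu AS base
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# ffmpeg as in the original image; CUDA and cuDNN already come with the base image.
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*
# Drop or relax any tensorflow pin in requirements.txt so it doesn't conflict
# with the version bundled in the base image.
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=requirements.txt \
    python -m pip install -r requirements.txt
COPY . .
EXPOSE 80
CMD gunicorn 'main:app' --bind=0.0.0.0:80 --timeout=36000000 --workers=1 --threads=8

After rebuilding (and keeping the GPU reservation from compose.yaml, or docker run --gpus all), running python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" inside the container should list the T4.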