I am trying to run a Docker container that needs access to my host's NVIDIA GPU, enabling GPU access with the --gpus all flag. When I run the container with the nvidia-smi command, I can see an active GPU, which indicates the container can reach the GPU. However, when I try to run TensorFlow, PyTorch, or ONNX Runtime inside the container, these libraries do not seem to detect or use the GPU. Specifically, when I run the container with the following command, ONNX Runtime reports only the CPUExecutionProvider and not the CUDAExecutionProvider:
sudo docker run --gpus all mycontainer:latest
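For reference, the providers can be listed directly with a one-liner (an illustrative check rather than my actual main.py; the container's PATH already points at the venv's python):

sudo docker run --gpus all mycontainer:latest python -c "import onnxruntime; print(onnxruntime.get_available_providers())"

This prints ['CPUExecutionProvider'] only, with no CUDAExecutionProvider.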
However, when I run the same container with the nvidia-smi command, the GPU shows up as active:
sudo docker run --gpus all mycontainer:latest nvidia-smi
Here is the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P0    27W /  N/A |     10MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Here is the Dockerfile I used to build mycontainer:
FROM nvidia/cuda:11.5.0-base-ubuntu20.04
WORKDIR /home
COPY requirements.txt /home/requirements.txt
# Add the deadsnakes PPA for Python 3.10
RUN apt-get update && \
    apt-get install -y software-properties-common libgl1-mesa-glx cmake protobuf-compiler && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update
# Install Python 3.10 and dev packages
RUN apt-get update && \
    apt-get install -y python3.10 python3.10-dev python3-pip && \
    rm -rf /var/lib/apt/lists/*
# Install virtualenv
RUN pip3 install virtualenv
# Create a virtual environment with Python 3.10
RUN virtualenv -p python3.10 venv
# Activate the virtual environment
ENV PATH="/home/venv/bin:$PATH"
# Install Python dependencies
RUN pip3 install --upgrade pip \
    && pip3 install --default-timeout=10000000 torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116 \
    && pip3 install --default-timeout=10000000 -r requirements.txt
# Copy files
COPY /src /home/src
# Set the PYTHONPATH and LD_LIBRARY_PATH environment variables to include the CUDA libraries
ENV PYTHONPATH=/usr/local/cuda-11.5/lib64
ENV LD_LIBRARY_PATH=/usr/local/cuda-11.5/lib64
# Set the CUDA_PATH and CUDA_HOME environment variables to point to the CUDA installation directory
ENV CUDA_PATH=/usr/local/cuda-11.5
ENV CUDA_HOME=/usr/local/cuda-11.5
# Set the default command
CMD ["sh", "-c", ". /home/venv/bin/activate && python main.py $@"]
I have checked that the TensorFlow, PyTorch, and ONNX Runtime versions I am using are compatible with the CUDA version installed on the system. I have also made sure that the LD_LIBRARY_PATH environment variable is set correctly to include the path to the CUDA libraries. Finally, I made sure to pass the --gpus all flag when starting the container and to configure the NVIDIA Docker runtime and device plugin correctly.
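As a concrete check of that version pairing, something like the following can be run (an illustrative one-liner, not my exact script; torch.version.cuda reports the CUDA version the installed PyTorch wheel was built against):

sudo docker run --gpus all mycontainer:latest python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"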
Despite these steps, I still cannot use the GPU from TensorFlow, PyTorch, or ONNX Runtime inside the container. What could be causing this problem, and how can I fix it? Please let me know if you need more information.
1 Answer
You should install onnxruntime-gpu to get the CUDAExecutionProvider.
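The plain onnxruntime package ships only the CPU provider. A minimal sketch of the fix, assuming the CPU-only wheel was pulled in via requirements.txt (uninstall it first, because with both packages present the CPU build can take precedence):

pip3 uninstall -y onnxruntime   # remove the CPU-only wheel first
pip3 install onnxruntime-gpu    # GPU build that provides CUDAExecutionProvider

Pick an onnxruntime-gpu release built against the CUDA version in your image (11.5 here), and make sure the CUDA/cuDNN libraries it expects are present in the container; onnxruntime.get_available_providers() should then include CUDAExecutionProvider.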