[Installation]: How do I install the latest version of vLLM with CUDA 11.7 and PyTorch 2.0.1?

pxy2qtax · asked 2 months ago in Other

Your current environment

The output of `python collect_env.py`

PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.31
Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100 80GB PCIe
GPU 1: NVIDIA A100 80GB PCIe
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] pytorch-lightning==2.2.1
[pip3] torch==2.0.1
[pip3] torchmetrics==1.3.2
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[conda] numpy 1.23.4 pypi_0 pypi
[conda] pytorch-lightning 2.2.1 pypi_0 pypi
[conda] torch 2.0.1 pypi_0 pypi
[conda] torchmetrics 1.3.2 pypi_0 pypi
[conda] torchvision 0.15.2 pypi_0 pypi
[conda] triton 2.0.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 mlx5_0 mlx5_1 mlx5_2 mlx5_3 CPU Affinity NUMA Affinity
GPU0 X SYS NODE NODE SYS SYS 0-31,64-95 0
GPU1 SYS X SYS SYS NODE NODE 32-63,96-127 1
mlx5_0 NODE SYS X PIX SYS SYS
mlx5_1 NODE SYS PIX X SYS SYS
mlx5_2 SYS NODE SYS SYS X PIX
mlx5_3 SYS NODE SYS SYS PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

How you are installing vllm

pip install -vvv vllm

I would like to know how to install the latest version of vLLM with CUDA 11.7 and PyTorch 2.0.1, as I want to use vLLM for inference with StarCoder2 or miniCPM. Alternatively, is there a way to make an older version of vLLM, such as 0.2.1, support miniCPM?
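On the fallback route, one hedged option is to pin an old vLLM release that predates the torch>=2.1 requirement. This is a sketch, not a confirmed compatibility matrix: whether the `vllm==0.2.1` wheel links cleanly against this exact torch 2.0.1+cu117 environment is an assumption, and 0.2.1 has no built-in miniCPM support, so that model would still need manual porting.

```shell
# Sketch: pin an older vLLM that still targeted torch 2.0.1.
# Assumptions: vllm==0.2.1 is the last release compatible with torch 2.0.x,
# and the existing torch install should not be replaced during resolution.
pip install "vllm==0.2.1" --no-deps
pip check  # then verify nothing else was broken by --no-deps
```

`--no-deps` keeps pip from silently upgrading torch to satisfy a newer pin; the trade-off is that any genuinely missing dependency must then be installed by hand.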

l5tcr1uw1#

I successfully compiled from source using the following steps. You can change 11.8 to 11.7. Also, make sure your torch version matches 2.1.2 (:
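A minimal sketch of a from-source build along those lines, assuming the standard vLLM build flow (the repo URL, environment variables, and version pins here are assumptions, not the answerer's exact commands):

```shell
# Sketch: build vLLM from source against a specific CUDA toolkit.
# Assumption: CUDA 11.8 is installed under /usr/local/cuda-11.8;
# change 11.8 to 11.7 as the answer suggests, if that is your toolkit.
export CUDA_HOME=/usr/local/cuda-11.8
export PATH="$CUDA_HOME/bin:$PATH"

# Match the torch version the answer names (2.1.2) before compiling,
# so the extensions link against the torch actually installed.
pip install "torch==2.1.2" --index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .  # compiles the CUDA kernels; can take a while

python -c "import vllm; print(vllm.__version__)"  # sanity check
```

Note the tension with the question's environment: the answer assumes torch 2.1.2, while the asker is on torch 2.0.1+cu117, so following these steps means upgrading torch rather than keeping 2.0.1.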
