Current environment information:
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: 3.29.3
Libc version: glibc-2.35
Python version: 3.11.4 (main, Jul 5 2022, 13:45:01) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-88-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Nvidia driver version: 525.125.06
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.6.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.6.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.6.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.6.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.6.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.6.0
     GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 CPU Affinity NUMA Affinity
GPU5 NV8 NV8 NV8 NV8 NV8 X NV8 NV8 SYS SYS SYS SYS SYS 28-55 1
GPU6 NV8 NV8 NV8 NV8 NV8 NV8 X NV8 SYS SYS SYS SYS SYS 28-55 1
GPU7 NV8 NV8 NV8 NV8 NV8 NV8 NV8 X SYS SYS SYS SYS SYS 28-55 1
NIC0 PXB PXB NODE NODE SYS SYS SYS SYS X PIX NODE NODE NODE
NIC1 PXB PXB NODE NODE SYS SYS SYS SYS PIX X NODE NODE NODE
NIC2 NODE NODE PXB PXB SYS SYS SYS SYS NODE NODE X PIX NODE
NIC3 NODE NODE PXB PXB SYS SYS SYS SYS NODE NODE PIX X NODE
NIC4 NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE NODE NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_2
NIC1: mlx5_3
NIC2: mlx5_4
NIC3: mlx5_5
NIC4: mlx5_bond_0
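The matrix and legend above appear to be the tail of `nvidia-smi topo -m` output (the GPU0-GPU4 rows seem to have been cut off when pasting). A minimal sketch for regenerating the full matrix from Python, assuming `nvidia-smi` is installed and on PATH:

```python
import subprocess

# Re-run the topology query that produced the GPU/NIC matrix shown above.
result = subprocess.run(
    ["nvidia-smi", "topo", "-m"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```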
How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.
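In case it helps, here is a minimal sketch of offline inference with vLLM's `LLM` API. The model name below is only a placeholder (the question does not name the actual model), and the tensor-parallel comment assumes you want to use the eight A800s on this machine:

```python
from vllm import LLM, SamplingParams

# Placeholder model -- substitute the Hugging Face repo ID of the model you want to run.
# For a large model, pass tensor_parallel_size=8 to shard it across the eight A800s.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```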
1 Answer
I got [' or'], but in tokenizer.get_vocab() it appears as 'Ġor'.
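That is expected for byte-level BPE tokenizers (GPT-2 style): a leading space is encoded as the 'Ġ' byte, so the vocabulary entry for " or" is 'Ġor', and decoding restores the space. A small sketch, using "gpt2" as a stand-in since the actual tokenizer isn't named in the thread:

```python
from transformers import AutoTokenizer

# "gpt2" is only a stand-in -- any byte-level BPE tokenizer behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

ids = tokenizer.encode(" or", add_special_tokens=False)
print(tokenizer.convert_ids_to_tokens(ids))  # ['Ġor'] -- the leading space becomes 'Ġ'
print(repr(tokenizer.decode(ids)))           # ' or'   -- decoding restores the space
print("Ġor" in tokenizer.get_vocab())        # True
```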