你的当前环境信息如下:
The output of python collect_env.py
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000
Nvidia driver version: 545.23.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
Stepping: 1
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4399.58
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe >
Virtualization: VT-x
L1d cache: 1.3 MiB (40 instances)
L1i cache: 1.3 MiB (40 instances)
L2 cache: 10 MiB (40 instances)
L3 cache: 100 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-19,40-59
NUMA node1 CPU(s): 20-39,60-79
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: KVM: Mitigation; Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2
[pip3] torchvision==0.17.2
[pip3] triton==2.1.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.2 pypi_0 pypi
[conda] torchvision 0.17.2 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypiROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.0
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
^[[4mGPU0 GPU1 CPU Affinity NUMA Affinity GPU NUMA ID^[[0m
GPU0 X PIX 0-19,40-59 0 N/A
GPU1 PIX X 0-19,40-59 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
4条答案
按热度按时间wooyq4lh1#
mistralai/Mixtral-8x7B-Instruct-v0.1 更大。你能向我们展示一下不使用 vLLM 加载模型的代码吗?它可能会将许多层加载到你的 RAM 中,而不是你的 GPU。根据这个帖子,如果以 float32 格式加载 Mixtral,需要 174.49 GB。如果你无法获得更大的机器,我建议你使用 AWQ 版本:https://huggingface.co/ybelkada/Mixtral-8x7B-Instruct-v0.1-AWQ
nfg76nw02#
我正在使用H20-GPT的堆栈通过以下代码加载模型:
kgsdhlau3#
你好。抱歉如果我在混淆问题,但我遇到了相同的问题。
我在HPC上运行这个程序,请求190G的内存和2个GPU。然而,我按照下面的配置信息,只得到了2个32Gb的GPU。
环境配置:
我的系统不支持AWQ
zsohkypk4#
使用Vllm加载部分层并将剩余的存储在系统RAM中是否有方法?