Current environment
durr@learner:~/vllm$ python collect_env.py
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.5
Libc version: glibc-2.35
Python version: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
Nvidia driver version: 555.42.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7532 32-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
BogoMIPS: 4799.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid arch_capabilities
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 512 KiB (8 instances)
L1i cache: 512 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 128 MiB (8 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec rstack overflow: Mitigation; SMT disabled
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] transformers==4.41.2
[pip3] triton==2.3.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] torch 2.3.0 pypi_0 pypi
[conda] transformers 4.41.2 pypi_0 pypi
[conda] triton 2.3.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     0-7             0               N/A
GPU1    NV4      X      0-7             0               N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
How would you like to use vllm?
I'm trying to use a specific branch of bartowski/Yi-34B-200K-RPMerge-exl2 ( https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2 ). Specifically, the main branch of that repo has nothing in it; the various quantizations live on separate branches, and the one I want is 6_5.
The docs say "--revision is the specific model version to use. It can be a branch name, a tag name, or a commit id. If unspecified, will use the default version." That sounds like the way to specify a particular branch, but it doesn't work:
durr@learner:~/vllm$ python3 -m vllm.entrypoints.openai.api_server \
> --model "bartowski/Yi-34B-200K-RPMerge-exl2" \
> --revision "6_5"
/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
INFO 06-11 00:03:37 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='bartowski/Yi-34B-200K-RPMerge-exl2', speculative_config=None, tokenizer='bartowski/Yi-34B-200K-RPMerge-exl2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=6_5, rope_scaling=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=200000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=bartowski/Yi-34B-200K-RPMerge-exl2)
Traceback (most recent call last):
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
hf_raise_for_status(response)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 315, in hf_raise_for_status
raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6667f6c9-30c5bd093fb52be86831fd55;61608d57-d8ea-454d-a7ca-58416238f2ea)
Entry Not Found for url: https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2/resolve/main/tokenizer_config.json.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/durr/miniconda3/envs/venv/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/durr/miniconda3/envs/venv/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/entrypoints/openai/api_server.py", line 186, in <module>
engine = AsyncLLMEngine.from_engine_args(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 386, in from_engine_args
engine = cls(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 340, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 462, in _init_engine
return engine_class(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 212, in __init__
self.tokenizer = self._init_tokenizer()
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 408, in _init_tokenizer
return get_tokenizer_group(self.parallel_config.tokenizer_pool_config,
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
return TokenizerGroup(**init_kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/transformers_utils/tokenizer.py", line 92, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 817, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 649, in get_tokenizer_config
resolved_config_file = cached_file(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 399, in cached_file
resolved_file = hf_hub_download(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1282, in _hf_hub_download_to_cache_dir
(url_to_download, etag, commit_hash, expected_size, head_call_error) = _get_metadata_or_catch_error(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1730, in _get_metadata_or_catch_error
no_exist_file_path.touch()
File "/home/durr/miniconda3/envs/venv/lib/python3.9/pathlib.py", line 1315, in touch
fd = self._raw_open(flags, mode)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/pathlib.py", line 1127, in _raw_open
return self._accessor.open(self, flags, mode)
PermissionError: [Errno 13] Permission denied: '/home/durr/.cache/huggingface/hub/models--bartowski--Yi-34B-200K-RPMerge-exl2/.no_exist/6a044cb3ec9b116e41d049817f1c38e8e74a09f1/tokenizer_config.json'
I also tried putting the branch name in --code-revision (because why not), but that had no effect either.
Searching the existing issues for "huggingface branch" turns up 15 pages of results. I flipped through the first three or so but didn't find much of use; it really is an awkward combination of terms to search for.
5 Answers
dojqjjoe1#
OK, I did some experimenting:
It looks like the revision value is indeed being used, but the current downloader assumes that some of the files are also available on the main branch. It tries to fetch config.json from the branch given in --revision, but fetches tokenizer_config.json from main.
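This can be reproduced with huggingface_hub directly. A minimal sketch (assuming, as the exl2 layout suggests, that the 6_5 branch carries tokenizer_config.json while main does not):

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

repo = "bartowski/Yi-34B-200K-RPMerge-exl2"

# Resolving the file against the quant branch works...
print(hf_hub_download(repo, "tokenizer_config.json", revision="6_5"))

# ...while resolving it against the default branch (main) raises the same
# 404 / EntryNotFoundError seen in the traceback above.
try:
    hf_hub_download(repo, "tokenizer_config.json")
except EntryNotFoundError as err:
    print(err)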
snvhrwxg2#
OK, apparently you need to specify all of the --*revision flags; --revision on its own only seems to set the branch for the actual weights.
I had assumed --revision would set all of the various *-revision options. Maybe the unqualified --revision should be renamed to --weights-revision or something. I think --revision should also set --code-revision and --tokenizer-revision unless those are also specified on the command line, although that might be something of a breaking change.
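As a concrete workaround, the invocation would look something like the following, with the branch passed to every revision flag (a sketch only; it assumes vLLM 0.4.3's --tokenizer-revision and --code-revision options and is otherwise the same command as above):

python3 -m vllm.entrypoints.openai.api_server \
    --model "bartowski/Yi-34B-200K-RPMerge-exl2" \
    --revision "6_5" \
    --tokenizer-revision "6_5" \
    --code-revision "6_5"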
xghobddn3#
I'm on this,
What do you think @DarkLight1337?
Making --revision apply to all revisions? Or is just renaming it sufficient?
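For illustration only, a hypothetical sketch of the "apply to all revisions" behaviour being discussed (this is not vLLM's actual parser; the flag names come from the thread, everything else is made up for the example):

import argparse

# Hypothetical fall-through: the unqualified --revision becomes the default
# for the component-specific revisions unless they are set explicitly.
parser = argparse.ArgumentParser()
parser.add_argument("--revision", default=None)
parser.add_argument("--tokenizer-revision", default=None)
parser.add_argument("--code-revision", default=None)
args = parser.parse_args(["--revision", "6_5"])

if args.tokenizer_revision is None:
    args.tokenizer_revision = args.revision
if args.code_revision is None:
    args.code_revision = args.revision

print(args)  # Namespace(revision='6_5', tokenizer_revision='6_5', code_revision='6_5')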
jk9hmnmh4#
> Making --revision apply to all revisions? Or is just renaming it sufficient?

Let's rename it first. Afterwards (in another PR) we can introduce a new CLI option to set the revision for all components.
nbewdwxp5#
OK, thanks!