Current environment
durr@learner:~/vllm$ python collect_env.py
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.5
Libc version: glibc-2.35
Python version: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
Nvidia driver version: 555.42.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7532 32-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
BogoMIPS: 4799.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid arch_capabilities
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 512 KiB (8 instances)
L1i cache: 512 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 128 MiB (8 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec rstack overflow: Mitigation; SMT disabled
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] transformers==4.41.2
[pip3] triton==2.3.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] torch 2.3.0 pypi_0 pypi
[conda] transformers 4.41.2 pypi_0 pypi
[conda] triton 2.3.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     0-7             0               N/A
GPU1    NV4      X      0-7             0               N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
How would you like to use vllm?
I'm trying to use a specific branch of bartowski/Yi-34B-200K-RPMerge-exl2 ( https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2 ). Specifically, the main branch of that repo has nothing in it; the various quantizations live on separate branches, and the one I want is 6_5.
The docs say "--revision is the specific model version to use. It can be a branch name, a tag name, or a commit id. If unspecified, will use the default version." That sounds like the way to specify a particular branch, but it doesn't work:
durr@learner:~/vllm$ python3 -m vllm.entrypoints.openai.api_server \
> --model "bartowski/Yi-34B-200K-RPMerge-exl2" \
> --revision "6_5"
/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
INFO 06-11 00:03:37 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='bartowski/Yi-34B-200K-RPMerge-exl2', speculative_config=None, tokenizer='bartowski/Yi-34B-200K-RPMerge-exl2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=6_5, rope_scaling=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=200000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=bartowski/Yi-34B-200K-RPMerge-exl2)
Traceback (most recent call last):
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
hf_raise_for_status(response)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 315, in hf_raise_for_status
raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6667f6c9-30c5bd093fb52be86831fd55;61608d57-d8ea-454d-a7ca-58416238f2ea)
Entry Not Found for url: https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2/resolve/main/tokenizer_config.json.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/durr/miniconda3/envs/venv/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/durr/miniconda3/envs/venv/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/entrypoints/openai/api_server.py", line 186, in <module>
engine = AsyncLLMEngine.from_engine_args(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 386, in from_engine_args
engine = cls(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 340, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 462, in _init_engine
return engine_class(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 212, in __init__
self.tokenizer = self._init_tokenizer()
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 408, in _init_tokenizer
return get_tokenizer_group(self.parallel_config.tokenizer_pool_config,
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
return TokenizerGroup(**init_kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/vllm/transformers_utils/tokenizer.py", line 92, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 817, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 649, in get_tokenizer_config
resolved_config_file = cached_file(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 399, in cached_file
resolved_file = hf_hub_download(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1282, in _hf_hub_download_to_cache_dir
(url_to_download, etag, commit_hash, expected_size, head_call_error) = _get_metadata_or_catch_error(
File "/home/durr/miniconda3/envs/venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1730, in _get_metadata_or_catch_error
no_exist_file_path.touch()
File "/home/durr/miniconda3/envs/venv/lib/python3.9/pathlib.py", line 1315, in touch
fd = self._raw_open(flags, mode)
File "/home/durr/miniconda3/envs/venv/lib/python3.9/pathlib.py", line 1127, in _raw_open
return self._accessor.open(self, flags, mode)
PermissionError: [Errno 13] Permission denied: '/home/durr/.cache/huggingface/hub/models--bartowski--Yi-34B-200K-RPMerge-exl2/.no_exist/6a044cb3ec9b116e41d049817f1c38e8e74a09f1/tokenizer_config.json'
I also tried putting the branch name in --code-revision (because why not), but that had no effect either.
Searching the existing issues for "huggingface branch" turns up 15 pages of results. I flipped through the first three or so but didn't find much of use; it really is an awkward combination of terms to search for.
5 Answers
dojqjjoe1#
OK, I did some experimenting:
It looks like the revision value is indeed being used, but the current downloader assumes that some of the files are also available on the main branch. It tries to fetch config.json from the branch given in --revision, but fetches tokenizer_config.json from main.
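This can be reproduced with huggingface_hub directly. A minimal sketch (assuming, as the exl2 layout suggests, that the 6_5 branch carries tokenizer_config.json while main does not):

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

repo = "bartowski/Yi-34B-200K-RPMerge-exl2"

# Resolving the file against the quant branch works...
print(hf_hub_download(repo, "tokenizer_config.json", revision="6_5"))

# ...while resolving it against the default branch (main) raises the same
# 404 / EntryNotFoundError seen in the traceback above.
try:
    hf_hub_download(repo, "tokenizer_config.json")
except EntryNotFoundError as err:
    print(err)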
snvhrwxg2#
OK, apparently you need to specify all of the --*revision flags; --revision on its own only seems to set the branch for the actual weights.
I had assumed --revision would set all of the various *-revision options. Maybe the unqualified --revision should be renamed to --weights-revision or something. I think --revision should also set --code-revision and --tokenizer-revision unless those are also specified on the command line, although that might be something of a breaking change.
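As a concrete workaround, the invocation would look something like the following, with the branch passed to every revision flag (a sketch only; it assumes vLLM 0.4.3's --tokenizer-revision and --code-revision options and is otherwise the same command as above):

python3 -m vllm.entrypoints.openai.api_server \
    --model "bartowski/Yi-34B-200K-RPMerge-exl2" \
    --revision "6_5" \
    --tokenizer-revision "6_5" \
    --code-revision "6_5"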
xghobddn3#
I'm on this,
What do you think @DarkLight1337?
Making --revision apply to all revisions? Or is just renaming it sufficient?
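For illustration only, a hypothetical sketch of the "apply to all revisions" behaviour being discussed (this is not vLLM's actual parser; the flag names come from the thread, everything else is made up for the example):

import argparse

# Hypothetical fall-through: the unqualified --revision becomes the default
# for the component-specific revisions unless they are set explicitly.
parser = argparse.ArgumentParser()
parser.add_argument("--revision", default=None)
parser.add_argument("--tokenizer-revision", default=None)
parser.add_argument("--code-revision", default=None)
args = parser.parse_args(["--revision", "6_5"])

if args.tokenizer_revision is None:
    args.tokenizer_revision = args.revision
if args.code_revision is None:
    args.code_revision = args.revision

print(args)  # Namespace(revision='6_5', tokenizer_revision='6_5', code_revision='6_5')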
jk9hmnmh4#
> Making --revision apply to all revisions? Or is just renaming it sufficient?

Let's rename it first. Afterwards (in another PR) we can introduce a new CLI option to set the revision for all components.
nbewdwxp5#
OK, thanks!