vLLM cannot use gpt2-xl on multiple GPUs

zvms9eto · posted 2 months ago in: Other

Thanks for all the hard work!
I'm trying to use gpt2-xl (listed among the supported models) with multiple GPUs.
However, with 2 GPUs I get
ValueError: Total number of attention heads (25) must be divisible by tensor parallel size (2).
and with 5 GPUs I get

self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 229, in from_engine_args
    engine = cls(*engine_configs,
  File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 106, in __init__
    self._init_workers_ray(placement_group)
  File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 179, in _init_workers_ray
    self._run_workers(
  File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 696, in _run_workers
    all_outputs = ray.get(all_outputs)
  File "<user>/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "<user>/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "<user>/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
    raise value.as_instanceof_cause()
  File "<user>/.local/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "<user>/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 68, in init_model
    self.model = get_model(self.model_config)
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 96, in get_model
    model = model_class(model_config.hf_config)
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/models/gpt2.py", line 208, in __init__
    self.transformer = GPT2Model(config)
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/models/gpt2.py", line 172, in __init__
    self.wte = VocabParallelEmbedding(vocab_size, self.embed_dim)
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/layers.py", line 107, in __init__
    VocabUtility.vocab_range_from_global_vocab_size(
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/utils.py", line 67, in vocab_range_from_global_vocab_size
    per_partition_vocab_size = divide(global_vocab_size, world_size)
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/utils.py", line 18, in divide
    ensure_divisibility(numerator, denominator)
  File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/utils.py", line 10, in ensure_divisibility
    assert numerator % denominator == 0, "{} is not divisible by {}".format(
AssertionError: 50304 is not divisible by 5
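Both failures come from the same constraint: every tensor-parallel rank must receive an equal slice of the attention heads and of the (padded) vocabulary. A minimal pure-Python sketch of the check (the `ensure_divisibility`/`divide` names appear in the traceback above; the logic here is reconstructed, not copied from vLLM):

```python
def ensure_divisibility(numerator: int, denominator: int) -> None:
    # Raise if the work cannot be split evenly across ranks.
    assert numerator % denominator == 0, \
        "{} is not divisible by {}".format(numerator, denominator)

def divide(numerator: int, denominator: int) -> int:
    # Split evenly, failing loudly on any remainder.
    ensure_divisibility(numerator, denominator)
    return numerator // denominator

num_heads = 25       # gpt2-xl attention heads
vocab_size = 50304   # gpt2 vocab (50257) padded to a multiple of 64

# tensor_parallel_size=2 fails on the heads,
# tensor_parallel_size=5 fails on the vocab:
# divide(num_heads, 2)   -> AssertionError: 25 is not divisible by 2
# divide(vocab_size, 5)  -> AssertionError: 50304 is not divisible by 5
```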

Can anyone help me?

dldeef67 · #1

Hi @marconaguib, unfortunately your case is an edge case that our current tensor-parallel support does not cover. We pad vocab_size to a multiple of 64, because people typically serve a model with 2/4/8/16 GPUs.
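The padding described above can be illustrated in a few lines (a sketch; the pad-to-a-multiple-of-64 behavior is as stated in the answer, but the helper name here is hypothetical, not vLLM's API):

```python
def pad_vocab(vocab_size: int, multiple: int = 64) -> int:
    # Round vocab_size up to the nearest multiple (vLLM pads to 64).
    return ((vocab_size + multiple - 1) // multiple) * multiple

padded = pad_vocab(50257)  # gpt2's real vocab size
print(padded)              # 50304

# 50304 = 64 * 786, so it divides evenly across 2, 4, 8, or 16 GPUs,
# but 786 = 2 * 3 * 131 contains no factor of 5 -- hence the
# "50304 is not divisible by 5" assertion with 5 GPUs.
```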

rqdpfwrv · #2

OK, thank you!
I'll stick to a single GPU for now 👍

pvcm50d1 · #3

Bummer. I have 6, but can only use 4 of them.
