Thanks for all your hard work!
I'm trying to use gpt2-xl
(listed among the supported models) with multiple GPUs.
However, when I use 2 GPUs, I get ValueError: Total number of attention heads (25) must be divisible by tensor parallel size (2).
When I use 5 GPUs, I get:
self.llm_engine = LLMEngine.from_engine_args(engine_args)
File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 229, in from_engine_args
engine = cls(*engine_configs,
File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 106, in __init__
self._init_workers_ray(placement_group)
File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 179, in _init_workers_ray
self._run_workers(
File "<user>/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 696, in _run_workers
all_outputs = ray.get(all_outputs)
File "<user>/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "<user>/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "<user>/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
raise value.as_instanceof_cause()
File "<user>/.local/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 32, in execute_method
return executor(*args, **kwargs)
File "<user>/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 68, in init_model
self.model = get_model(self.model_config)
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 96, in get_model
model = model_class(model_config.hf_config)
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/models/gpt2.py", line 208, in __init__
self.transformer = GPT2Model(config)
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/models/gpt2.py", line 172, in __init__
self.wte = VocabParallelEmbedding(vocab_size, self.embed_dim)
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/layers.py", line 107, in __init__
VocabUtility.vocab_range_from_global_vocab_size(
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/utils.py", line 67, in vocab_range_from_global_vocab_size
per_partition_vocab_size = divide(global_vocab_size, world_size)
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/utils.py", line 18, in divide
ensure_divisibility(numerator, denominator)
File "<user>/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/tensor_parallel/utils.py", line 10, in ensure_divisibility
assert numerator % denominator == 0, "{} is not divisible by {}".format(
AssertionError: 50304 is not divisible by 5
Can anyone help me?
3 answers
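For context, both errors come down to divisibility: every tensor-parallel rank must receive an equal slice of the attention heads and of the (padded) vocabulary embedding. Here is a minimal sketch of those two checks, written for illustration; `check_tp_size` is a hypothetical helper, not vLLM's actual code.

```python
# Sketch of the two divisibility constraints that tensor-parallel sharding
# imposes (illustrative only; not vLLM's real implementation).
def check_tp_size(num_heads: int, padded_vocab_size: int, tp_size: int) -> bool:
    """True if both the attention heads and the padded vocab embedding
    can be split evenly across tp_size GPUs."""
    return num_heads % tp_size == 0 and padded_vocab_size % tp_size == 0

# gpt2-xl: 25 attention heads, vocab padded to 50304
valid = [tp for tp in range(1, 9) if check_tp_size(25, 50304, tp)]
print(valid)  # [1] -- 25 has divisors 1/5/25, but 5 does not divide 50304
```

This is why both runs fail: 2 does not divide 25 heads, and although 5 does, 5 does not divide the padded vocab size 50304.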
Hi @marconaguib, unfortunately your case is an edge case that our current tensor-parallel support does not cover. We pad
vocab_size
to a multiple of 64, since people typically build a model with 2/4/8/16 GPUs.
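The padding described above can be sketched as a round-up to the nearest multiple of 64; `pad_to_multiple` is a hypothetical helper mirroring that behavior, not vLLM's actual function.

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple (illustrative sketch)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# GPT-2's raw vocab is 50257; padded it becomes the 50304 seen in the
# AssertionError above, which is divisible by 2/4/8/16 but not by 5.
print(pad_to_multiple(50257))  # 50304
```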
OK, thank you!
I'll stick with a single GPU for now 👍
Bummer. I have 6 GPUs, but I can only use 4 of them.