vllm fails to load a merged Mistral 8x7b model

Posted by mfpqipee 7 months ago in Other

I merged an 8x7b model with a LoRA adapter and saved the result with torch.save(model.state_dict(), 'path_to_model.pt'). However, when I run inference on the merged model with vllm, I get this error:

File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
    engine = cls(*engine_configs,
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 107, in __init__
    self._init_workers_ray(placement_group)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 194, in _init_workers_ray
    self._run_workers(
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
    all_outputs = ray.get(all_outputs)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::RayWorkerVllm.execute_method() (pid=2596933, ip=192.254.110.7, actor_id=afac0d35c8217a762419a5cc01000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7efd70ee22e0>)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/worker/worker.py", line 72, in load_model
    self.model_runner.load_model()
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 36, in load_model
    self.model = get_model(self.model_config)
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/model_loader.py", line 124, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/home/zhh/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/models/mixtral.py", line 525, in load_weights
    param = params_dict[name]
KeyError: 'model.embed_tokens.weight'
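
The KeyError on the last frame is raised by `params_dict[name]`: a tensor name found in the checkpoint has no counterpart among the parameter names the vLLM model class registered. One way to see how far the two naming schemes diverge is to diff the two sets of names. The sketch below is only an illustration. `diff_param_names` is a hypothetical helper, and `tok_embeddings.weight` is a made-up name used to show the shape of the output, not a name taken from vLLM.

```python
def diff_param_names(checkpoint_keys, model_param_names):
    """Compare tensor names in a checkpoint with the names a model expects.

    Returns a dict with:
      "unexpected": names present in the checkpoint but unknown to the model
      "missing":    names the model expects but the checkpoint lacks
    """
    ckpt = set(checkpoint_keys)
    model = set(model_param_names)
    return {
        "unexpected": sorted(ckpt - model),
        "missing": sorted(model - ckpt),
    }


# Hypothetical inputs. In practice the first list would come from the saved
# state dict (for example torch.load(path, map_location="cpu").keys()) and
# the second from the parameter names of the model being loaded into.
checkpoint_keys = ["model.embed_tokens.weight", "lm_head.weight"]
model_param_names = ["tok_embeddings.weight", "lm_head.weight"]

report = diff_param_names(checkpoint_keys, model_param_names)
print(report["unexpected"])  # ['model.embed_tokens.weight']
print(report["missing"])     # ['tok_embeddings.weight']
```

If nearly every name shows up as both unexpected and missing, the checkpoint and the loader probably use different naming conventions rather than the file being damaged.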

xxls0lw8 #2

I have the same problem. Is there a solution?

cidc1ykv #3

Please share a code snippet so we can see what is going on.

b09cbbtk #4


# Here is some sample code:

from datasets import load_dataset
from tqdm import tqdm
import datasets
from trl import SFTTrainer
from trl.trainer import ConstantLengthDataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
import torch
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the JSONL data and hold out 20% for evaluation.
dataset = load_dataset('json', data_files='yourData.jsonl', split='train')
dataset_simple = dataset.train_test_split(test_size=0.2, train_size=0.8, seed=None)
train_dataset = dataset_simple["train"]
eval_dataset = dataset_simple["test"]
base_model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

def create_prompt(sample):
    # Both tokens are empty in the post as published
    # (possibly "<s>" and "</s>" lost in extraction).
    bos_token = ""
    eos_token = ""
    full_prompt = ""
    full_prompt += bos_token
    full_prompt += "### Instruction:\n"
    full_prompt += "\n\n### Input:"
    full_prompt += "\n" + sample["input"]
    full_prompt += "### Output:"
    full_prompt += "\n" + sample["output"]
    full_prompt += eos_token
    return full_prompt

# Load the base model in 4-bit NF4 with double quantization.
nf4_config = BitsAndBytesConfig(load_in_8bit=False, load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=nf4_config, device_map='auto', use_cache=False)

def setPara(model):
    # Flag the model as model-parallel when more than one GPU is visible.
    if torch.cuda.device_count() > 1:
        model.is_parallelizable = True
        model.model_parallel = True
        print("set parallel")
setPara(model)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
# The post defined this as peft_config1 but used peft_config below.
peft_config = LoraConfig()
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
# Note: warmup_steps normally takes a step count; 0.03 looks like a ratio (see warmup_ratio).
args = TrainingArguments(output_dir="test_v1", max_steps=3000, per_device_train_batch_size=25, warmup_steps=0.03, logging_steps=100, save_strategy="steps", save_steps=100, evaluation_strategy="steps", eval_steps=20, learning_rate=2e-5, report_to="tensorboard", bf16=True, optim="paged_adamw_8bit", lr_scheduler_type='constant')
trainer = SFTTrainer(model=model, peft_config=peft_config, max_seq_length=1750, tokenizer=tokenizer, packing=True, formatting_func=create
# (the snippet is cut off here in the original post)

dgsult0t #5

I was using the image from ghcr.io/mistralai/mistral-src/vllm:latest, which is 2 months old. I switched to the vllm/vllm-openai:latest image, and it works with the safetensors files.
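
For anyone who still has only a raw state dict saved with torch.save, here is a minimal sketch of producing safetensors files instead, assuming the LoRA adapter was saved to a directory by PEFT and that there is enough memory to hold the unquantized base model. `merge_lora_and_save` and its arguments are hypothetical names chosen for illustration. This is one common way to merge with transformers and peft, not necessarily what the original poster ran.

```python
def merge_lora_and_save(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Merge a LoRA adapter into its base model and save a Hugging Face-style checkpoint."""
    # Heavy dependencies are imported lazily so importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model unquantized in bf16 (assumption: merging is done
    # on full-precision weights rather than on the 4-bit training copy).
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)

    # Attach the trained adapter, then fold its weights into the base layers.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # Writes config.json plus *.safetensors weight files into output_dir.
    merged.save_pretrained(output_dir, safe_serialization=True)

    # Save the tokenizer alongside so the directory is self-contained.
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
```

Calling it with the base model id from the sample code above, the adapter directory, and an output directory should give a folder that can be passed to vllm as the model path in place of the .pt file.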
