llama.cpp Bug: phi-3-mini-4k-it July update fails to load

owfi6suc · posted 2 months ago in Other

What happened?
I am trying to load the phi-3 mini July update model as usual, but it gives me the following error:

llama_model_load: error loading model: error loading model hyperparameters: key not found in model: phi3.attention.sliding_window
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '.\models\me\phi-3-mini-4k-it-July-5\Phi-3.1-mini-4k-instruct-Q8_0_L.gguf'
main: error: unable to load model

Also, the original phi-2 and phi-3 models still work fine! Notably, I also downloaded the latest version of LM Studio, and it cannot run this same model either, throwing the same error.
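As a rough sketch of why the loader aborts: llama.cpp reads the GGUF file's key-value metadata and fails hard when a hyperparameter key it requires is absent. The minimal parser below is an illustration only (a simplified subset of the GGUF v3 layout using u32 values, built in memory with the Python stdlib; it is not llama.cpp's actual loader). It shows the kind of lookup that fails when `phi3.attention.sliding_window` is missing from an older conversion of the model:

```python
import struct
import io

# GGUF value type code for UINT32 (subset of the GGUF v3 spec)
GGUF_TYPE_UINT32 = 4

def write_kv_u32(buf, key, value):
    """Append one string-keyed UINT32 KV pair in GGUF v3 layout."""
    k = key.encode("utf-8")
    buf.write(struct.pack("<Q", len(k)))        # key length (u64, little-endian)
    buf.write(k)                                # key bytes
    buf.write(struct.pack("<I", GGUF_TYPE_UINT32))
    buf.write(struct.pack("<I", value))

def build_header(kvs):
    """Build a minimal in-memory GGUF v3 header with the given u32 KV pairs."""
    buf = io.BytesIO()
    buf.write(b"GGUF")                          # magic
    buf.write(struct.pack("<I", 3))             # version 3
    buf.write(struct.pack("<Q", 0))             # tensor count (none in this sketch)
    buf.write(struct.pack("<Q", len(kvs)))      # KV pair count
    for k, v in kvs.items():
        write_kv_u32(buf, k, v)
    return buf.getvalue()

def read_kvs(data):
    """Parse the KV metadata section back into a dict, like a loader would."""
    buf = io.BytesIO(data)
    assert buf.read(4) == b"GGUF"
    (_version,) = struct.unpack("<I", buf.read(4))
    (_tensors,) = struct.unpack("<Q", buf.read(8))
    (n_kv,) = struct.unpack("<Q", buf.read(8))
    kvs = {}
    for _ in range(n_kv):
        (klen,) = struct.unpack("<Q", buf.read(8))
        key = buf.read(klen).decode("utf-8")
        (vtype,) = struct.unpack("<I", buf.read(4))
        assert vtype == GGUF_TYPE_UINT32        # only u32 values in this sketch
        (kvs[key],) = struct.unpack("<I", buf.read(4))
    return kvs

# Keys present in the failing file (per the log below); sliding_window is absent.
data = build_header({"phi3.context_length": 4096,
                     "phi3.attention.head_count": 32})
meta = read_kvs(data)
print(meta["phi3.context_length"])                    # 4096

# Mirrors the failure: a newer llama.cpp build requires this key, and the
# lookup misses, so it aborts with "key not found in model".
print("phi3.attention.sliding_window" in meta)        # False
```

In other words, this looks like a format-version mismatch rather than file corruption: the GGUF was converted before the key was added to the phi3 architecture, so re-converting the model with a current `convert_hf_to_gguf.py` (an assumption based on the error, not something confirmed in this report) would be the usual remedy.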

Name and Version

PS F:\ai3> .\llama.cpp\build\bin\Release\llama-cli.exe --version
version: 3505 (b72c20b8)
built with MSVC 19.40.33811.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

PS F:\ai3> .\llama.cpp\build\bin\Release\llama-cli.exe -m .\models\me\phi-3-mini-4k-it-July-5\Phi-3.1-mini-4k-instruct-Q8_0_L.gguf -if -p "hello"
Log start
main: build = 3505 (b72c20b8)
main: built with MSVC 19.40.33811.0 for x64
main: seed  = 1722688170
llama_model_loader: loaded meta data with 30 key-value pairs and 195 tensors from .\models\me\phi-3-mini-4k-it-July-5\Phi-3.1-mini-4k-instruct-Q8_0_L.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:  phi3.rope.scaling.original_context_length u32              = 4096
llama_model_loader: - kv   4:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   5:                   phi3.feed_forward_length u32              = 8192
llama_model_loader: - kv   6:                           phi3.block_count u32              = 32
llama_model_loader: - kv   7:                  phi3.attention.head_count u32              = 32
llama_model_loader: - kv   8:               phi3.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                  phi3.rope.dimension_count u32              = 96
llama_model_loader: - kv  11:                        phi3.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 7
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 32000
llama_model_loader: - kv  20:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 32000
llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                      quantize.imatrix.file str              = /models/Phi-3.1-mini-4k-instruct-GGUF...
llama_model_loader: - kv  27:                   quantize.imatrix.dataset str              = /training_data/calibration_datav3.txt
llama_model_loader: - kv  28:             quantize.imatrix.entries_count i32              = 128
llama_model_loader: - kv  29:              quantize.imatrix.chunks_count i32              = 151
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:    2 tensors
llama_model_loader: - type q8_0:  128 tensors
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: phi3.attention.sliding_window
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '.\models\me\phi-3-mini-4k-it-July-5\Phi-3.1-mini-4k-instruct-Q8_0_L.gguf'
main: error: unable to load model
PS F:\ai3> .\llama.cpp\build\bin\Release\llama-cli.exe --version
version: 3505 (b72c20b8)
built with MSVC 19.40.33811.0 for x64
PS F:\ai3>
