llama.cpp: Mistral-Large-Instruct-2407 cannot be quantized

kx5bkwkv · asked 2 months ago · in: Other
(llm_venv_llamacpp) xlab@xlab:/mnt/Model/MistralAI/llm_llamacpp$ python convert_hf_to_gguf.py /mnt/Model/MistralAI/Mistral-Large-Instruct-2407 --outfile ../llm_quantized/mistral_large2_instruct_f16.gguf --outtype f16 --no-lazy
INFO:hf-to-gguf:Loading model: Mistral-Large-Instruct-2407
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 12288
INFO:hf-to-gguf:gguf: feed forward length = 28672
INFO:hf-to-gguf:gguf: head count = 96
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}

{{- bos_token }}
{%- for message in loop_messages %}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif %}
    {%- if message['role'] == 'user' %}
        {%- if loop.last and system_message is defined %}
            {{- '[INST] ' + system_message + '\n\n' + message['content'] + '[/INST]' }}
        {%- else %}
            {{- '[INST] ' + message['content'] + '[/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + eos_token}}
    {%- else %}
        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}
    {%- endif %}
{%- endfor %}

INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:../llm_quantized/mistral_large2_instruct_f16.gguf: n_tensors = 0, total_size = negligible - metadata only
Writing: 0.00byte [00:00, ?byte/s]
INFO:hf-to-gguf:Model successfully exported to ../llm_quantized/mistral_large2_instruct_f16.gguf
fykwrbwg #1

@17Reset Can you try pip install -r requirements/requirements-convert_hf_to_gguf.txt? The command below works for me.
Also note that you should exclude the consolidated safetensors files (i.e. huggingface-cli download mistralai/Mistral-Large-Instruct-2407 --exclude '*consolidated*'), otherwise the model ends up in your download folder twice.

$ python convert_hf_to_gguf.py /sdc1/huggingface/hub/models--mistralai--Mistral-Large-Instruct-2407/snapshots/5c9ce5b5f7a7ad62d03e8c66c719b66d586de26b/ --outfile /md0/models/mistralai/ggml-mistral-large-instruct-2407-f16.gguf --outtype f16 --no-lazy
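
For reference, the same exclusion works from Python as well. A minimal sketch, assuming the huggingface_hub package is installed; local_dir is a placeholder path:

# Download only the transformers-format files and skip the consolidated
# shards, which convert_hf_to_gguf.py does not read.
# local_dir below is a placeholder; adjust it to your layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-Large-Instruct-2407",
    ignore_patterns=["*consolidated*"],  # same effect as --exclude '*consolidated*'
    local_dir="/path/to/Mistral-Large-Instruct-2407",
)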
fnx2tebb #2

OS: Ubuntu 24.04
Python version: Python 3.12.3
I built llama.cpp with the script below, and it quantizes other models (e.g. Qwen2) without any errors. However, converting Mistral-Large-Instruct-2407 produces the problem above.

#!/bin/bash
# bash rather than sh: the script relies on "source".
# Set up a fresh uv virtualenv and build llama.cpp.
uv venv llm_venv_llamacpp
source llm_venv_llamacpp/bin/activate

LLAMACPP_PATH="llm_llamacpp"
if [ -d "$LLAMACPP_PATH" ]; then
    echo "The directory $LLAMACPP_PATH already exists."
    echo "Deleting it first."
    rm -rf "$LLAMACPP_PATH"
else
    echo "The directory $LLAMACPP_PATH does not exist."
    echo "It will be created by git clone."
fi
git clone https://github.com/ggerganov/llama.cpp "$LLAMACPP_PATH"
cd "$LLAMACPP_PATH"
make LLAMA_CUDA=1 -j $(($(nproc)/2))

# Python dependencies for the conversion scripts.
uv pip install --no-cache torch
uv pip install --no-cache numpy
uv pip install --no-cache sentencepiece
uv pip install --no-cache transformers
uv pip install --no-cache gguf
uv pip install --no-cache protobuf
cd ..

I downloaded Mistral-Large-Instruct-2407 with git clone; the model directory is listed below. Do you see the problem you mentioned in this directory?

(llm_venv_llamacpp) xlab@xlab:/mnt/Model/MistralAI/Mistral-Large-Instruct-2407$ ll
total 239476580
drwxrwxr-x 2 xlab xlab       4096 Jul 25 22:55 ./
drwxrwxr-x 6 xlab xlab       4096 Aug  1 15:40 ../
-rwxrwxr-x 1 xlab xlab        598 Jul 25 22:54 config.json*
-rwxrwxr-x 1 xlab xlab 4831913664 Jul 25 19:40 consolidated-00001-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 19:34 consolidated-00002-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 19:36 consolidated-00003-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 19:39 consolidated-00004-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 19:41 consolidated-00005-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 19:39 consolidated-00006-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 19:43 consolidated-00007-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410936 Jul 25 19:52 consolidated-00008-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 19:50 consolidated-00009-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 19:50 consolidated-00010-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 19:47 consolidated-00011-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 19:53 consolidated-00012-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 19:51 consolidated-00013-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938352 Jul 25 19:58 consolidated-00014-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 19:59 consolidated-00015-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:02 consolidated-00016-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:01 consolidated-00017-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:01 consolidated-00018-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 20:08 consolidated-00019-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410928 Jul 25 20:07 consolidated-00020-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:10 consolidated-00021-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:11 consolidated-00022-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 20:11 consolidated-00023-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:17 consolidated-00024-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:16 consolidated-00025-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:19 consolidated-00026-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:23 consolidated-00027-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:19 consolidated-00028-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:23 consolidated-00029-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:27 consolidated-00030-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 20:26 consolidated-00031-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410936 Jul 25 20:30 consolidated-00032-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747744 Jul 25 20:29 consolidated-00033-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:31 consolidated-00034-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 20:36 consolidated-00035-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:40 consolidated-00036-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:37 consolidated-00037-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:37 consolidated-00038-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:40 consolidated-00039-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:45 consolidated-00040-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:46 consolidated-00041-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:47 consolidated-00042-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 20:44 consolidated-00043-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:52 consolidated-00044-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747744 Jul 25 20:52 consolidated-00045-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:52 consolidated-00046-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938368 Jul 25 20:56 consolidated-00047-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4907410944 Jul 25 20:58 consolidated-00048-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4806747752 Jul 25 20:57 consolidated-00049-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 4831938360 Jul 25 20:58 consolidated-00050-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab 3019948640 Jul 25 20:57 consolidated-00051-of-00051.safetensors*
-rwxrwxr-x 1 xlab xlab      64160 Jul 25 22:54 consolidated.safetensors.index.json*
-rwxrwxr-x 1 xlab xlab        111 Jul 25 22:54 generation_config.json*
-rwxrwxr-x 1 xlab xlab       1519 Jul 25 22:54 .gitattributes*
-rwxrwxr-x 1 xlab xlab      65559 Jul 25 22:54 model.safetensors.index.json*
-rwxrwxr-x 1 xlab xlab        204 Jul 25 22:54 params.json*
-rwxrwxr-x 1 xlab xlab       8489 Jul 25 22:54 README.md*
-rwxrwxr-x 1 xlab xlab       1073 Jul 25 22:54 test.py*
-rwxrwxr-x 1 xlab xlab     138635 Aug  1 13:26 tokenizer_config.json*
-rwxrwxr-x 1 xlab xlab    1962424 Jul 25 22:54 tokenizer.json*
-rwxrwxr-x 1 xlab xlab     587583 Jul 25 22:54 tokenizer.model*
-rwxrwxr-x 1 xlab xlab     587583 Jul 25 22:54 tokenizer.model.v3*
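
A quick way to check which shard sets a directory actually contains, since the converter only picks up one of them. A minimal sketch; the directory path matches the listing above:

# Count the shard files of each naming scheme in the model directory.
# convert_hf_to_gguf.py only reads the "model-*" set, so a count of zero
# there explains the "n_tensors = 0" output seen in the question.
import glob
import os

model_dir = "/mnt/Model/MistralAI/Mistral-Large-Instruct-2407"
for prefix in ("model", "consolidated"):
    shards = glob.glob(os.path.join(model_dir, prefix + "-*.safetensors"))
    print(prefix + "-*.safetensors:", len(shards), "file(s)")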
jbose2ul #3

I see you used the consolidated-*-of-*.safetensors files, but I used the model-*-of-*.safetensors files. It is still unclear to me why they ship two sets of files.

rpppsulh #4

Thank you very much. Perhaps that is the cause. I will test with the model-*-of-*.safetensors files.

n3h0vuf2 #5

@dranger003 My guess is that the consolidated-* safetensors files are meant for inference with the mistral_inference package (the Mistral Inference section of README.md shows a download pattern restricted to just those files), while the model-* safetensors files are probably only for inference with the transformers library.
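
From memory, the README restricts the mistral_inference download with a pattern roughly like the sketch below; treat the exact file patterns as an assumption rather than a quote:

# Rough sketch of the mistral_inference-style download: fetch only the
# files that package needs. The allow_patterns list is reconstructed from
# memory and may not match the README exactly; local_dir is a placeholder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-Large-Instruct-2407",
    allow_patterns=["params.json", "consolidated-*.safetensors", "tokenizer.model.v3"],
    local_dir="mistral_models/Large",
)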

p8ekf7hl #6

Is quantization to llama.cpp's GGUF format tied to transformers, then?

baubqpgj #7

@17Reset It is not the quantization but the model conversion: the script expects the model in the format used by the transformers library.
The conversion script convert_hf_to_gguf.py looks for safetensors files whose names carry the "model" prefix:

self.part_names = Model.get_model_part_names(self.dir_model, "model", ".safetensors")

You only have files starting with "consolidated", so it found nothing to convert. The script also expects the tensors inside the files to have specific names, and the tensor names in the consolidated files differ from those in the model files, so I am not sure it could convert them even if you renamed the files.
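
The naming difference is easy to see by listing a few tensor keys from each kind of shard. A minimal sketch, assuming the safetensors package is installed and that both shard files exist locally; the file names are placeholders:

# Print the first few tensor names in a transformers-format shard versus a
# consolidated shard. The converter maps transformers-style names such as
# "model.layers.0.self_attn.q_proj.weight"; the consolidated shards use a
# different scheme, so renaming the files alone would not help.
from safetensors import safe_open

for shard in ("model-00001-of-00051.safetensors",
              "consolidated-00001-of-00051.safetensors"):
    with safe_open(shard, framework="pt") as f:
        print(shard, "->", sorted(f.keys())[:3])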

fv2wmkja #8

I see. Thank you very much.

628mspwn #9

It is worth noting that the consolidated files did work with convert_hf_to_gguf.py for a smaller Mistral model; I forget which version.
I wish they were clearer about what these files are for.
