ollama errors when loading the model "Xiaobu Embedding v2": error="llama runner process has terminated: signal: segmentation fault (core dumped)"

qncylg1j · asked 22 days ago · in Other



#### What is the issue?

ollama-1 | time=2024-08-20T02:46:33.204Z level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=25 layers.offload=0 layers.split="" memory.available="[22.2 GiB]" memory.required.full="820.5 MiB" memory.required.partial="0 B" memory.required.kv="48.0 MiB" memory.required.allocations="[820.5 MiB]" memory.weights.total="625.2 MiB" memory.weights.repeating="584.0 MiB" memory.weights.nonrepeating="41.3 MiB" memory.graph.full="128.0 MiB" memory.graph.partial="128.0 MiB"
 ollama-1 | time=2024-08-20T02:46:33.206Z level=INFO source=server.go:393 msg="starting llama server" cmd="/tmp/ollama1960294902/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-85df6dbe02a3bfb67f24400c4d56ba8bd1a8a19a14450761b65ce17fe1d5064a --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --parallel 4 --port 46451"
 ollama-1 | time=2024-08-20T02:46:33.207Z level=INFO source=sched.go:445 msg="loaded runners" count=1
 ollama-1 | time=2024-08-20T02:46:33.207Z level=INFO source=server.go:593 msg="waiting for llama runner to start responding"
 ollama-1 | time=2024-08-20T02:46:33.207Z level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server error"
 ollama-1 | INFO [main] build info | build=1 commit="1e6f655" tid="127020122728320" timestamp=1724121993
 ollama-1 | INFO [main] system info | n_threads=16 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 " | tid="127020122728320" timestamp=1724121993 total_threads=32
 ollama-1 | INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="31" port="46451" tid="127020122728320" timestamp=1724121993


ollama-1 | llm_load_print_meta: n_layer = 24
 ollama-1 | llm_load_print_meta: n_head = 16
 ollama-1 | llm_load_print_meta: n_head_kv = 16
 ollama-1 | llm_load_print_meta: n_rot = 64
 ollama-1 | llm_load_print_meta: n_swa = 0
 ollama-1 | llm_load_print_meta: n_embd_head_k = 64
 ollama-1 | llm_load_print_meta: n_embd_head_v = 64
 ollama-1 | llm_load_print_meta: n_gqa = 1
 ollama-1 | llm_load_print_meta: n_embd_k_gqa = 1024
 ollama-1 | llm_load_print_meta: n_embd_v_gqa = 1024
 ollama-1 | llm_load_print_meta: f_norm_eps = 1.0e-12
 ollama-1 | llm_load_print_meta: f_norm_rms_eps = 0.0e+00
 ollama-1 | llm_load_print_meta: f_clamp_kqv = 0.0e+00
 ollama-1 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00
 ollama-1 | llm_load_print_meta: f_logit_scale = 0.0e+00
 ollama-1 | llm_load_print_meta: n_ff = 4096
 ollama-1 | llm_load

bqjvbblv1#

Can you provide a link to where you got the model?


nr7wwzry2#

> Can you provide a link to where you got the model?

https://ollama.com/search?q=xiaobu
I tried both of these models, and they both reported the same error.
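
For reference, a minimal reproduction sketch: calling the embeddings endpoint directly triggers the crash. The model tag below is an assumption based on the search listing above; substitute whatever tag you actually pulled.

```bash
# Sketch, not a confirmed repro recipe: the model tag is assumed from the
# ollama.com search results and may differ from the one you pulled.
curl http://localhost:11434/api/embeddings -d '{
  "model": "xiaobu-embedding-v2",
  "prompt": "hello world"
}'
# On affected setups the server log then reports:
# "llama runner process has terminated: signal: segmentation fault (core dumped)"
```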


7vux5j2d3#

Thanks for the link. I was able to reproduce the issue and will keep you posted.


0ve6wy6x4#

For me, it's the same with llama3.1 on Docker.


00jrzges5#

After some investigation, it seems to be an issue specific to this model (xiaobu embedding v2). For some reason, llama.cpp segfaults accessing inp_embd data around 50% of the time. Not sure what the root cause is. The tensor seems to be initialized correctly.
You might have some luck cross-posting this to the llama.cpp GitHub.
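
One way to isolate this is to run llama.cpp's embedding example directly against the GGUF blob from the log above, bypassing ollama entirely. Treat this as a sketch: the binary name varies by build (`llama-embedding` in recent llama.cpp trees, plain `embedding` in older ones).

```bash
# Blob path taken from the "starting llama server" log line above.
MODEL=/root/.ollama/models/blobs/sha256-85df6dbe02a3bfb67f24400c4d56ba8bd1a8a19a14450761b65ce17fe1d5064a

# Since the segfault reportedly hits only ~50% of the time, run it
# repeatedly to see whether the crash reproduces outside ollama.
for i in $(seq 1 10); do
  ./llama-embedding -m "$MODEL" -p "hello world" >/dev/null || echo "run $i crashed"
done
```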


nqwrtyyt6#

> After some investigation, it seems to be an issue specific to this model (xiaobu embedding v2). For some reason, llama.cpp segfaults accessing inp_embd data around 50% of the time. Not sure what the root cause is. The tensor seems to be initialized correctly.
> You might have some luck cross-posting this to the llama.cpp GitHub.

Thanks for the reply; I'll try to submit this to the llama.cpp GitHub.


n3schb8v7#

> After some investigation, it seems to be an issue specific to this model (xiaobu embedding v2). For some reason, llama.cpp segfaults accessing inp_embd data around 50% of the time. Not sure what the root cause is. The tensor seems to be initialized correctly.
> You might have some luck cross-posting this to the llama.cpp GitHub.

After installing an NVIDIA GPU, it stopped reporting the error.
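
That is consistent with the logs at the top, which show the CPU runner (cpu_avx2) being selected. As a quick sanity check that inference actually moved onto the GPU, a sketch using the standard ollama CLI:

```bash
# List loaded models; the PROCESSOR column should read "100% GPU".
# "100% CPU" means the cpu_avx2 runner (the one segfaulting here)
# is still being used.
ollama ps
```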
