Unable to load Llava in the Ollama Windows version

5lhxktic  posted 2 months ago  in Windows

What is the issue?

I run into a problem when trying to load the 'llava' model, while other models such as 'Llama3' or 'Phi3' work fine. Details below:

>>ollama run llava
Error: llama runner process no longer running: 1

server.log

...
clip_model_load: CLIP using CUDA backend
clip_model_load: text_encoder:   0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector:  1
clip_model_load: model size:     595.49 MB
clip_model_load: metadata size:  0.14 MB
clip_model_load: params backend buffer size =  595.49 MB (377 tensors)
cannot open model file for loading tensors
{"function":"load_model","level":"ERR","line":398,"model":"D:\\Llama\\models\\blobs\\sha256-72d6f08a42f656d36b356dbe0920675899a99ce21192fd66266fb7d82ed07539","msg":"unable to load clip model","tid":"162240","timestamp":1713942368}
time=2024-04-24T15:06:08.293+08:00 level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 1 "

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.1.32
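The ERR line in server.log ("cannot open model file for loading tensors" / "unable to load clip model") points at a specific blob file. One low-risk thing to rule out first is a truncated or corrupted download: Ollama names each file under `models/blobs` after the sha256 of its contents, so a blob can be re-hashed and compared against its own filename. A minimal sketch (the temp-directory stand-in and the `verify_blob` helper are mine, not part of Ollama):

```python
# Hypothetical integrity check for an Ollama blob: files under
# models/blobs are named "sha256-<digest of their contents>", so a
# truncated or corrupted download can be detected by re-hashing.
import hashlib
import tempfile
from pathlib import Path

def verify_blob(path: Path) -> bool:
    """Return True if the file's contents hash to the digest in its name."""
    expected = path.name.removeprefix("sha256-")
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so multi-GB blobs don't load into RAM at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected

# Demo on a temporary stand-in blob rather than the real models/blobs dir:
tmp = Path(tempfile.mkdtemp())
data = b"dummy layer data"
blob = tmp / ("sha256-" + hashlib.sha256(data).hexdigest())
blob.write_bytes(data)
print(verify_blob(blob))  # → True
```

For the real check, point `verify_blob` at the path from the log; if it returns False, re-pulling the model with `ollama pull llava` should replace the bad layer.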

xiozqbni 1#

2024-04-27 20:56:35 | 500 | 1.5841313s | 127.0.0.1 | POST "/api/chat"
2024-04-27 20:56:48 | 200 | 0s | 127.0.0.1 | HEAD "/"
2024-04-27 20:56:48 | 504.6μs | 523.9μs | 127.0.0.1 | POST "/api/show"

xmjla07d 2#

A pre-release of 0.1.33 is now available. I'm not sure what the root cause of this issue is, but we saw some stability problems on Windows in 0.1.32, so it may be fixed there. Please give it a try and let us know.

zpjtge22 3#

After updating to 0.1.33, it still fails to run.
[GIN] 2024/04/29 - 15:04:35 | 200 | 0s | 127.0.0.1 | POST "/api/show"
[GIN] 2024/04/29 - 15:04:35 | 200 | 614.6μs | 127.0.0.1 | POST "/api/show"
time=2024-04-29T15:04:35.577+08:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-29T15:04:35.620+08:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\Users\Elmin\AppData\Local\Programs\Ollama\cudart64_110.dll count=1
time=2024-04-29T15:04:35.620+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T15:04:36.634+08:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=33 memory.available="15225.0 MiB" memory.required.full="5320.0 MiB" memory.required.partial="5320.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="3847.6 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="102.6 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="181.0 MiB"
time=2024-04-29T15:04:50+08:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=41 memory.available="15225.0 MiB" memory.required.full="9812.5 MiB" memory.required.partial="9812.5 MiB" memory.required.kv="1600.0 MiB" memory.weights.total="6936.0 MiB" memory.weights.repeating="6807.8 MiB" memory.weights.nonrepeating="128.2 MiB" memory.graph.full="204.0 MiB" memory.graph.partial="244.1 MiB"
time=2024-04-29T15:04:50.198+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T15:04:50.213+08:00 level=INFO source=server.go:290 msg="starting llama server" cmd="C:\Users\Elmin\AppData\Local\Programs\Ollama\ollama_runners\cuda_v11.3\ollama_llama_server.exe --model D:\AI\语言模型\models\Repository\blobs\sha256-87d5b13e5157d3a67f8e10a46d8a846ec2b68c1f731e3dfe1546a585432b8fa0 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 41 --mmproj D:\AI\语言模型\models\Repository\blobs\sha256-42037f9f4c1b801eebaec1545ed144b8b0fa8259672158fb69c8c68f02cfe00c --parallel 1 --port 10111"
time=2024-04-29T15:04:50.218+08:00 level=INFO source=sched.go:327 msg="loaded runners" count=1
time=2024-04-29T15:04:50.218+08:00 level=INFO source=server.go:439 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"34736","timestamp":1714374290}
{"build":2737,"commit":"46e12c4","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"34736","timestamp":1714374290}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":10,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LAMMAFILE = 1 | ","tid":"34736","timestamp":1714374290,"total_threads":20}
{"function":"load_model","level":"INFO","line":395,"msg":"Multi Modal Mode Enabled","tid":"34736","timestamp":1714374290}
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
{"function":"load_model","level":"ERR","line":398,"model":"D:\AI\语言模型\models\Repository\blobs\sha256-42037f9f4c1b801eebaec
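Logs like the one above mix two formats: the llama runner emits JSON records while the Go side (routes.go, memory.go, server.go) emits key=value lines. When skimming a long server.log for the failing blob, filtering down to error-level records helps; a rough sketch (the `error_records` helper and its parsing rules are my assumption, not an Ollama tool):

```python
# Sketch: sift a mixed-format Ollama server.log for error-level entries.
# JSON lines carry a "level" field ("ERR"); key=value lines use level=ERROR.
import json

def error_records(log_text: str) -> list:
    errors = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("{"):
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # truncated JSON, as at the end of the log above
            if rec.get("level") == "ERR":
                errors.append(rec)
        elif "level=ERROR" in line:
            errors.append({"raw": line})
    return errors

sample = ('{"function":"load_model","level":"ERR","line":398,'
          '"msg":"unable to load clip model"}\n'
          'time=... level=INFO source=gpu.go:96 msg="Detecting GPUs"')
errs = error_records(sample)
print(len(errs), errs[0]["msg"])  # → 1 unable to load clip model
```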

rjjhvcjd 4#

I haven't been able to reproduce this failure. I tried 0.1.33 on Win 11, and llava and llava:13b both run fine on an Nvidia GPU.

@PasserDreamer, did upgrading to 0.1.33 resolve your issue?

@Elminsst, do smaller models work for you? Is it possible your system is running out of memory while loading the model and crashing? Could you check Task Manager to see whether anything else is happening when this occurs? Perhaps an antivirus product is killing the process?
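The out-of-memory theory can be sanity-checked directly against the "offload to gpu" line in the log, which reports both `memory.available` and `memory.required.full`. A small sketch for pulling those fields out and computing the headroom (the `mib` helper is mine; the field names follow the log format shown earlier):

```python
# Sketch: extract 'key="N MiB"' fields from an Ollama memory.go log line
# and compare required versus available memory.
import re

def mib(log_line: str, key: str) -> float:
    """Return the MiB value of a key=\"N MiB\" field, or raise KeyError."""
    m = re.search(rf'{re.escape(key)}="([\d.]+) MiB"', log_line)
    if m is None:
        raise KeyError(key)
    return float(m.group(1))

line = ('msg="offload to gpu" memory.available="15225.0 MiB" '
        'memory.required.full="9812.5 MiB"')
headroom = mib(line, "memory.available") - mib(line, "memory.required.full")
print(f"headroom: {headroom:.1f} MiB")  # → headroom: 5412.5 MiB
```

By this line's own figures the model should fit, so if the crash is memory-related it is more likely system RAM or another process than the VRAM estimate itself.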
