What is the issue?
ollama run llava:34b write me a poem
Error: llama runner process has terminated: signal: aborted (core dumped) error loading model: unable to allocate backend buffer
llama_load_model_from_file: exception loading model
Hardware
The system has two discrete GPUs:
- AMD RX 7600 XT (16 GB)
- NVIDIA GTX 1050 Ti (4 GB)
RAM: 48 GB
CPU: AMD Ryzen 5 7600X
What I tried
I experimented with the CUDA_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES environment variables. Setting either one to -1 makes Ollama use only the remaining GPU (a sketch of this follows below).
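As a minimal sketch of that workaround, assuming the server is started by hand in a shell rather than as a service:

# Hide every NVIDIA device so Ollama only sees the AMD card (ROCm):
CUDA_VISIBLE_DEVICES=-1 ollama serve

# Hide every AMD device so Ollama only sees the NVIDIA card (CUDA):
HIP_VISIBLE_DEVICES=-1 ollama serve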
Logs:
both.txt
amd_only.txt
nvidia_only.txt
Excerpt from both.txt:
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15863.15 MiB on device 0: cudaMalloc failed: out of memory
llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: exception loading model
terminate called after throwing an instance of 'std::runtime_error'
what(): unable to allocate backend buffer
time=2024-07-20T10:45:15.144+02:00 level=ERROR source=sched.go:480 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) error loading model: unable to allocate backend buffer"
This happens with every model I have tried whose size exceeds 16 GB, whenever both GPUs are visible.
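To see which models cross that threshold, the sizes reported by ollama list can be compared against each card's VRAM. The output below is illustrative, not taken from this machine:

ollama list
# NAME         SIZE
# llava:34b    ~20 GB   (roughly; larger than either card alone)

# Confirm each card's VRAM:
nvidia-smi --query-gpu=name,memory.total --format=csv
rocm-smi --showmeminfo vram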
OS
Linux
GPU
Nvidia, AMD
CPU
AMD
Ollama version
0.2.1
1 Answer
After updating to 0.2.7, the result is still the same.
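For anyone who needs the single-GPU workaround to survive restarts in the meantime, here is a sketch assuming the stock Linux install, where Ollama runs as a systemd service and environment variables are set through a service override (as described in the Ollama FAQ):

sudo systemctl edit ollama.service
# In the override file that opens, add:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=-1"
sudo systemctl daemon-reload
sudo systemctl restart ollama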