ollama gemma2:9b-instruct-fp16 out of memory

hgb9j2n6 · asked 2 months ago in Other

What is the issue?

I updated Ollama to v0.2.1 and pulled gemma2:9b-instruct-fp16. When I run ollama run gemma2:9b-instruct-fp16, it fails to start.
Hardware
CPU: Intel Core i9 14900K
RAM: 96 GB
GPU: Nvidia RTX 4070 Ti Super (16 GB)
Gemma2 should be split between the CPU and GPU, since it is roughly 18 GB. I can run gemma2:9b-instruct-q8_0 and the other tags.
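For reference, a minimal sketch of how a debug log like the one below can be captured (assuming the standard OLLAMA_DEBUG environment variable for verbose logging, and nvidia-smi from the NVIDIA driver for watching VRAM):

# Terminal 1: run the server with verbose scheduler/memory logging.
OLLAMA_DEBUG=1 ollama serve

# Terminal 2: reproduce the failure.
ollama run gemma2:9b-instruct-fp16

# Terminal 3, optional: sample VRAM once per second while the model loads.
nvidia-smi --query-gpu=memory.used,memory.free --format=csv -l 1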
Here is the server log:

Jul 11 16:03:04 quorra ollama[1370859]: [GIN] 2024/07/11 - 16:03:04 | 200 |      44.854µs |       127.0.0.1 | HEAD     "/"
Jul 11 16:03:04 quorra ollama[1370859]: [GIN] 2024/07/11 - 16:03:04 | 200 |    35.54641ms |       127.0.0.1 | POST     "/api/show"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.690Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:04 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.808Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:04 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.820Z level=DEBUG source=memory.go:101 msg=evaluating library=cuda gpu_count=1 available="[15.4 GiB]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.821Z level=DEBUG source=sched.go:251 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.821Z level=DEBUG source=memory.go:101 msg=evaluating library=cuda gpu_count=1 available="[15.4 GiB]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.821Z level=DEBUG source=memory.go:101 msg=evaluating library=cuda gpu_count=1 available="[15.4 GiB]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.821Z level=DEBUG source=memory.go:101 msg=evaluating library=cuda gpu_count=1 available="[15.4 GiB]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=memory.go:101 msg=evaluating library=cuda gpu_count=1 available="[15.4 GiB]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=server.go:98 msg="system memory" total="94.0 GiB" free=96324423680
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=memory.go:101 msg=evaluating library=cuda gpu_count=1 available="[15.4 GiB]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=43 layers.offload=34 layers.split="" memory.available="[15.4 GiB]" memory.required.full="19.9 GiB" memory.required.partial="15.1 GiB" memory.required.kv="672.0 MiB" memory.required.allocations="[15.1 GiB]" memory.weights.total="16.2 GiB" memory.weights.repeating="14.5 GiB" memory.weights.nonrepeating="1.7 GiB" memory.graph.full="507.0 MiB" memory.graph.partial="1.2 GiB"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cpu/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cpu_avx/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cpu_avx2/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cuda_v11/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/rocm_v60101/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cpu/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cpu_avx/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cpu_avx2/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/cuda_v11/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1926017730/runners/rocm_v60101/ollama_llama_server
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=INFO source=server.go:375 msg="starting llama server" cmd="/tmp/ollama1926017730/runners/cuda_v11/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 34 --verbose --parallel 1 --port 38287"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=DEBUG source=server.go:390 msg=subprocess environment="[PATH=/home/mark/.vscode-server/cli/servers/Stable-ea1445cc7016315d0f5728f8e8b12a45dc0a7286/server/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/tmp/ollama1926017730/runners/cuda_v11:/tmp/ollama1926017730/runners CUDA_VISIBLE_DEVICES=GPU-007c9d9a-8177-bd6f-7654-45652102b937]"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=INFO source=sched.go:474 msg="loaded runners" count=1
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.822Z level=INFO source=server.go:563 msg="waiting for llama runner to start responding"
Jul 11 16:03:04 quorra ollama[1370859]: time=2024-07-11T16:03:04.823Z level=INFO source=server.go:604 msg="waiting for server to become available" status="llm server error"
Jul 11 16:03:04 quorra ollama[1414672]: INFO [main] build info | build=1 commit="a8db2a9" tid="140058610085888" timestamp=1720713784
Jul 11 16:03:04 quorra ollama[1414672]: INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | " tid="140058610085888" timestamp=1720713784 total_threads=32
Jul 11 16:03:04 quorra ollama[1414672]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="31" port="38287" tid="140058610085888" timestamp=1720713784
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: loaded meta data with 29 key-value pairs and 464 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d (version GGUF V3 (latest))
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   0:                       general.architecture str              = gemma2
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   1:                               general.name str              = gemma-2-9b-it
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   2:                      gemma2.context_length u32              = 8192
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   3:                    gemma2.embedding_length u32              = 3584
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   4:                         gemma2.block_count u32              = 42
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   5:                 gemma2.feed_forward_length u32              = 14336
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   6:                gemma2.attention.head_count u32              = 16
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   7:             gemma2.attention.head_count_kv u32              = 8
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   8:    gemma2.attention.layer_norm_rms_epsilon f32              = 0.000001
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv   9:                gemma2.attention.key_length u32              = 256
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  10:              gemma2.attention.value_length u32              = 256
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  11:                          general.file_type u32              = 1
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  12:              gemma2.attn_logit_softcapping f32              = 50.000000
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  13:             gemma2.final_logit_softcapping f32              = 30.000000
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  14:            gemma2.attention.sliding_window u32              = 4096
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = llama
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = default
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,256000]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  18:                      tokenizer.ggml.scores arr[f32,256000]  = [0.000000, 0.000000, 0.000000, 0.0000...
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  20:                tokenizer.ggml.bos_token_id u32              = 2
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 1
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  22:            tokenizer.ggml.unknown_token_id u32              = 3
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  23:            tokenizer.ggml.padding_token_id u32              = 0
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  24:               tokenizer.ggml.add_bos_token bool             = true
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  25:               tokenizer.ggml.add_eos_token bool             = false
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  26:                    tokenizer.chat_template str              = {{ bos_token }}{% if messages[0]['rol...
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  27:            tokenizer.ggml.add_space_prefix bool             = false
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - kv  28:               general.quantization_version u32              = 2
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - type  f32:  169 tensors
Jul 11 16:03:04 quorra ollama[1370859]: llama_model_loader: - type  f16:  295 tensors
Jul 11 16:03:04 quorra ollama[1370859]: llm_load_vocab: special tokens cache size = 364
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_vocab: token to piece cache size = 1.6014 MB
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: format           = GGUF V3 (latest)
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: arch             = gemma2
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: vocab type       = SPM
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_vocab          = 256000
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_merges         = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: vocab_only       = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_ctx_train      = 8192
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_embd           = 3584
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_layer          = 42
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_head           = 16
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_head_kv        = 8
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_rot            = 256
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_swa            = 4096
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_embd_head_k    = 256
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_embd_head_v    = 256
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_gqa            = 2
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_embd_k_gqa     = 2048
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_embd_v_gqa     = 2048
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_ff             = 14336
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_expert         = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_expert_used    = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: causal attn      = 1
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: pooling type     = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: rope type        = 2
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: rope scaling     = linear
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: freq_base_train  = 10000.0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: freq_scale_train = 1
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: n_ctx_orig_yarn  = 8192
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: rope_finetuned   = unknown
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: ssm_d_conv       = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: ssm_d_inner      = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: ssm_d_state      = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: ssm_dt_rank      = 0
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: model type       = 9B
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: model ftype      = F16
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: model params     = 9.24 B
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: model size       = 17.22 GiB (16.00 BPW)
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: general.name     = gemma-2-9b-it
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: BOS token        = 2 '<bos>'
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: EOS token        = 1 '<eos>'
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: UNK token        = 3 '<unk>'
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: PAD token        = 0 '<pad>'
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: LF token         = 227 '<0x0A>'
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: EOT token        = 107 '<end_of_turn>'
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_print_meta: max token length = 93
Jul 11 16:03:05 quorra ollama[1370859]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
Jul 11 16:03:05 quorra ollama[1370859]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Jul 11 16:03:05 quorra ollama[1370859]: ggml_cuda_init: found 1 CUDA devices:
Jul 11 16:03:05 quorra ollama[1370859]:   Device 0: NVIDIA GeForce RTX 4070 Ti SUPER, compute capability 8.9, VMM: yes
Jul 11 16:03:05 quorra ollama[1370859]: time=2024-07-11T16:03:05.073Z level=INFO source=server.go:604 msg="waiting for server to become available" status="llm server loading model"
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_tensors: ggml ctx size =    0.41 MiB
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_tensors: offloading 34 repeating layers to GPU
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_tensors: offloaded 34/43 layers to GPU
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_tensors:        CPU buffer size = 17628.31 MiB
Jul 11 16:03:05 quorra ollama[1370859]: llm_load_tensors:      CUDA0 buffer size = 12853.86 MiB
Jul 11 16:03:05 quorra ollama[1370859]: time=2024-07-11T16:03:05.574Z level=DEBUG source=server.go:615 msg="model load progress 0.41"
Jul 11 16:03:05 quorra ollama[1370859]: time=2024-07-11T16:03:05.825Z level=DEBUG source=server.go:615 msg="model load progress 0.64"
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.076Z level=DEBUG source=server.go:615 msg="model load progress 0.88"
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: n_ctx      = 2048
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: n_batch    = 512
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: n_ubatch   = 512
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: flash_attn = 0
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: freq_base  = 10000.0
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: freq_scale = 1
Jul 11 16:03:06 quorra ollama[1370859]: llama_kv_cache_init:  CUDA_Host KV buffer size =   128.00 MiB
Jul 11 16:03:06 quorra ollama[1370859]: llama_kv_cache_init:      CUDA0 KV buffer size =   544.00 MiB
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: KV self size  =  672.00 MiB, K (f16):  336.00 MiB, V (f16):  336.00 MiB
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model:  CUDA_Host  output buffer size =     0.99 MiB
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model:      CUDA0 compute buffer size =  2257.00 MiB
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model:  CUDA_Host compute buffer size =    16.01 MiB
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: graph nodes  = 1690
Jul 11 16:03:06 quorra ollama[1370859]: llama_new_context_with_model: graph splits = 108
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.328Z level=DEBUG source=server.go:615 msg="model load progress 1.00"
Jul 11 16:03:06 quorra ollama[1370859]: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
Jul 11 16:03:06 quorra ollama[1370859]:   current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
Jul 11 16:03:06 quorra ollama[1370859]:   cublasCreate_v2(&cublas_handles[device])
Jul 11 16:03:06 quorra ollama[1370859]: GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.745Z level=INFO source=server.go:604 msg="waiting for server to become available" status="llm server error"
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.745Z level=DEBUG source=server.go:618 msg="model load completed, waiting for server to become available" status="llm server error"
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.996Z level=ERROR source=sched.go:480 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) CUDA error: CUBLAS_STATUS_NOT_INITIALIZED\n  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826\n  cublasCreate_v2(&cublas_handles[device])\nGGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !\"CUDA error\""
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.996Z level=DEBUG source=sched.go:483 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.996Z level=DEBUG source=sched.go:384 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.996Z level=DEBUG source=sched.go:400 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:06 quorra ollama[1370859]: [GIN] 2024/07/11 - 16:03:06 | 500 |  2.335390037s |       127.0.0.1 | POST     "/api/chat"
Jul 11 16:03:06 quorra ollama[1370859]: time=2024-07-11T16:03:06.996Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:06 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.118Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:07 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.118Z level=DEBUG source=server.go:1026 msg="stopping llama server"
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.118Z level=DEBUG source=sched.go:405 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.369Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:07 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.482Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:07 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.619Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:07 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.730Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:07 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.868Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:07 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:07 quorra ollama[1370859]: time=2024-07-11T16:03:07.978Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:07 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.119Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:08 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.266Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:08 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.369Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:08 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.474Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:08 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.619Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:08 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.724Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:08 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.869Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:08 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:08 quorra ollama[1370859]: time=2024-07-11T16:03:08.978Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:08 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.119Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:09 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.227Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:09 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.369Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:09 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.492Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:09 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.619Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:09 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.731Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:09 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.869Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:09 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:09 quorra ollama[1370859]: time=2024-07-11T16:03:09.979Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:09 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.119Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:10 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.228Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:10 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.369Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:10 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.478Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:10 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.619Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:10 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.730Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:10 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.869Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:10 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:10 quorra ollama[1370859]: time=2024-07-11T16:03:10.977Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:10 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.119Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:11 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.228Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:11 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.369Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:11 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.483Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:11 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.619Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:11 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.731Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:11 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.869Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:11 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:11 quorra ollama[1370859]: time=2024-07-11T16:03:11.975Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:11 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.119Z level=WARN source=sched.go:671 msg="gpu VRAM usage didn't recover within timeout" seconds=5.123350099 model=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.119Z level=DEBUG source=sched.go:409 msg="sending an unloaded event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.119Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.119Z level=DEBUG source=sched.go:332 msg="ignoring unload event with no pending requests"
Jul 11 16:03:12 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.234Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:12 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.368Z level=WARN source=sched.go:671 msg="gpu VRAM usage didn't recover within timeout" seconds=5.372295868 model=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.368Z level=DEBUG source=gpu.go:336 msg="updating system memory data" before.total="94.0 GiB" before.free="89.7 GiB" now.total="94.0 GiB" now.free="89.7 GiB"
Jul 11 16:03:12 quorra ollama[1370859]: CUDA driver version: 12.5
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.477Z level=DEBUG source=gpu.go:377 msg="updating cuda memory data" gpu=GPU-007c9d9a-8177-bd6f-7654-45652102b937 name="NVIDIA GeForce RTX 4070 Ti SUPER" before.total="15.6 GiB" before.free="15.4 GiB" now.total="15.6 GiB" now.free="15.4 GiB" now.used="217.2 MiB"
Jul 11 16:03:12 quorra ollama[1370859]: releasing cuda driver library
Jul 11 16:03:12 quorra ollama[1370859]: time=2024-07-11T16:03:12.619Z level=WARN source=sched.go:671 msg="gpu VRAM usage didn't recover within timeout" seconds=5.623279961 model=/usr/share/ollama/.ollama/models/blobs/sha256-76c95736fd1483b32c8ad704594349e92fa3ec947c8fea45942caa5bd28df08d
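The actual failure is at the very end of the load: right after "model load progress 1.00", cublasCreate fails with CUDA error: CUBLAS_STATUS_NOT_INITIALIZED and the runner aborts. In practice this often means cuBLAS could not allocate its workspace on the device, i.e. VRAM ran out at initialization, though it can also indicate a CUDA runtime problem. The estimates above are consistent with an out-of-memory squeeze: memory.required.partial="15.1 GiB" against "15.4 GiB" available, and the CUDA0 weight (12853.86 MiB), KV (544 MiB), and compute (2257 MiB) buffers already total roughly 15.3 GiB. A hedged workaround sketch is to offload fewer layers with Ollama's num_gpu option (28 is an illustrative value, not a tested one):

ollama run gemma2:9b-instruct-fp16
>>> /set parameter num_gpu 28

The same knob can also be passed per request as options.num_gpu on the /api/chat or /api/generate endpoints.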
brgchamk 1#

Previous versions of Ollama also failed to run this tag.

apeeds0o 2#

I can run gemma2:27b-instruct-fp16:

>ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL              
gemma2:27b-instruct-fp16        4a8f851205c5    58 GB   73%/27% CPU/GPU 4 minutes from now
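(Note the PROCESSOR column: the 27b fp16 load is split 73% CPU / 27% GPU, so most of it never touches VRAM.) A hedged way to test whether the 9b fp16 failure is VRAM-related is to force a CPU-only load; per Ollama's GPU docs, pointing CUDA_VISIBLE_DEVICES at an invalid ID should disable CUDA entirely:

# Restart the server with the GPU hidden, then retry the model.
CUDA_VISIBLE_DEVICES=-1 ollama serve
ollama run gemma2:9b-instruct-fp16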
r6l8ljro 3#

After upgrading to Ollama v0.2.4, it still fails to run the model.
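A hedged sketch of what to capture for the next report, assuming the service runs under systemd as the journal lines above suggest:

# Follow the server log, keeping the offload estimate and any CUDA error.
journalctl -u ollama -f | grep -E 'offload to cuda|CUDA error'

# In parallel, sample VRAM once per second during the load attempt.
nvidia-smi --query-gpu=memory.used,memory.free --format=csv -l 1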
