ollama: "Failed to acquire semaphore" error="context canceled"

vbkedwbf  posted 2 months ago  in Other

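What is the issue?

I am using AnythingLLM to generate embeddings for RAG. The embedding service consistently fails after a few calls, and the error log shows the following each time. I am not sure why the context is being canceled. Please help me resolve this.
Here is the debug log: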

[GIN] 2024/06/11 - 11:09:07 | 200 |         2m28s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:07.830+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:07.830+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=119
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=14500 tid="17876" timestamp=1718075347
DEBUG [update_slots] slot released | n_cache_tokens=52 n_ctx=2048 n_past=52 n_system_tokens=0 slot_id=0 task_id=14500 tid="17876" timestamp=1718075350 truncated=false
DEBUG [log_server_request] request | method="POST" params={} path="/embedding" remote_addr="127.0.0.1" remote_port=51069 status=200 tid="16400" timestamp=1718075350
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=14503 tid="17876" timestamp=1718075350
DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=14504 tid="17876" timestamp=1718075350
[GIN] 2024/06/11 - 11:09:10 | 200 |         2m30s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:10.289+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:10.290+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=118
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=14504 tid="17876" timestamp=1718075350
time=2024-06-11T11:09:11.910+08:00 level=ERROR source=server.go:836 msg="Failed to acquire semaphore" error="context canceled"
time=2024-06-11T11:09:11.910+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:11.911+08:00 level=INFO source=routes.go:401 msg="embedding generation failed: context canceled"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=117
[GIN] 2024/06/11 - 11:09:11 | 500 |         2m32s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:11.911+08:00 level=ERROR source=server.go:836 msg="Failed to acquire semaphore" error="context canceled"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:11.911+08:00 level=INFO source=routes.go:401 msg="embedding generation failed: context canceled"
time=2024-06-11T11:09:11.911+08:00 level=ERROR source=server.go:836 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/06/11 - 11:09:11 | 500 |         2m32s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=116
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=115
time=2024-06-11T11:09:11.911+08:00 level=INFO source=routes.go:401 msg="embedding generation failed: context canceled"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.41

yvt65v4c  #1

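We should improve the log message, but the semaphore is used to track parallel requests. "context canceled" means the client gave up waiting for its request to be processed.
What do you have OLLAMA_NUM_PARALLEL set to? The current default is 1, so only one request can be processed at a time.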
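To make the mechanism above concrete, here is a minimal simulation of it, not Ollama's actual code. It assumes a server that guards its work with a semaphore of `slots` permits and clients that all give up after the same timeout. The names `run_simulation`, `service_time`, and `timeout` are hypothetical and chosen only for illustration.

```python
import asyncio


async def handle(sem, service_time, state):
    # A queued request waits here until a slot (semaphore permit) is free.
    async with sem:
        state["acquired"] = True
        # Stand-in for the actual embedding work.
        await asyncio.sleep(service_time)


async def client(sem, service_time, timeout):
    state = {"acquired": False}
    try:
        # The client abandons the request once `timeout` seconds have passed,
        # which cancels the request whether it is queued or already running.
        await asyncio.wait_for(handle(sem, service_time, state), timeout)
        return "ok"
    except asyncio.TimeoutError:
        if state["acquired"]:
            return "canceled while processing"
        # Roughly the situation the server logs as a failed semaphore acquire.
        return "canceled while waiting"


async def simulate(n_requests, slots, service_time, timeout):
    sem = asyncio.Semaphore(slots)
    tasks = [client(sem, service_time, timeout) for _ in range(n_requests)]
    return await asyncio.gather(*tasks)


def run_simulation(n_requests, slots, service_time, timeout):
    return asyncio.run(simulate(n_requests, slots, service_time, timeout))


if __name__ == "__main__":
    # Four requests arrive together, but only one slot is available.
    print(run_simulation(n_requests=4, slots=1, service_time=0.2, timeout=0.5))
```

With one slot, requests queue behind each other, so the later ones tend to hit the client timeout before they ever get a permit. Raising `slots`, lengthening the client timeout, or sending fewer requests at once all shorten the queue in this model.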

ecbunoof  #2

Thanks for the explanation. OLLAMA_NUM_PARALLEL is at its default value of 1.
With only a single GPU, can I increase the OLLAMA_NUM_PARALLEL value?

b1payxdu  #3

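Concurrency/parallelism is currently experimental (opt-in), but it will eventually be enabled by default. Increasing parallelism increases the context size, so the VRAM consumed by the model increases as well. You can try different values and check the impact with ollama ps to find a balance that suits your model and your GPU's capacity.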
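One way to experiment with different values is sketched below. It assumes the `ollama` binary is on PATH and that no other server instance is already running. The helper names are hypothetical, and the linear context estimate is an assumption used for rough planning, not a statement about Ollama's exact allocation.

```python
import os
import subprocess


def server_env(num_parallel, base_env=None):
    """Return a copy of the environment with OLLAMA_NUM_PARALLEL set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    return env


def estimated_total_context(per_request_ctx, num_parallel):
    # Assumption: the loaded context grows linearly with the number of
    # parallel slots, so VRAM use for the context grows with it.
    return per_request_ctx * num_parallel


def start_server(num_parallel):
    # Assumes `ollama` is on PATH and no other server instance is running.
    return subprocess.Popen(["ollama", "serve"], env=server_env(num_parallel))


def show_loaded_models():
    # `ollama ps` lists the currently loaded models and their memory use.
    result = subprocess.run(["ollama", "ps"], capture_output=True, text=True)
    return result.stdout
```

A typical loop would be: call `start_server(2)`, send a few embedding requests so the model loads, read `show_loaded_models()`, then stop the server and repeat with a higher value until the model no longer fits comfortably on the GPU. With the `n_ctx=2048` seen in the log above, `estimated_total_context(2048, 4)` gives 8192 under the stated assumption.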
