vllm [Bug]: Bug with multiple LoRA requests

server
`vllm serve qwen/Qwen2-7B-Instruct --port 8000 --served-model-name gpt-3.5-turbo --disable-log-stats --tensor-parallel-size 4 --gpu-memory-utilization 0.25 --enable-lora --lora-modules lora1=saves/qwen2_lora1/lora/sft lora2=saves/qwen2_lora2/lora/sft`
client (client.py)

from openai import OpenAI
import sys
import time

port = sys.argv[1] if len(sys.argv) > 1 else 8000
model = sys.argv[2] if len(sys.argv) > 2 else "gpt-3.5-turbo"

api_base = f'http://localhost:{port}/v1'
client = OpenAI(base_url=api_base, api_key="xxx")

while True:
    start = time.time()
    # Call chat.completions.create with the streaming interface enabled
    response = client.chat.completions.create(
        model=model,
        messages=[
            # {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "讲一个微小说,500字以内。"}
        ],
        stream=True  # enable streaming
    )

    # Consume the stream chunk by chunk; print the time to (and content of) the first chunk
    idx = 0
    for chunk in response:
        end = time.time()
        if idx == 0:
            print(f"time:{end-start}")
            print(chunk.choices[0].delta.content)
        start = time.time()
        idx += 1
    time.sleep(.1)

steps
  1. start the server
  2. start two clients
    • python client.py 8000 lora1
    • python client.py 8000 lora2
  3. stop one of the clients

bug log
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/starlette/middleware/cors.py", in __call__
await self.app(scope, receive, send)
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/starlette/middleware/exceptions.py", in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/jovyan/conda-env/envs/xqh_vllm/lib/python3.10
The FastAPI application raises an exception named `AsyncEngineDeadError`. It means the background loop has already errored out. To address it, you can try the following:

1. Check whether your code contains uncaught exceptions that could crash the background loop, and make sure every possible exception is handled properly.

2. If you use asynchronous programming, make sure your code is safe under concurrency. Race conditions and other synchronization problems can appear when several tasks or threads run at once; synchronization primitives from Python's `asyncio` library such as `Lock` or `Semaphore` can help (see the sketch after this list).

3. If the problem persists, try upgrading or downgrading FastAPI and its related dependencies so that they are compatible with your application.
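As a concrete illustration of point 2, here is a minimal sketch of protecting shared state with `asyncio.Lock`; the counter and names are hypothetical and not taken from the issue:

```python
import asyncio

counter = 0
lock = asyncio.Lock()

async def worker(n: int) -> None:
    global counter
    for _ in range(n):
        async with lock:  # only one coroutine mutates counter at a time
            counter += 1

async def main() -> None:
    # Ten concurrent workers; without the lock their updates could interleave.
    await asyncio.gather(*(worker(1000) for _ in range(10)))
    print(counter)  # 10000

asyncio.run(main())
```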

v2g6jxz61#

This looks like an error unrelated to LoRA. Could you use offline inference to check whether the LoRA adapters themselves work correctly?
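Following that suggestion, a minimal offline-inference sketch (assuming vLLM's `LLM`/`LoRARequest` API and the adapter path from the serve command above; the sampling settings are illustrative) could look like:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the same base model with LoRA support enabled, mirroring the serve flags.
llm = LLM(
    model="qwen/Qwen2-7B-Instruct",
    enable_lora=True,
    tensor_parallel_size=4,
    gpu_memory_utilization=0.25,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# One of the adapters from --lora-modules; the integer id just has to be
# unique per adapter within this process.
lora1 = LoRARequest("lora1", 1, "saves/qwen2_lora1/lora/sft")

outputs = llm.generate(
    ["讲一个微小说,500字以内。"],  # same prompt as client.py
    sampling_params,
    lora_request=lora1,
)
for out in outputs:
    print(out.outputs[0].text)
```

If this runs cleanly for both adapters, the LoRA weights themselves are fine and the problem is in the serving path rather than the adapters.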
