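When the current benchmark script is run with an INF request (arrival) rate, it does not limit the maximum concurrency level, as shown here.
If we change it to the following, we can cap the maximum concurrency and get a finely controlled load level.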
semaphore = asyncio.Semaphore(max_concurrency)  # Semaphore to limit concurrency

async def make_request(request, sem):
    async with sem:  # Ensure at most max_concurrency requests are in flight
        prompt, prompt_len, output_len = request
        request_func_input = RequestFuncInput(
            model=model_id,
            prompt=prompt,
            api_url=api_url,
            prompt_len=prompt_len,
            output_len=output_len,
            best_of=best_of,
            use_beam_search=use_beam_search,
        )
        # Call the request function directly here and return its result
        return await request_func(request_func_input=request_func_input, pbar=pbar)

tasks = []
for request in input_requests:  # Direct iteration may replace async iteration based on design
    # Schedule the request without awaiting it. create_task starts it at its
    # arrival time; a bare coroutine would not run until gather() is awaited.
    tasks.append(asyncio.create_task(make_request(request, semaphore)))
    # Manage inter-arrival time
    if request_rate != float("inf"):
        await asyncio.sleep(1.0 / request_rate)
outputs = await asyncio.gather(*tasks)  # Wait for all tasks to complete
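
For anyone who wants to try the pattern above outside the benchmark script, here is a minimal self-contained sketch. Note that `fake_request`, `run_load`, and the `stats` dictionary are hypothetical stand-ins I made up for illustration; they are not part of the benchmark script, and the sleep only simulates request latency.

```python
import asyncio


async def fake_request(i, stats, latency=0.01):
    # Hypothetical stand-in for request_func: records how many requests
    # are in flight, then sleeps to simulate server latency.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(latency)
    stats["in_flight"] -= 1
    return i


async def run_load(num_requests, max_concurrency, request_rate=float("inf")):
    # Assumed helper: sends num_requests fake requests with a concurrency cap.
    semaphore = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}

    async def make_request(i):
        async with semaphore:  # Blocks once max_concurrency requests are running
            return await fake_request(i, stats)

    tasks = []
    for i in range(num_requests):
        # Schedule the request immediately so pacing applies to start times
        tasks.append(asyncio.create_task(make_request(i)))
        if request_rate != float("inf"):
            await asyncio.sleep(1.0 / request_rate)

    outputs = await asyncio.gather(*tasks)  # Results come back in submission order
    return list(outputs), stats["peak"]


if __name__ == "__main__":
    outputs, peak = asyncio.run(run_load(num_requests=20, max_concurrency=4))
    print(f"completed={len(outputs)} peak_in_flight={peak}")
```

The observed peak of in-flight requests should never exceed `max_concurrency`, even with an INF request rate, since the semaphore holds back any request beyond the cap until a slot frees up.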
3 answers
7xllpg7q1#
Good idea. PRs welcome!
tv6aics12#
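Please correct me if I'm wrong, but is this meant for testing the case where concurrency control sits in a layer above the model deployment?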
h22fl7wq3#
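I recently opened #3194 to add a prefix caching benchmark. @wangchen615, let me know if you'd like me to include the changes that address this issue in that PR as well!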