vLLM benchmark script does not limit maximum concurrency

y3bcpkx1  posted 6 months ago  in  Other

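If the current benchmark script is run with an INF arrival rate (request rate), it does not limit the maximum concurrency level, as shown here.
If we change it to the following, we can cap the maximum concurrency and get a finely controlled load level.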

# Fragment of the benchmark coroutine; assumes `import asyncio` and that
# max_concurrency, model_id, api_url, pbar, etc. are already defined there.
semaphore = asyncio.Semaphore(max_concurrency)  # Semaphore to limit concurrency

async def make_request(request, sem):
    async with sem:  # At most max_concurrency requests are in flight at once
        prompt, prompt_len, output_len = request
        request_func_input = RequestFuncInput(
            model=model_id,
            prompt=prompt,
            api_url=api_url,
            prompt_len=prompt_len,
            output_len=output_len,
            best_of=best_of,
            use_beam_search=use_beam_search,
        )
        # Call the request function directly here and return its result
        return await request_func(request_func_input=request_func_input, pbar=pbar)

tasks = []
for request in input_requests:  # Plain iteration; pacing is handled by the sleep below
    # Schedule the request now; a bare coroutine would not start until gather()
    tasks.append(asyncio.create_task(make_request(request, semaphore)))
    # Manage inter-arrival time
    if request_rate != float("inf"):
        await asyncio.sleep(1.0 / request_rate)

outputs = await asyncio.gather(*tasks)  # Wait for all tasks to complete
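
To check that the semaphore really caps the number of in-flight requests, here is a minimal, self-contained sketch of the same pattern. It is not the benchmark script itself: `run_with_limit` and `fake_request` are hypothetical names, and `asyncio.sleep` stands in for the real HTTP request function.

```python
import asyncio


async def run_with_limit(num_requests, max_concurrency, request_rate=float("inf")):
    """Run fake requests with a concurrency cap; return (outputs, observed peak)."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0  # requests currently holding the semaphore
    peak = 0       # highest value of in_flight seen so far

    async def fake_request(i):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real request (assumption)
            in_flight -= 1
            return i

    tasks = []
    for i in range(num_requests):
        # create_task schedules the coroutine immediately
        tasks.append(asyncio.create_task(fake_request(i)))
        # Pace arrivals only when a finite request rate is given
        if request_rate != float("inf"):
            await asyncio.sleep(1.0 / request_rate)

    outputs = await asyncio.gather(*tasks)  # results come back in submission order
    return outputs, peak


if __name__ == "__main__":
    outputs, peak = asyncio.run(run_with_limit(num_requests=20, max_concurrency=4))
    print(len(outputs), peak)  # prints: 20 4
```

With an infinite request rate all 20 tasks are scheduled at once, but only 4 hold the semaphore at any moment, so the observed peak equals `max_concurrency`.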
7xllpg7q  #1

Good idea. PRs welcome!

tv6aics1  #2

Correct me if I'm wrong, but is this meant for testing the case where concurrency control sits in a layer above the model deployment?

h22fl7wq  #3

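I recently opened #3194 to add a prefix caching benchmark. @wangchen615, let me know if you'd like me to include the changes that address this issue in that PR as well!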
