vllm: the benchmark script does not limit the maximum concurrency

Posted by y3bcpkx1 2 months ago in Other


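When the current benchmark script is run with an INF arrival rate (request_rate set to inf), it does not limit the maximum concurrency level, as the code referenced in the original issue shows.
If we change it to the following, we can cap the maximum concurrency and get fine-grained control over the load level.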

    semaphore = asyncio.Semaphore(max_concurrency)  # Semaphore to limit concurrency

    async def make_request(request, sem):
        async with sem:  # Ensure only max_concurrency tasks run in parallel
            prompt, prompt_len, output_len = request
            request_func_input = RequestFuncInput(
                model=model_id,
                prompt=prompt,
                api_url=api_url,
                prompt_len=prompt_len,
                output_len=output_len,
                best_of=best_of,
                use_beam_search=use_beam_search,
            )
            # Call the request function directly here and return its result
            return await request_func(request_func_input=request_func_input, pbar=pbar)

    tasks = []
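    for request in input_requests:  # Direct iteration may replace async iteration based on design
        # Schedule the request as a task right away; a bare coroutine object
        # would not start running until asyncio.gather() is called, which
        # would defeat the inter-arrival sleep below
        tasks.append(asyncio.create_task(make_request(request, semaphore)))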
        # Manage inter-arrival time
        if request_rate != float("inf"):
            await asyncio.sleep(1.0 / request_rate)

    outputs = await asyncio.gather(*tasks)  # Wait for all tasks to complete
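
The snippet above depends on names from the benchmark script (RequestFuncInput, request_func, pbar and so on), so it cannot be run on its own. As a rough illustration of the same idea, here is a minimal self-contained sketch. It assumes a stand-in request that just sleeps, and the names run_with_limit, fake_request and work_time are hypothetical, not part of vllm. It also records the peak number of requests in flight, to check that the semaphore really caps concurrency.

```python
import asyncio


async def run_with_limit(num_requests, max_concurrency,
                         request_rate=float("inf"), work_time=0.01):
    """Issue fake requests with at most max_concurrency in flight.

    Hypothetical helper for illustration only; not the vllm benchmark code.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0  # requests currently holding the semaphore
    peak = 0       # highest value in_flight ever reached

    async def fake_request(i):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while max_concurrency requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(work_time)  # stand-in for the real HTTP call
            in_flight -= 1
            return i

    tasks = []
    for i in range(num_requests):
        # create_task schedules the coroutine immediately, so pacing applies
        tasks.append(asyncio.create_task(fake_request(i)))
        if request_rate != float("inf"):
            await asyncio.sleep(1.0 / request_rate)  # inter-arrival time

    results = await asyncio.gather(*tasks)  # results keep submission order
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_with_limit(20, 4))
    print(len(results), "requests finished; peak concurrency:", peak)
```

With an infinite request rate every task is created up front, but only max_concurrency of them hold the semaphore at any moment; the rest wait inside `async with` until a slot frees up.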


Source: https://github.com/vllm-project/vllm/issues/3127