[Bug]: vllm served through the Triton server does not work properly

bn31dyow · posted 2 months ago · in: Other

As the log shows, the problem surfaces at line 45 of async_llm_engine.py. The error message is:

Unable to get power limit for GPU 0. Status:Success, value:0.000000

This means the power limit for GPU 0 could not be read. One possible cause is that the Triton server is not configured correctly for GPU support. Check the Triton model configuration file (config.pbtxt) and make sure it assigns the model to a GPU, with something like the following:

backend: "vllm"
max_batch_size: 0
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

If the problem persists, try updating the Triton server and the NVIDIA driver to their latest versions.
2024-07-23 19:51:51 ERROR 07-23 14:21:51 async_llm_engine.py:45] attn_metadata = self.attn_backend.make_metadata(
2024-07-23 19:51:51 File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/flash_attn.py", line 29, in make_metadata
2024-07-23 19:51:51 return FlashAttentionMetadata(*args, **kwargs)
2024-07-23 19:51:51 TypeError: FlashAttentionMetadata.__init__() got an unexpected keyword argument 'is_prompt'
2024-07-23 19:51:51 The above exception was the direct cause of the following exception:
2024-07-23 19:51:51
2024-07-23 19:51:51 Traceback (most recent call last):
2024-07-23 19:51:51 File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
2024-07-23 19:51:51 self._context.run(self._callback, *self._args)
2024-07-23 19:51:51 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _raise_exception_on_finish
2024-07-23 19:51:51 raise AsyncEngineDeadError(
2024-07-23 19:51:51 vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
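The TypeError above is the classic signature of a version mismatch: FlashAttentionMetadata in vllm is a dataclass, and the caller (here, the Triton vLLM backend) was written against a vllm release whose metadata class still had an `is_prompt` field. A minimal sketch of the failure mode, using a hypothetical stand-in dataclass rather than vllm's real one:

```python
from dataclasses import dataclass

# Hypothetical stand-in for vllm's FlashAttentionMetadata after a
# release removed the `is_prompt` field. Field names are illustrative.
@dataclass
class FlashAttentionMetadata:
    num_prefills: int
    num_decode_tokens: int

# Caller code built against an older vllm still passes `is_prompt`,
# so the generated __init__ rejects it with a TypeError.
try:
    FlashAttentionMetadata(num_prefills=1, num_decode_tokens=0,
                           is_prompt=True)
except TypeError as exc:
    print(exc)  # mentions the unexpected keyword argument 'is_prompt'
```

No amount of GPU or driver configuration fixes this; the two components simply disagree on the constructor signature.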


ykejflvf · #1

I think this is really a question for the NVIDIA Triton team. It looks to me like they have not updated the version correctly, which would explain:

FlashAttentionMetadata.__init__() got an unexpected keyword argument 'is_prompt'
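While waiting on aligned releases, a mismatch like this can sometimes be papered over by dropping keyword arguments the installed dataclass no longer declares. The sketch below is a generic illustration, not vLLM's API; the dataclass is again a hypothetical stand-in, and the real fix remains installing the vllm version the Triton vLLM backend was built against:

```python
from dataclasses import dataclass, fields

def construct_compat(cls, **kwargs):
    """Construct a dataclass, silently dropping unknown keyword args.

    Illustrative shim only: it hides, rather than resolves, a version
    mismatch between a caller and the installed library.
    """
    allowed = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in kwargs.items() if k in allowed})

@dataclass
class FlashAttentionMetadata:  # hypothetical stand-in, not vllm's class
    num_prefills: int
    num_decode_tokens: int

# `is_prompt` is not a declared field, so the shim discards it.
meta = construct_compat(FlashAttentionMetadata,
                        num_prefills=1,
                        num_decode_tokens=0,
                        is_prompt=True)
print(meta.num_prefills, meta.num_decode_tokens)
```

Dropping arguments can of course mask behavior the caller relied on, which is why pinning matching versions is the safer route.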
