[Bug]: vllm served through the Triton server does not work properly

bn31dyow · posted 2 months ago · in: Other

As the log shows, the problem surfaces at line 45 of async_llm_engine.py. The error message is:

Unable to get power limit for GPU 0. Status:Success, value:0.000000

This means the power limit for GPU 0 could not be read. One possible cause is that the Triton server is not configured correctly for GPU support. Check the Triton model configuration file (config.pbtxt) and make sure it assigns the model to a GPU, with something like the following:

backend: "vllm"
max_batch_size: 0
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

If the problem persists, try updating the Triton server and the NVIDIA driver to their latest versions.
2024-07-23 19:51:51 ERROR 07-23 14:21:51 async_llm_engine.py:45] attn_metadata = self.attn_backend.make_metadata(
2024-07-23 19:51:51 File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/flash_attn.py", line 29, in make_metadata
2024-07-23 19:51:51 return FlashAttentionMetadata(*args, **kwargs)
2024-07-23 19:51:51 TypeError: FlashAttentionMetadata.__init__() got an unexpected keyword argument 'is_prompt'
2024-07-23 19:51:51 The above exception was the direct cause of the following exception:
2024-07-23 19:51:51
2024-07-23 19:51:51 Traceback (most recent call last):
2024-07-23 19:51:51 File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
2024-07-23 19:51:51 self._context.run(self._callback, *self._args)
2024-07-23 19:51:51 File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _raise_exception_on_finish
2024-07-23 19:51:51 raise AsyncEngineDeadError(
2024-07-23 19:51:51 vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
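The TypeError above is the classic signature of a version mismatch: FlashAttentionMetadata in vllm is a dataclass, and the caller (here, the Triton vLLM backend) was written against a vllm release whose metadata class still had an `is_prompt` field. A minimal sketch of the failure mode, using a hypothetical stand-in dataclass rather than vllm's real one:

```python
from dataclasses import dataclass

# Hypothetical stand-in for vllm's FlashAttentionMetadata after a
# release removed the `is_prompt` field. Field names are illustrative.
@dataclass
class FlashAttentionMetadata:
    num_prefills: int
    num_decode_tokens: int

# Caller code built against an older vllm still passes `is_prompt`,
# so the generated __init__ rejects it with a TypeError.
try:
    FlashAttentionMetadata(num_prefills=1, num_decode_tokens=0,
                           is_prompt=True)
except TypeError as exc:
    print(exc)  # mentions the unexpected keyword argument 'is_prompt'
```

No amount of GPU or driver configuration fixes this; the two components simply disagree on the constructor signature.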


ykejflvf · #1

I think this is really a question for the NVIDIA Triton team. It looks to me like they have not updated the version correctly, which would explain:

FlashAttentionMetadata.__init__() got an unexpected keyword argument 'is_prompt'
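While waiting on aligned releases, a mismatch like this can sometimes be papered over by dropping keyword arguments the installed dataclass no longer declares. The sketch below is a generic illustration, not vLLM's API; the dataclass is again a hypothetical stand-in, and the real fix remains installing the vllm version the Triton vLLM backend was built against:

```python
from dataclasses import dataclass, fields

def construct_compat(cls, **kwargs):
    """Construct a dataclass, silently dropping unknown keyword args.

    Illustrative shim only: it hides, rather than resolves, a version
    mismatch between a caller and the installed library.
    """
    allowed = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in kwargs.items() if k in allowed})

@dataclass
class FlashAttentionMetadata:  # hypothetical stand-in, not vllm's class
    num_prefills: int
    num_decode_tokens: int

# `is_prompt` is not a declared field, so the shim discards it.
meta = construct_compat(FlashAttentionMetadata,
                        num_prefills=1,
                        num_decode_tokens=0,
                        is_prompt=True)
print(meta.num_prefills, meta.num_decode_tokens)
```

Dropping arguments can of course mask behavior the caller relied on, which is why pinning matching versions is the safer route.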
