inference BUG: qwen1.5 gptq int8 errored

cgh8pdjw · posted 4 months ago in: Other

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version.
  2. The version of xinference you use.
  3. Versions of crucial packages.
  4. Full stack of the error.
  5. Minimized code to reproduce the error.
2024-02-28 03:45:45,757 xinference.api.restful_api 188628 ERROR    Chat completion stream got an error: [address=0.0.0.0:43203, pid=188725] probability tensor contains either `inf`, `nan` or element < 0
Traceback (most recent call last):
  File "/new_data2/xuyeqin-data/projects/inference/xinference/api/restful_api.py", line 1257, in stream_results
    async for item in iterator:
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/api.py", line 340, in __anext__
    return await self._actor_ref.__xoscar_next__(self._uid)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/api.py", line 431, in __xoscar_next__
    raise e
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/api.py", line 417, in __xoscar_next__
    r = await asyncio.to_thread(_wrapper, gen)
    ^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
      ^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/xoscar/api.py", line 402, in _wrapper
    return next(_gen)
  File "/new_data2/xuyeqin-data/projects/inference/xinference/core/model.py", line 257, in _to_json_generator
    for v in gen:
  File "/new_data2/xuyeqin-data/projects/inference/xinference/model/llm/utils.py", line 470, in _to_chat_completion_chunks
    for i, chunk in enumerate(chunks):
    ^^^^^^^^^^^^^^^^^
  File "/new_data2/xuyeqin-data/projects/inference/xinference/model/llm/pytorch/core.py", line 253, in generator_wrapper
    for completion_chunk, completion_usage in generate_stream(
    ^^^^^^^^^^^^^^^^^
  File "/home/xuyeqin/miniconda3/miniconda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
    ^^^^^^^^^^^^^^^^^
  File "/new_data2/xuyeqin-data/projects/inference/xinference/model/llm/pytorch/utils.py", line 214, in generate_stream
    indices = torch.multinomial(probs, num_samples=2)
    ^^^^^^^^^^^^^^^^^
RuntimeError: [address=0.0.0.0:43203, pid=188725] probability tensor contains either `inf`, `nan` or element < 0
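The failing call is `torch.multinomial(probs, num_samples=2)` in `generate_stream`, which raises exactly this `RuntimeError` whenever the probability tensor contains `inf`, `nan`, or negative entries. As a rough illustration (not xinference code — `safe_multinomial` is a hypothetical helper), the condition can be detected and sanitized before sampling:

```python
import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int = 2) -> torch.Tensor:
    # torch.multinomial rejects tensors containing inf, nan, or values < 0,
    # which is the failure seen in the stack trace above. Sanitize first.
    if not torch.isfinite(probs).all() or (probs < 0).any():
        # Replace nan/inf with 0 and clamp negatives away.
        probs = torch.nan_to_num(probs, nan=0.0, posinf=0.0, neginf=0.0).clamp_(min=0)
        if probs.sum() == 0:
            # Everything was invalid: fall back to a uniform distribution.
            probs = torch.ones_like(probs)
    return torch.multinomial(probs, num_samples=num_samples)

# A tensor with a nan entry no longer crashes; the nan index is zeroed out.
idx = safe_multinomial(torch.tensor([float("nan"), 1.0, 2.0]), num_samples=2)
print(idx.shape)
```

This only masks the symptom; the root cause is the quantized model producing non-finite logits under the newer torch build, as the answer below notes.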

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

nwsw7zdq 1#

qwen1.5 gptq int8 works fine with torch == 2.1.2, but errors out with torch == 2.2.0.
