text-generation-inference: support for Phi-3.5 MoE

ogq8wdun posted 22 days ago in Other

Feature request

Add support for the PhiMoEForCausalLM architecture used by microsoft/Phi-3.5-MoE-instruct.

Motivation

Launching the model currently fails with the following error:

2024-08-25 21:25:51.891 | INFO     | text_generation_server.utils.import_utils:<module>:75 - Detected system cuda
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 118, in serve
    server.serve(

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 1064, in get_model
    raise NotImplementedError("sharded is not supported for AutoModel")

NotImplementedError: sharded is not supported for AutoModel
 rank=3
2024-08-25T21:25:56.550031Z ERROR text_generation_launcher: Shard 3 failed to start
2024-08-25T21:25:56.550058Z  INFO text_generation_launcher: Shutting down shards
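
The traceback suggests that the server did not find a dedicated implementation for this architecture, fell back to a generic AutoModel path, and then refused to start because that path cannot be sharded across GPUs (the failure is reported for `rank=3`). The sketch below only illustrates that kind of dispatch. It is not the actual `text_generation_server` code: the function name `get_model_kind`, the `SUPPORTED_ARCHITECTURES` set and its contents are assumptions made for the example.

```python
# Hypothetical, simplified sketch of the dispatch implied by the traceback.
# Not the real text_generation_server implementation.

# Illustrative placeholder set; the real list of supported models differs.
SUPPORTED_ARCHITECTURES = {"LlamaForCausalLM", "MixtralForCausalLM"}


def get_model_kind(architecture: str, sharded: bool) -> str:
    """Return which loading path a model would take in this sketch."""
    if architecture in SUPPORTED_ARCHITECTURES:
        # A dedicated implementation can split its weights across shards.
        return "native"
    if sharded:
        # The generic fallback has no sharding support, so it must refuse.
        raise NotImplementedError("sharded is not supported for AutoModel")
    # Unsharded: the generic fallback can still be attempted.
    return "auto_model"


if __name__ == "__main__":
    try:
        get_model_kind("PhiMoEForCausalLM", sharded=True)
    except NotImplementedError as err:
        print(f"NotImplementedError: {err}")
```

Under this reading, the request amounts to adding PhiMoEForCausalLM to the set of architectures with a dedicated, shardable implementation.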

Your contribution

I can test any PR.

bmvo0sr5 #1

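Thanks for reporting this, @maziyarpanahi 👍
We don't have much spare bandwidth at the moment, but we may prioritize adding this model.
Also, note that thumbs-up or similar reactions on your issue signal demand for a model and help us decide what to prioritize :)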
