inference: Can LoRA be added to the options offered in the web UI deployment?

Posted by wwodge7n 7 months ago in Other


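When using the OpenAI-compatible API, I'd like to be able to choose freely between different LoRA adapters and the original base model.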


Source: https://github.com/xorbitsai/inference/issues/1105

3 answers


mwg9r5ms1#

I'd like the web deployment to offer something similar to what vLLM does here:

# Start the vLLM OpenAI-compatible server with two LoRA adapters registered
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
  --trust-remote-code \
  --max-model-len 4096 \
  --model /qwen/Qwen1.5-14B-Chat \
  --enable-lora \
  --lora-modules lora1=/lora/xxx lora2=~/lora/xxx
# Select an adapter per request by passing its registered name as "model"
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'content-type: application/json' \
  --data '{
  "model": "lora2",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "China is a"
    }
  ],
  "stop_token_ids": [151645, 151644, 151643],
  "max_tokens": 5,
  "temperature": 0.7
}'
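The same per-request selection can be wrapped in a small script. This is only a rough sketch under assumptions: `chat_payload` and `chat` are hypothetical helper names, `BASE_URL` assumes the server started above is listening on localhost:8000, and the prompt is inserted into the JSON verbatim, so it must not contain quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch: choose the base model or a LoRA adapter per request by changing
# only the "model" field. chat_payload and chat are hypothetical helpers.
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Build the JSON request body for a given model name and user prompt.
# The prompt is not JSON-escaped, so keep it free of quotes and backslashes.
chat_payload() {
  local model="$1" prompt="$2"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": 5}\n' \
    "$model" "$prompt"
}

# Send the request body to the chat completions endpoint.
# Requires the server from the command above to be running.
chat() {
  chat_payload "$1" "$2" | curl --silent --request POST \
    --url "$BASE_URL/v1/chat/completions" \
    --header 'content-type: application/json' \
    --data @-
}
```

With the server running, `chat lora1 "China is a"` would target the first adapter and `chat lora2 "China is a"` the second. Assuming no separate served model name was configured, passing the `--model` path (`/qwen/Qwen1.5-14B-Chat`) as the first argument should reach the base model without any adapter.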

7 months ago

brtdzjyr2#

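@xs818818 v0.9.2 added support for LoRA integration; see the documentation at https://inference.readthedocs.io/zh-cn/latest/models/lora.html. However, LoRA models are not currently managed: users need to download them themselves and launch them together with an LLM or image model.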

7 months ago

7ivaypg93#

If a LoRA config is now filled in through the UI, how should the request be made through the OpenAI-compatible API?

7 months ago