text-generation-inference: a wrong tool choice crashes the server

Posted by 4ioopgfo 4 months ago in Other

System Info

Operating system:

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Model used: mistralai/Mistral-7B-Instruct-v0.3
Hardware: 1 L4
Tried with the latest version of the Docker image.

Information

Docker
The CLI directly

Tasks

An officially supported command
My own modifications

Reproduction

Start the server with the following command:

docker run --gpus all --shm-size 1g -p 8080:80 -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:latest --model-id mistralai/Mistral-7B-Instruct-v0.3

Then send the following request:

import requests

conversation = [
    {"role": "user", "content": "What's the weather like in Paris?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    }
]

response = requests.post(
    url="http://localhost:8080/v1/chat/completions",
    json={
        "messages": conversation,
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "temperature": 0.1,
        "tool_choice": "required",
        # "tool_prompt": "\"You will be presented with a JSON schema representing a set of tools.\nIf the user request lacks of sufficient information to make a precise tool selection: Do not invent any tool's properties, instead notify with an error message.\n\nJSON Schema:\n\"",
        "tools": tools,
        "max_tokens": 1000,
    },
)

Error:

(task, pid=12212) 2024-05-29T14:56:04.338119Z  INFO text_generation_router: router/src/main.rs:369: Connected
(task, pid=12212) 2024-05-29T14:56:04.338153Z  WARN text_generation_router: router/src/main.rs:383: Invalid hostname, defaulting to 0.0.0.0
(task, pid=12212) 2024-05-29T14:58:01.008313Z  INFO chat_completions{total_time="5.576392398s" validation_time="1.850855ms" queue_time="130.083µs" inference_time="5.574411606s" time_per_token="61.937906ms" seed="Some(14966871623831239824)"}: text_generation_router::server: router/src/server.rs:322: Success
(task, pid=12212) thread 'tokio-runtime-worker' panicked at router/src/infer.rs:407:44:
(task, pid=12212) Tool with name required not found
(task, pid=12212) note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
(task, pid=12212) 2024-05-29T14:58:07.628698Z ERROR text_generation_launcher: Webserver Crashed
(task, pid=12212) 2024-05-29T14:58:07.629433Z  INFO text_generation_launcher: Shutting down shards
(task, pid=12212) 2024-05-29T14:58:07.631861Z  INFO shard-manager: text_generation_launcher: Terminating shard rank=0
(task, pid=12212) 2024-05-29T14:58:07.631937Z  INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
(task, pid=12212) 2024-05-29T14:58:09.433647Z  INFO shard-manager: text_generation_launcher: shard terminated rank=0
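
The panic message "Tool with name required not found" suggests that this server version looks up the tool_choice string as a tool name. Until that is handled on the server side, one possible client-side guard is to check tool_choice against the declared tools before sending the request. The following is only a minimal sketch: the helper names tool_names and check_tool_choice are hypothetical, and the assumption that "auto" is the only accepted special value has not been verified against the server.

```python
def tool_names(tools):
    """Collect the function names declared in an OpenAI-style tools list."""
    return [t["function"]["name"] for t in tools if t.get("type") == "function"]


def check_tool_choice(tool_choice, tools, special_values=("auto",)):
    """Return tool_choice if it is a special value or a declared tool name.

    special_values is an assumption, not taken from the server's documentation;
    adjust it to whatever the deployed version actually accepts.
    """
    names = tool_names(tools)
    if tool_choice in special_values or tool_choice in names:
        return tool_choice
    allowed = list(special_values) + names
    raise ValueError(f"tool_choice {tool_choice!r} is not one of {allowed}")


# Trimmed-down version of the tools list from the reproduction above.
tools = [
    {
        "type": "function",
        "function": {"name": "get_current_weather", "parameters": {"type": "object"}},
    }
]

# A declared tool name passes the check.
print(check_tool_choice("get_current_weather", tools))

# "required" is rejected locally instead of being sent to the server.
try:
    check_tool_choice("required", tools)
except ValueError as err:
    print(err)
```

Calling check_tool_choice on the value before passing it as "tool_choice" in the requests.post payload turns the server crash into a local ValueError. It does not address the underlying panic in the router.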