[Bug]: GPTCache server: caching with the OpenAI embedding does not seem to work correctly

wwtsj6pe posted 2 months ago in Other

Current Behavior

When I start the gptcache server with the following command:
python server.py (the file comes from https://github.com/zilliztech/GPTCache/tree/main/gptcache_server ) -s 0.0.0.0 -p 8000 -of gptcache.yml -o True
the service starts up and runs.
Now, when I make a request from a client program, for example:

  1. The user question is "president of Pakistan" and I get a correct answer. But after that, when I ask a new question such as
  2. "capital of India?", the answer to this question is taken from the "president of Pakistan" question. It is retrieved from the cache.
     No matter how many new questions I ask, the answers all come from the cache.
     This happens when I set the embedding to "openai" in the gptcache.yaml file.
     In addition, with or without the embedding, there is also a problem with the semantic cache:
  3. For "president of Pakistan" and "president of India", both answers come from the cache.

Expected Behavior

  1. When a client makes a request to the running gptcache server, it should check whether the cache contains an exact or similar entry; if it does, the answer should come from the cache, otherwise the answer should come from openai and the response should be stored in the cache (see the sketch below).
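
For reference, this is the lookup behaviour I expect, written as a small self-contained toy sketch. The random embed() and the answer_from_openai() stub are placeholders for the real embedding model and OpenAI call; this is not GPTCache's actual implementation.

import numpy as np

SIMILARITY_THRESHOLD = 0.8
_cache = []  # list of (embedding, question, answer) tuples

def embed(text):
    # Placeholder embedding: a real server would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=16)
    return v / np.linalg.norm(v)

def answer_from_openai(question):
    # Placeholder for the real chat-completion call.
    return f"<answer from openai for: {question}>"

def answer(question):
    q = embed(question)
    if _cache:
        sims = [float(np.dot(q, e)) for e, _, _ in _cache]
        best = int(np.argmax(sims))
        if sims[best] >= SIMILARITY_THRESHOLD:
            return _cache[best][2]               # cache hit: serve the stored answer
    result = answer_from_openai(question)        # cache miss: ask the model
    _cache.append((q, question, result))         # store for future lookups
    return result

print(answer("president of Pakistan"))   # miss: goes to "openai"
print(answer("capital of India?"))       # should also miss, not reuse the first answer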

Steps to Reproduce

1) Copy server.py from the gptcache_server folder into a directory of your choice.
2) Configure the gptcache.yaml file:
embedding:
    openai
embedding_config:
    # Set embedding model params here
storage_config:
    data_dir:
        /Users/swathinarayanan/tolka_feedback_sep/gptdocker/gptcache_server/gptcache_data
    manager:
        sqlite,faiss
    vector_params:
        # Set vector storage related params here
evaluation:
    distance
evaluation_config:
    # Set evaluation metric kws here
pre_function:
    last_content
post_function:
    first
config:
    similarity_threshold: 0.8
    # Set other config here
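
My understanding is that this YAML roughly corresponds to a programmatic cache initialization like the one below. This is a sketch adapted from the GPTCache README; the exact class and parameter names, in particular gptcache.embedding.OpenAI, pre_embedding_func and Config(similarity_threshold=...), are my assumptions about how server.py maps the YAML, and the data_dir handling is omitted.

from gptcache import cache, Config
from gptcache.embedding import OpenAI
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.processor.pre import last_content
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# embedding: openai
openai_emb = OpenAI()

# manager: sqlite,faiss (paths under data_dir omitted here)
data_manager = get_data_manager(
    CacheBase("sqlite"),
    VectorBase("faiss", dimension=openai_emb.dimension),
)

# evaluation: distance, pre_function: last_content, similarity_threshold: 0.8
cache.init(
    pre_embedding_func=last_content,
    embedding_func=openai_emb.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    config=Config(similarity_threshold=0.8),
)
cache.set_openai_key()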

3) Start the server:

python server.py -s 0.0.0.0 -p 8000 -of gptcache.yml -o True

4) Create a client program or API call and make a request to the gptcache server.

Example program:

import requests
import json
import time

def call_chat_completions_endpoint(base_url, api_key, user_question):
    # Endpoint URL
    url = f"{base_url}/v1/chat/completions"
    
    # Headers including the authorization token
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {api_key}'
    }
    
    # Request payload
    payload = {
        'model': 'gpt-3.5-turbo',
        'messages': [{"role": "system", "content": "You are a helpful assistant."},
                     {'role': 'user', 'content': user_question}],
        'top_k': 10,
    }
    
    # Send POST request
    start_time = time.time()
    response = requests.post(url, headers=headers, data=json.dumps(payload))
    
    # Check if the request was successful
    if response.status_code == 200:
        # Process the successful response
        print("Success:", response.json())
        print("Time Consumed: {:.2f}s".format(time.time() - start_time))
    else:
        # Handle errors
        print(f"Error: {response.status_code}, Message: {response.text}")

# Example usage
if __name__ == "__main__":
    # Define the base URL of your FastAPI application
    BASE_URL = "http://localhost:8000"
    
    # Your API key for authorization (if needed)
    API_KEY = "****************************"
    
    # User question to be sent to the chat completions endpoint
    USER_QUESTION = "what is coral reef ?"
    
    call_chat_completions_endpoint(BASE_URL, API_KEY, USER_QUESTION)
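
To make the reported behaviour easy to reproduce, the example-usage block above can be changed to ask two unrelated questions back to back; with a correctly working semantic cache, the second answer should not simply be the first question's cached answer. The two questions below are the ones from the bug description.

# Ask two unrelated questions in sequence and compare the answers and timings.
call_chat_completions_endpoint(BASE_URL, API_KEY, "president of Pakistan")
call_chat_completions_endpoint(BASE_URL, API_KEY, "capital of India?")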

Environment

macOS
M1 chip

Anything else?

The docker image you provide does not work. When the container is run, the following error occurs:
Successfully installed package: openai
Traceback (most recent call last):
  File "/usr/local/bin/gptcache_server", line 5, in <module>
    from gptcache_server.server import main
  File "/usr/local/lib/python3.8/site-packages/gptcache_server/server.py", line 8, in <module>
    from gptcache.adapter import openai
  File "/usr/local/lib/python3.8/site-packages/gptcache/adapter/openai.py", line 31, in <module>
    class ChatCompletion(openai.ChatCompletion, BaseCacheLLM):
  File "/usr/local/lib/python3.8/site-packages/openai/lib/_old_api.py", line 39, in __call__
    raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1:
You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.
You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface.
Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

uyto3xhc #1

  1. As you have found, the effectiveness of the cache really does depend heavily on the choice of embedding, because the embedding model is the key to extracting the semantics of a string.
  2. Regarding the openai version problem, I will update gptcache soon and upgrade the openai version to 1.x.
wa7juj8i #2

@SimFG, thanks for your reply. I understand what you mean. My concern is that, whichever embedding model is chosen, the basic functionality should be:

  1. The user posts a question and the answer is fetched from openai.
  2. The user posts another, new question that is unrelated to the first one. Since there is no relation between the first and second question, the call should go to openai and the answer should be sent back to the client program as the response. But what I see is that the first question's answer is taken from the cache and sent to the client program.
ui7jx7zq #3

@swatataidbuddy

The core factor in deciding whether two questions are similar is the choice of embedding model. If you want very high accuracy, the model has to be very large. At the same time, the output of the embedding model is actually not precise enough to judge similarity, because the model only captures the rough composition of a sentence, which means it cannot recognize that the semantics of a whole sentence are completely different because of a single word.
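
One way to see this effect is to compare the embeddings of the two questions directly. The sketch below uses the openai>=1.0 Python client and text-embedding-ada-002; the model name is just an example and not necessarily what the server uses.

import numpy as np
from openai import OpenAI  # openai>=1.0 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q1, q2 = "president of Pakistan", "capital of India?"
resp = client.embeddings.create(model="text-embedding-ada-002", input=[q1, q2])
sim = cosine_similarity(resp.data[0].embedding, resp.data[1].embedding)

# ada-002 embeddings tend to give fairly high cosine similarity even for
# unrelated short questions, so a loose threshold can turn them into cache hits.
print(f"cosine similarity({q1!r}, {q2!r}) = {sim:.3f}")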

lsmepo6l #4

OK, the behaviour seems to be very inconsistent. When I retested just now, the earlier issue I saw did not happen, but I am seeing a different issue: even when I ask the same question multiple times, it is never served from the cache; the call always goes to openai.
Please refer below:
(virtual_env) swathinarayanan@Swathis-MacBook-Air tolka_feedback_sep % /Users/swathinarayanan/virtual_env/bin/python /Users/swathinarayanan/tolka_feedback_sep/testgptcacheAPI.py
Success: {'id': 'chatcmpl-8zkFc8lCl72egNxbKRLHma66rAOmj', 'object': 'chat.completion', 'created': 1709726488, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "The President of India is the head of state and the highest constitutional office in India. The current President of India (as of September 2021) is Ram Nath Kovind. The President's role is largely ceremonial, but they have certain executive powers, such as the power to appoint the Prime Minister and dissolve the Parliament. The President is elected by an Electoral College consisting of the elected members of both houses of Parliament as well as the elected members of the Legislative Assemblies of the States."}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 20, 'completion_tokens': 98, 'total_tokens': 118}, 'system_fingerprint': 'fp_2b778c6b35'}

Time Consumed: 2.81s

(virtual_env) swathinarayanan@Swathis-MacBook-Air tolka_feedback_sep % /Users/swathinarayanan/virtual_env/bin/python /Users/swathinarayanan/tolka_feedback_sep/testgptcacheAPI.py
Success: {'id': 'chatcmpl-8zkILW9ogFbVXMOrycPdyIbgCRMES', 'object': 'chat.completion', 'created': 1709726657, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "The President of India is the head of state and the supreme commander of the Indian Armed Forces. The current President of India is Ram Nath Kovind, who has been in office since July 25, 2017. The President's role is largely ceremonial, representing the nation both domestically and internationally."}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 20, 'completion_tokens': 60, 'total_tokens': 80}, 'system_fingerprint': 'fp_b9d4cef803'}

Time Consumed: 2.40s

(virtual_env) swathinarayanan@Swathis-MacBook-Air tolka_feedback_sep % /Users/swathinarayanan/virtual_env/bin/python /Users/swathinarayanan/tolka_feedback_sep/testgptcacheAPI.py
Success: {'id': 'chatcmpl-8zkIPwwvfYTXqtJgZLVUjryomNwyE', 'object': 'chat.completion', 'created': 1709726661, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The President of India is the head of state and the supreme commander of the Indian Armed Forces. The current President of India is Ram Nath Kovind, who took office on July 25, 2017. The President is elected by an electoral college consisting of the elected members of both houses of Parliament and the elected members of the Legislative Assemblies of the States. The President serves a term of five years and can be re-elected for a maximum of two terms. The role of the President is largely ceremonial, with executive powers being exercised by the Prime Minister and the Council of Ministers.'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 20, 'completion_tokens': 117, 'total_tokens': 137}, 'system_fingerprint': 'fp_2b778c6b35'}

Time Consumed: 3.28s

Earlier this was not the case: if my first question was "president of india" and the second question was "what is coral reef", I would get the answer from the cache that had been derived for the first question.

u59ebvdq #5

Which version of openai are you using?
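
One quick way to check is to print the installed client version from Python, in the same environment the server runs in:

# Print the installed openai client version.
import openai
print(openai.__version__)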

des4xlb0 #7

Judging from the values returned by openai, this does not look like openai 0.28, given fields like system_fingerprint in the response.

ygya80vv #8

I am running into the same problem as @swatataidbuddy. Is there any workaround?
