langchain: _split_sources of BaseQAWithSourcesChain prematurely truncates Gemini model output at the first occurrence of [Source: xyz]

Posted by sf6xfgos 5 months ago in Other


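Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to look for a similar issue and did not find one.
I am sure that this is a bug in LangChain rather than in my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).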

Example code

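from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain_google_genai import ChatGoogleGenerativeAI

# Fragment from inside a class: self.chain_type, self.prompt, self.memory,
# self.vector_db and UsageHandler are defined elsewhere in the reporter's code.
chain = RetrievalQAWithSourcesChain(
    reduce_k_below_max_tokens=True,
    max_tokens_limit=16000,
    combine_documents_chain=load_qa_with_sources_chain(
        ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0, callbacks=[UsageHandler()]),
        chain_type=self.chain_type, prompt=self.prompt),
    memory=self.memory, retriever=self.vector_db.as_retriever(search_kwargs={"k": 3}))

# The input argument was omitted in the original report.
result = chain.invoke()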

Error message and stack trace (if applicable)

No response

Description

Current output in result['answer']

"Lorem ipsum ["

Expected

"Lorem ipsum [Source: xyz1]
Lorem ipsum [Source: xyz2]
Lorem ipsum [Source: xyz3]"

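The expected text above is what the model actually returns; I verified this by inspecting the response in the on_llm_end callback.

A few points:

My prompt tells the model to show its sources.
I also use other models (GPT 3.5, Llama3 8B) and only hit this issue with Gemini 1.5 Flash, probably because Gemini cites sources inline in this format, which is currently not supported.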
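The truncation can be reproduced without calling any model. The sketch below is a standalone function that assumes the splitting logic of `BaseQAWithSourcesChain._split_sources` as it stood when the issue was filed (a case-insensitive split on `SOURCES?:`); the function name `split_sources_original` is hypothetical and the body may not match your installed LangChain version exactly.

```python
import re


def split_sources_original(answer):
    # Assumed copy of the library's splitting logic, written as a plain function.
    if re.search(r"SOURCES?:", answer, re.IGNORECASE):
        # Everything before the first "Source:" becomes the answer,
        # everything up to the next match becomes the sources.
        answer, sources = re.split(
            r"SOURCES?:|QUESTION:\s", answer, flags=re.IGNORECASE
        )[:2]
        sources = re.split(r"\n", sources)[0].strip()
    else:
        sources = ""
    return answer, sources


model_output = (
    "Lorem ipsum [Source: xyz1]\n"
    "Lorem ipsum [Source: xyz2]\n"
    "Lorem ipsum [Source: xyz3]"
)
answer, sources = split_sources_original(model_output)
print(repr(answer))   # 'Lorem ipsum ['
print(repr(sources))  # 'xyz1]'
```

Because the split happens at the first inline `Source:`, the answer keeps only the text before it, including the dangling `[`, which matches the output reported above.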


Source: https://github.com/langchain-ai/langchain/issues/23932

1 Answer

6rqinv9w1#

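I ran into a few regex problems and fixed them in the function below:

(Source: ....) / [Source: ....] caused the answer to be cut off at the ( / [
Multiple mentions of (Source: ....) / [Source: ....] are now supported
Sometimes the main body of the answer was truncated after the first occurrence of the above pattern
The answer was truncated at "Resources:"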
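The regex behaviour behind these points can be checked in isolation. This is a minimal sketch using only Python's `re` module; the sample strings are made up for illustration.

```python
import re

# Inline citations in either bracket style are collected with findall
# and removed from the answer with sub, so nothing is cut off.
text = "Lorem [Source: a] ipsum (Source: b) dolor"
source_pattern = r"[\[\(]Source:[^\]\)]+[\]\)]"
found = re.findall(source_pattern, text)
cleaned = re.sub(source_pattern, "", text)
print(found)    # ['[Source: a]', '(Source: b)']
print(cleaned)  # 'Lorem  ipsum  dolor'

# Without a word boundary, "Resources:" contains "sources:" and matches.
loose = re.search(r"SOURCES?:", "Resources: see docs", re.IGNORECASE)
# With \b the match must start at a word boundary, so "Resources:" is skipped.
strict = re.search(r"\bSOURCES?:", "Resources: see docs", re.IGNORECASE)
print(loose is not None)  # True
print(strict is None)     # True
```

Note that removing the citations leaves the surrounding whitespace in place, so the cleaned answer may contain double spaces.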

import re

from langchain.chains import RetrievalQAWithSourcesChain


def split_sources_with_new_format(self, answer):
    # Handle inline citations such as (Source: ....) or [Source: ....].
    # Collecting every match also avoids premature truncation when the
    # pattern repeats in the middle of the answer.
    source_pattern = r'[\[\(]Source:[^\]\)]+[\]\)]'
    gem_sources = re.findall(source_pattern, answer)
    if gem_sources:
        answer = re.sub(source_pattern, '', answer)
        # Note: sources is returned as a list of the matched citations.
        return answer, gem_sources

    # \b keeps words such as "Resources:" from matching.
    if re.search(r"\bSOURCES?:", answer, re.IGNORECASE):
        answer, sources = re.split(
            r"\bSOURCES?:\s*|QUESTION:\s", answer, flags=re.IGNORECASE
        )[:2]
        sources = re.split(r"\n", sources)
    else:
        sources = ""
    return answer, sources


# Save the original method.
og_split_sources = RetrievalQAWithSourcesChain._split_sources

# Apply the patch.
RetrievalQAWithSourcesChain._split_sources = split_sources_with_new_format

# Get your desired result.
chain = RetrievalQAWithSourcesChain(....)
result = chain.invoke(...)

# Revert the patch.
RetrievalQAWithSourcesChain._split_sources = og_split_sources
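If the chain call raises, the manual revert above never runs and the patch stays in place. One way to guard against that is a small context manager. This is only a sketch: `patched_attr`, `DemoChain` and `split_upper` are hypothetical names, and `DemoChain` stands in for `RetrievalQAWithSourcesChain` so the example runs without LangChain installed.

```python
from contextlib import contextmanager


@contextmanager
def patched_attr(cls, name, replacement):
    # Swap a class attribute for the duration of the with-block and
    # restore the previous value even if the block raises.
    original = getattr(cls, name)
    setattr(cls, name, replacement)
    try:
        yield
    finally:
        setattr(cls, name, original)


class DemoChain:
    # Hypothetical stand-in for the real chain class.
    def _split_sources(self, answer):
        return answer, ""


def split_upper(self, answer):
    # Dummy replacement, used only to make the patch visible.
    return answer.upper(), ""


with patched_attr(DemoChain, "_split_sources", split_upper):
    inside = DemoChain()._split_sources("abc")
outside = DemoChain()._split_sources("abc")
print(inside)   # ('ABC', '')
print(outside)  # ('abc', '')
```

With the real class, the same pattern would wrap the chain construction and the `invoke` call, assuming the class accepts attribute assignment the way the manual patch above relies on.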

Answered 5 months ago
