Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
from transformers import AutoTokenizer
from langchain_huggingface import ChatHuggingFace
from langchain_huggingface import HuggingFaceEndpoint
import requests

sample = requests.get(
    "https://raw.githubusercontent.com/huggingface/blog/main/langchain.md"
).text

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")


def n_tokens(text):
    return len(tokenizer(text)["input_ids"])


print(f"The number of tokens in the sample is {n_tokens(sample)}")

llm_10 = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-70B-Instruct",
    max_new_tokens=10,
    cache=False,
    seed=123,
)
llm_4096 = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-70B-Instruct",
    max_new_tokens=4096,
    cache=False,
    seed=123,
)

messages = [
    (
        "system",
        "You are a smart AI that has to describe a given text in to at least 1000 characters.",
    ),
    ("user", f"Summarize the following text:\n\n{sample}\n"),
]

# native endpoint
response_10_native = llm_10.invoke(messages)
print(f"Native response 10: {n_tokens(response_10_native)} tokens")
response_4096_native = llm_4096.invoke(messages)
print(f"Native response 4096: {n_tokens(response_4096_native)} tokens")

# make sure the native responses are different lengths
assert len(response_10_native) < len(
    response_4096_native
), f"Native response 10 should be shorter than native response 4096, 10 `max_new_tokens`: {n_tokens(response_10_native)}, 4096 `max_new_tokens`: {n_tokens(response_4096_native)}"

# chat implementation from langchain_huggingface
chat_model_10 = ChatHuggingFace(llm=llm_10)
chat_model_4096 = ChatHuggingFace(llm=llm_4096)

# chat implementation for 10 tokens
response_10 = chat_model_10.invoke(messages)
print(f"Response 10: {n_tokens(response_10.content)} tokens")
actual_response_tokens_10 = response_10.response_metadata.get(
    "token_usage"
).completion_tokens
print(
    f"Actual response 10: {actual_response_tokens_10} tokens (always 100 for some reason!)"
)

# chat implementation for 4096 tokens
response_4096 = chat_model_4096.invoke(messages)
print(f"Response 4096: {n_tokens(response_4096.content)} tokens")
actual_response_tokens_4096 = response_4096.response_metadata.get(
    "token_usage"
).completion_tokens
print(
    f"Actual response 4096: {actual_response_tokens_4096} tokens (always 100 for some reason!)"
)

# assert that the responses are different lengths, which fails because the token usage is always 100
print("-" * 20)
print(f"Output for 10 tokens: {response_10.content}")
print("-" * 20)
print(f"Output for 4096 tokens: {response_4096.content}")
print("-" * 20)
assert len(response_10.content) < len(
    response_4096.content
), f"Response 10 should be shorter than response 4096, 10 `max_new_tokens`: {n_tokens(response_10.content)}, 4096 `max_new_tokens`: {n_tokens(response_4096.content)}"
This is the output of the script:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The number of tokens in the sample is 1809
Native response 10: 11 tokens
Native response 4096: 445 tokens
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Response 10: 101 tokens
Actual response 10: 100 tokens (always 100 for some reason!)
Response 4096: 101 tokens
Actual response 4096: 100 tokens (always 100 for some reason!)
--------------------
Output for 10 tokens: The text announces the launch of a new partner package called `langchain_huggingface` in LangChain, jointly maintained by Hugging Face and LangChain. This package aims to bring the power of Hugging Face's latest developments into LangChain and keep it up-to-date. The package was created by the community, and by becoming a partner package, the time it takes to bring new features from Hugging Face's ecosystem to LangChain's users will be reduced.
The package integrates seamlessly with Lang
--------------------
Output for 4096 tokens: The text announces the launch of a new partner package called `langchain_huggingface` in LangChain, jointly maintained by Hugging Face and LangChain. This package aims to bring the power of Hugging Face's latest developments into LangChain and keep it up-to-date. The package was created by the community, and by becoming a partner package, the time it takes to bring new features from Hugging Face's ecosystem to LangChain's users will be reduced.
The package integrates seamlessly with Lang
--------------------
Error Message and Stack Trace (if applicable)
AssertionError: Response 10 should be shorter than response 4096, 10 `max_new_tokens`: 101, 4096 `max_new_tokens`: 101
Description
There seems to be an issue when using langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint together with the langchain_huggingface.chat_models.huggingface.ChatHuggingFace implementation. When HuggingFaceEndpoint is used on its own, the max_new_tokens parameter is applied correctly, but it does not work when the endpoint is wrapped inside ChatHuggingFace(llm=...). The latter implementation always returns a response of 100 tokens, and after searching the documentation and the source code I could not get it to work. I have created a reproducible example using meta-llama/Meta-Llama-3-70B-Instruct (since that model is also available serverless).
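A minimal sketch of one way to double-check this, assuming the tokenizer, sample, n_tokens, llm_10 and llm_4096 objects from the example code above: render the chat template manually with transformers and send the resulting prompt to the bare HuggingFaceEndpoint, which does honor max_new_tokens.

# Sketch: bypass ChatHuggingFace by rendering the chat template manually
# (assumes tokenizer, sample, n_tokens, llm_10 and llm_4096 from the example above).
chat = [
    {"role": "system", "content": "You are a smart AI that has to describe a given text in to at least 1000 characters."},
    {"role": "user", "content": f"Summarize the following text:\n\n{sample}\n"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(n_tokens(llm_10.invoke(prompt)))    # expected: around 10 tokens
print(n_tokens(llm_4096.invoke(prompt)))  # expected: noticeably more tokens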
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 23.5.0: Wed May 1 20:19:05 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8112
Python Version: 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)]
Package Information
langchain_core: 0.2.10
langchain: 0.2.6
langchain_community: 0.2.5
langsmith: 0.1.82
langchain_anthropic: 0.1.15
langchain_aws: 0.1.7
langchain_huggingface: 0.0.3
langchain_openai: 0.1.9
langchain_text_splitters: 0.2.2
langchainhub: 0.1.20
Packages not installed (not necessarily a problem)
The following packages were not found:
langgraph
langserve
4 Answers
xxhby3vn1#
OK, you are comparing two different things. The Huggingface inference client returns an object whose usage attribute is of type ChatCompletionOutputUsage. ChatCompletionOutputUsage reports three kinds of token usage:
- completion_tokens: the number of tokens needed to complete the prompt. In your case this is always fixed, because you call the completion with the same prompt. Try something else and it should change.
- prompt_tokens: the number of tokens in the prompt.
- total_tokens: the sum of completion_tokens and prompt_tokens.
So, via the n_tokens function, you are implicitly comparing total_tokens with completion_tokens, which is not correct. You should compare the total_tokens attributes to make a correct comparison.
P.S. I double-checked the LangChain code and made sure that ChatHuggingFace returns the correct ChatCompletionOutputUsage without any modification.
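For illustration, a minimal sketch of the comparison suggested here, assuming the response_10 and response_4096 objects from the example code and that token_usage carries the ChatCompletionOutputUsage object described above:

# Sketch of the suggested comparison (assumes response_10 / response_4096 from the
# example code and a ChatCompletionOutputUsage object in response_metadata).
usage_10 = response_10.response_metadata["token_usage"]
usage_4096 = response_4096.response_metadata["token_usage"]
print(usage_10.prompt_tokens, usage_10.completion_tokens, usage_10.total_tokens)
assert usage_10.total_tokens < usage_4096.total_tokens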
igetnqfo2#
I think you misunderstood the example code: the n_tokens() function is called on the output content of the chain, so completion_tokens == n_tokens(output) - 1. The 1 that is subtracted is the special end-of-sequence token (which is why the output says there are 101 tokens instead of 100). The problem is that ChatCompletionOutputUsage.output_tokens should always be less than or equal to max_new_tokens, but it is 100 tokens regardless of which max_new_tokens is provided.
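A small sketch of that off-by-one relationship, assuming the n_tokens helper and the response objects from the example code; the numbers match the output shown above:

# Sketch: content token count vs. reported completion_tokens
# (assumes n_tokens, response_10 and actual_response_tokens_10 from the example).
assert n_tokens(response_10.content) == actual_response_tokens_10 + 1  # 101 == 100 + 1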
qmb5sa223#
I am running into the same issue... did you find a solution?
oymdgrw74#
I am running into the same issue... did you find a solution?
No, this issue makes me think that the whole Huggingface x Langchain implementation is outdated. I have been trying to work around it by using an OpenAI-compatible web server via LlamaCpp/Ollama.
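A rough sketch of that workaround, assuming a locally running Ollama server exposing its OpenAI-compatible API at http://localhost:11434/v1 with a pulled llama3 model (both the URL and the model name are assumptions, not part of this thread), reusing the messages list from the example code:

# Sketch of the OpenAI-compatible workaround (assumes a local Ollama server
# at http://localhost:11434/v1 serving a "llama3" model).
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",   # Ollama ignores the key, but the client requires one
    model="llama3",
    max_tokens=4096,    # honored by the OpenAI-compatible endpoint
)
print(chat.invoke(messages).content)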