llama_index [Question]: Can we use TokenCountingHandler with different models?

xvw2m8pv · asked 2 months ago in Other

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hi!
Suppose I'm using a multimodal model such as gpt-4o and a text model such as gemini.
What is the best way to use TokenCountingHandler to listen to the calls from each model and count tokens with the appropriate tokenizer each time?
It seems that when I declare two handlers like this:

mm_llm_token_counter = TokenCountingHandler(
    tokenizer=gpt_tokenizer,
    event_starts_to_ignore=[CBEventType.EMBEDDING],
    event_ends_to_ignore=[CBEventType.EMBEDDING],
)
llm_token_counter = TokenCountingHandler(
    tokenizer=gemini_tokenizer,
    event_starts_to_ignore=[CBEventType.EMBEDDING],
    event_ends_to_ignore=[CBEventType.EMBEDDING],
)

only one handler ends up recording the calls.
Thanks a lot!

axzmvihb #1

Hey @paulpalmieri! I'm here to help you with any bugs or questions while you wait for a human maintainer. Let's solve this together!
To use TokenCountingHandler to listen to the calls from each model and count tokens with the appropriate tokenizer each time, you should use a single CallbackManager that manages multiple TokenCountingHandler instances, each configured with the appropriate tokenizer for its model. Here is an example of how to achieve this:

import tiktoken
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core import Settings

# Create token counters for different models
token_counter_gpt4o = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode,
    verbose=False,
)

# tiktoken has no encoding for Gemini models (encoding_for_model("gemini")
# raises a KeyError), so cl100k_base is used here only as a rough stand-in
# for a real Gemini tokenizer.
token_counter_gemini = TokenCountingHandler(
    tokenizer=tiktoken.get_encoding("cl100k_base").encode,
    verbose=False,
)

# Create a single CallbackManager with both token counters
callback_manager = CallbackManager([token_counter_gpt4o, token_counter_gemini])

Settings.callback_manager = callback_manager

documents = SimpleDirectoryReader("./data").load_data()

# Use the appropriate token counter for each model
index_gpt4o = VectorStoreIndex.from_documents(
    documents,
    callback_manager=CallbackManager([token_counter_gpt4o])
)

index_gemini = VectorStoreIndex.from_documents(
    documents,
    callback_manager=CallbackManager([token_counter_gemini])
)

# Access the counts directly
print("GPT-4o Embedding Tokens: ", token_counter_gpt4o.total_embedding_token_count)
print("Gemini Embedding Tokens: ", token_counter_gemini.total_embedding_token_count)

# Reset the counts at your discretion
token_counter_gpt4o.reset_counts()
token_counter_gemini.reset_counts()

# Track prompt, completion, and total LLM tokens for GPT-4o
response_gpt4o = index_gpt4o.as_query_engine().query("What did the author do growing up?")
print(
    "GPT-4o Embedding Tokens: ",
    token_counter_gpt4o.total_embedding_token_count,
    "\n",
    "GPT-4o LLM Prompt Tokens: ",
    token_counter_gpt4o.prompt_llm_token_count,
    "\n",
    "GPT-4o LLM Completion Tokens: ",
    token_counter_gpt4o.completion_llm_token_count,
    "\n",
    "GPT-4o Total LLM Token Count: ",
    token_counter_gpt4o.total_llm_token_count,
)

# Track prompt, completion, and total LLM tokens for Gemini
response_gemini = index_gemini.as_query_engine().query("What did the author do growing up?")
print(
    "Gemini Embedding Tokens: ",
    token_counter_gemini.total_embedding_token_count,
    "\n",
    "Gemini LLM Prompt Tokens: ",
    token_counter_gemini.prompt_llm_token_count,
    "\n",
    "Gemini LLM Completion Tokens: ",
    token_counter_gemini.completion_llm_token_count,
    "\n",
    "Gemini Total LLM Token Count: ",
    token_counter_gemini.total_llm_token_count,
)

This setup ensures that each TokenCountingHandler correctly records the calls for its respective model. Sources: [1](https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/observability/callbacks/token_counting_migration.md), [2](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/callbacks/TokenCountingHandler.ipynb), [3](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/callbacks/token_counting.py), [4](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/callbacks/token_counting.py)
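To verify which handler actually recorded a given call, you can also inspect the per-event records each handler keeps. A minimal sketch, reusing the counters from the example above; TokenCountingHandler.llm_token_counts holds one TokenCountingEvent per recorded LLM call:

# Print one line per LLM call the gpt-4o counter has recorded.
for event in token_counter_gpt4o.llm_token_counts:
    print(
        f"event {event.event_id}: "
        f"{event.prompt_token_count} prompt + "
        f"{event.completion_token_count} completion = "
        f"{event.total_token_count} tokens"
    )

If only one of the two handlers ever shows events here, the other model's calls are not being routed through that handler's CallbackManager, which is the symptom described in the question.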

  • To continue the conversation with Dosu, mention @dosu.
xyhw6mcr #2

You have to attach each handler to its LLM:

OpenAI(..., callback_manager=CallbackManager([mm_llm_token_counter]))
Gemini(..., callback_manager=CallbackManager([llm_token_counter]))
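Spelled out, that approach looks roughly like the sketch below. This is a minimal sketch, not a verified recipe: it assumes the llama-index-llms-openai and llama-index-llms-gemini packages are installed with valid API keys in the environment, the Gemini model name is a placeholder, and, since tiktoken ships no Gemini encoding, cl100k_base is only a rough stand-in for a real Gemini tokenizer.

import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI
from llama_index.llms.gemini import Gemini

# One counter per model, each with its own tokenizer.
mm_llm_token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode,
)
# Assumption: cl100k_base as a rough stand-in, since tiktoken has no Gemini encoding.
llm_token_counter = TokenCountingHandler(
    tokenizer=tiktoken.get_encoding("cl100k_base").encode,
)

# Attach each counter directly to its LLM, so each model's calls
# are routed only to the matching handler.
gpt4o = OpenAI(
    model="gpt-4o",
    callback_manager=CallbackManager([mm_llm_token_counter]),
)
gemini = Gemini(
    model="models/gemini-1.5-flash",  # placeholder model name
    callback_manager=CallbackManager([llm_token_counter]),
)

gpt4o.complete("Describe the image at ...")
gemini.complete("Summarize the following text ...")

print("gpt-4o LLM tokens:", mm_llm_token_counter.total_llm_token_count)
print("gemini LLM tokens:", llm_token_counter.total_llm_token_count)

With the handlers attached per LLM rather than through a shared global Settings.callback_manager, each call is tallied by exactly one counter.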
