llama_index [Bug]: Requested tokens exceed the model's context length even though GPT-4 is in use

6ljaweal · posted 2 months ago in Other

Bug description

I'm implementing the SentenceWindowNodeParser method as described here:
I tried using gpt-4-turbo-preview and gpt-4-0125-preview, which should have a 128K context window, together with the embedding model text-embedding-3-large, but when I instantiate a VectorStoreIndex I get an error saying the context length is 8192 tokens.

Version

0.10.19

Steps to reproduce

Run the following code in a Colab notebook:

%pip install --quiet ragas
%pip install --quiet --upgrade llama-index
%pip install --quiet llama-index-vector-stores-pinecone==0.1.4
%pip install --quiet pinecone-client==3.1.0
%pip install --quiet docx2txt
%pip install --quiet python-pptx
import os
os.environ['OPENAI_API_KEY'] = "..."
os.environ['OPENAI_MODEL'] = "gpt-4-turbo-preview"
from llama_index.core import SimpleDirectoryReader, Settings, ServiceContext, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceWindowNodeParser, SentenceSplitter

base_dir = '/content/drive/MyDrive'
common_docs_dir = base_dir + '/DocumentsTest/Common'

common_docs = SimpleDirectoryReader(common_docs_dir).load_data()

embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(temperature=0.0, model="gpt-4-0125-preview")

Settings.embed_model = embed_model
Settings.llm = llm

# deprecated, but I tried using this too to no avail
service_context = ServiceContext.from_defaults(
  llm=llm,
  embed_model=embed_model,
)

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

nodes = node_parser.get_nodes_from_documents(common_docs, show_progress=True)

index = VectorStoreIndex(nodes, show_progress=True)

Relevant logs/traceback

Generating embeddings: 100% 2048/2048 [00:11<00:00, 157.53it/s]
Generating embeddings: 100% 2048/2048 [00:15<00:00, 97.88it/s]
Generating embeddings:  93% 1900/2048 [00:30<00:01, 91.94it/s]
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 0.20506248005515848 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 0.29691314917974476 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 2.952889808871351 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 6.762608593280439 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 6.486640098747674 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
---------------------------------------------------------------------------
BadRequestError                           Traceback (most recent call last)
<ipython-input-28-78fc73788cbc> in <cell line: 16>()
     14 
     15 # index = VectorStoreIndex(nodes, model='text-embedding-3-large', show_progress=True)
---> 16 index = VectorStoreIndex(nodes, show_progress=True)

20 frames
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in __init__(self, nodes, use_async, store_nodes_override, embed_model, insert_batch_size, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
     72 
     73         self._insert_batch_size = insert_batch_size
---> 74         super().__init__(
     75             nodes=nodes,
     76             index_struct=index_struct,

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/base.py in __init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
     92             if index_struct is None:
     93                 nodes = nodes or []
---> 94                 index_struct = self.build_index_from_nodes(
     95                     nodes + objects  # type: ignore
     96                 )

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in build_index_from_nodes(self, nodes, **insert_kwargs)
    305             )
    306 
--> 307         return self._build_index_from_nodes(nodes, **insert_kwargs)
    308 
    309     def _insert(self, nodes: Sequence[BaseNode], **insert_kwargs: Any) -> None:

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in _build_index_from_nodes(self, nodes, **insert_kwargs)
    277             run_async_tasks(tasks)
    278         else:
--> 279             self._add_nodes_to_index(
    280                 index_struct,
    281                 nodes,

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in _add_nodes_to_index(self, index_struct, nodes, show_progress, **insert_kwargs)
    230 
    231         for nodes_batch in iter_batch(nodes, self._insert_batch_size):
--> 232             nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
    233             new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
    234 

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in _get_node_with_embedding(self, nodes, show_progress)
    138 
    139         """
--> 140         id_to_embed_map = embed_nodes(
    141             nodes, self._embed_model, show_progress=show_progress
    142         )

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/utils.py in embed_nodes(nodes, embed_model, show_progress)
    136             id_to_embed_map[node.node_id] = node.embedding
    137 
--> 138     new_embeddings = embed_model.get_text_embedding_batch(
    139         texts_to_embed, show_progress=show_progress
    140     )

/usr/local/lib/python3.10/dist-packages/llama_index/core/base/embeddings/base.py in get_text_embedding_batch(self, texts, show_progress, **kwargs)
    253                     payload={EventPayload.SERIALIZED: self.to_dict()},
    254                 ) as event:
--> 255                     embeddings = self._get_text_embeddings(cur_batch)
    256                     result_embeddings.extend(embeddings)
    257                     event.on_end(

/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/openai/base.py in _get_text_embeddings(self, texts)
    417         """
    418         client = self._get_client()
--> 419         return get_embeddings(
    420             client,
    421             texts,

/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in wrapped_f(*args, **kw)
    287         @functools.wraps(f)
    288         def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
--> 289             return self(f, *args, **kw)
    290 
    291         def retry_with(*args: t.Any, **kwargs: t.Any) -> WrappedFn:

/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
    377         retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
    378         while True:
--> 379             do = self.iter(retry_state=retry_state)
    380             if isinstance(do, DoAttempt):
    381                 try:

/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in iter(self, retry_state)
    323             retry_exc = self.retry_error_cls(fut)
    324             if self.reraise:
--> 325                 raise retry_exc.reraise()
    326             raise retry_exc from fut.exception()
    327 

/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in reraise(self)
    156     def reraise(self) -> t.NoReturn:
    157         if self.last_attempt.failed:
--> 158             raise self.last_attempt.result()
    159         raise self
    160 

/usr/lib/python3.10/concurrent/futures/_base.py in result(self, timeout)
    449                     raise CancelledError()
    450                 elif self._state == FINISHED:
--> 451                     return self.__get_result()
    452 
    453                 self._condition.wait(timeout)

/usr/lib/python3.10/concurrent/futures/_base.py in __get_result(self)
    401         if self._exception:
    402             try:
--> 403                 raise self._exception
    404             finally:
    405                 # Break a reference cycle with the exception in self._exception

/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
    380             if isinstance(do, DoAttempt):
    381                 try:
--> 382                     result = fn(*args, **kwargs)
    383                 except BaseException:  # noqa: B902
    384                     retry_state.set_exception(sys.exc_info())  # type: ignore[arg-type]

/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/openai/base.py in get_embeddings(client, list_of_text, engine, **kwargs)
    178     list_of_text = [text.replace("\n", " ") for text in list_of_text]
    179 
--> 180     data = client.embeddings.create(input=list_of_text, model=engine, **kwargs).data
    181     return [d.embedding for d in data]
    182 

/usr/local/lib/python3.10/dist-packages/openai/resources/embeddings.py in create(self, input, model, dimensions, encoding_format, user, extra_headers, extra_query, extra_body, timeout)
    111             return obj
    112 
--> 113         return self._post(
    114             "/embeddings",
    115             body=maybe_transform(params, embedding_create_params.EmbeddingCreateParams),

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in post(self, path, cast_to, body, options, files, stream, stream_cls)
   1206             method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1207         )
-> 1208         return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
   1209 
   1210     def patch(

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in request(self, cast_to, options, remaining_retries, stream, stream_cls)
    895         stream_cls: type[_StreamT] | None = None,
    896     ) -> ResponseT | _StreamT:
--> 897         return self._request(
    898             cast_to=cast_to,
    899             options=options,

/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in _request(self, cast_to, options, remaining_retries, stream, stream_cls)
    986 
    987             log.debug("Re-raising status error")
--> 988             raise self._make_status_error_from_response(err.response) from None
    989 
    990         return self._process_response(

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
8xiog9wr1#

The error you're encountering is caused by the maximum token limit of the models you're using (gpt-4-turbo-preview, gpt-4-0125-preview, and text-embedding-3-large). These models have a maximum context length of 8192 tokens, and you attempted to process 28094 tokens, which exceeds that limit.
This problem also came up in a previous issue in the LlamaIndex repository. The solution is to handle tokenization so that the model's maximum token limit is not exceeded.
To resolve this, you need to reduce the number of tokens in the request. Here are some strategies you could use:

  1. Split the input text: break your input into smaller chunks that stay below the model's token limit (see the sketch after this list). You can then process the chunks individually and aggregate the results if needed. This approach requires careful handling to preserve context and meaning across chunks.
  2. Summarize the text: where applicable, use text-summarization techniques or models to shorten your input into a form that keeps the key information while fitting within the token limit.
  3. Selective processing: assess whether every part of your input text is essential to the task. If some parts are less relevant, exclude them to reduce the token count.
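
For example, a minimal sketch of strategy 1 (assuming tiktoken's cl100k_base encoding, which OpenAI's current embedding models use; the helper name and token budget are illustrative, not part of LlamaIndex):

import tiktoken

def split_by_tokens(text: str, max_tokens: int = 8000) -> list[str]:
    # Encode once, then cut the token stream into fixed-size windows
    # that each decode back to a chunk safely under the 8192-token limit.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]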
Given that you're using the SentenceWindowNodeParser approach, consider applying these strategies within that method's logic to preprocess the text before it is passed to the model.
When applying these strategies, remember to test the impact on result quality, since reducing input size or summarizing content may affect the model's predictions or embeddings.
Unfortunately, I couldn't find the exact maximum token limits for the gpt-4-turbo-preview and gpt-4-0125-preview models in the LlamaIndex codebase. However, the error message indicates the limit is 8192 tokens. You may want to verify this against the model documentation or the OpenAI API documentation.
I hope this helps! If you have any further questions or need clarification, feel free to ask.


oxf4rvwz2#

The source of the problem seems to be an Excel file in the directory I passed to SimpleDirectoryReader. I've uploaded one here: Graphics_Certification_Table_tcm943-214017.xls
To identify the problematic nodes, I printed any node/filename whose text length exceeded 8192. It appears the text of those nodes contains some binary-encoded content.

# code used to identify nodes that exceed the embedding model's limit
for node in nodes:
  if len(node.get_text()) > 8192:
    print(node)

I would have expected SimpleDirectoryReader to raise an exception before the embedding step, or to attempt to decode it.
Once I removed those files from the directory, embedding worked correctly, but those files still go unprocessed.
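
If the .xls files don't need to be indexed at all, one possible workaround (a sketch; exclude is SimpleDirectoryReader's glob-based file filter, so double-check it against your llama-index version) is to skip them at load time:

# Skip legacy Excel files whose content comes through as binary junk;
# they would otherwise produce oversized nodes downstream.
common_docs = SimpleDirectoryReader(
    common_docs_dir,
    exclude=["*.xls"],
).load_data()

The Excel content would still need a dedicated reader if it has to be indexed.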

bt1cpqcv3#

@christopherhan, the maximum input length for OpenAI's embedding models is 8192 tokens.

4dc9hkyq4#

The problem is that SimpleDirectoryReader cannot raise an error here, because it has no way of knowing which embedding model you will use or what its maximum context length is.
I'd suggest running something like SentenceSplitter after the SentenceWindowNodeParser to make sure the maximum context length is respected, as in the sketch below.
Also, you should update your llama-index package version, since you're still using ServiceContext.
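
A minimal sketch of that safeguard (assuming node parsers can be applied directly to already-parsed nodes, as they are in ingestion pipelines; chunk_size is illustrative):

from llama_index.core.node_parser import SentenceSplitter

# Re-split the window nodes so that no single node's text can exceed
# the embedding model's 8192-token limit.
safety_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(common_docs, show_progress=True)
nodes = safety_splitter(nodes)

index = VectorStoreIndex(nodes, show_progress=True)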

vkc1a9a25#

@christopherhan, the maximum input length for OpenAI's embedding models is 8192 tokens.

I had assumed the error message referred to the gpt-4-* models rather than the embedding model. Thanks for clearing up the ambiguity.

mlnl4t2r6#

We ran into the same problem. The cause was that the chunks were too large, exceeding the embedding model's limit.
Our solution was to reduce the chunk size with a custom splitter; one possible version is sketched below. @christopherhan
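
A sketch of that idea (using TokenTextSplitter, LlamaIndex's token-count-based splitter; the 8192-character check mirrors the heuristic used earlier in this thread and only approximates the token limit):

from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.schema import TextNode

splitter = TokenTextSplitter(chunk_size=2048, chunk_overlap=0)

fixed_nodes = []
for node in nodes:
    if len(node.get_text()) > 8192:
        # Replace one oversized node with several token-bounded ones,
        # carrying the original metadata along.
        for chunk in splitter.split_text(node.get_text()):
            fixed_nodes.append(TextNode(text=chunk, metadata=node.metadata))
    else:
        fixed_nodes.append(node)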
