Bug Description
I am implementing the SentenceWindowNodeParser method as described here:
I tried using gpt-4-turbo-preview or gpt-4-0125-preview and the embedding model text-embedding-3-large, which should have a context window of 128K, but when I instantiate a VectorStoreIndex, I receive an error message stating that the context length is 8192 tokens.
Version
0.10.19
Steps to reproduce
Run the following code in a Colab notebook:
%pip install --quiet ragas
%pip install --quiet --upgrade llama-index
%pip install --quiet llama-index-vector-stores-pinecone==0.1.4
%pip install --quiet pinecone-client==3.1.0
%pip install --quiet docx2txt
%pip install --quiet python-pptx
import os
os.environ['OPENAI_API_KEY'] = "..."
os.environ['OPENAI_MODEL'] = "gpt-4-turbo-preview"
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceWindowNodeParser, SemanticSplitterNodeParser, SentenceSplitter
from llama_index.core import Settings
base_dir = '/content/drive/MyDrive'
common_docs_dir = base_dir + '/DocumentsTest/Common'
common_docs = SimpleDirectoryReader(common_docs_dir).load_data()
embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(temperature=0.0, model="gpt-4-0125-preview")
Settings.embed_model = embed_model
Settings.llm = llm
# deprecated, but I tried using this too to no avail
service_context = ServiceContext.from_defaults(
llm=llm,
embed_model=embed_model,
)
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3,
window_metadata_key="window",
original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(common_docs, show_progress=True)
index = VectorStoreIndex(nodes, show_progress=True)
Relevant logs / traceback
Generating embeddings: 100% 2048/2048 [00:11<00:00, 157.53it/s]
Generating embeddings: 100% 2048/2048 [00:15<00:00, 97.88it/s]
Generating embeddings: 93% 1900/2048 [00:30<00:01, 91.94it/s]
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 0.20506248005515848 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 0.29691314917974476 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 2.952889808871351 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 6.762608593280439 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 6.486640098747674 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
---------------------------------------------------------------------------
BadRequestError Traceback (most recent call last)
<ipython-input-28-78fc73788cbc> in <cell line: 16>()
14
15 # index = VectorStoreIndex(nodes, model='text-embedding-3-large', show_progress=True)
---> 16 index = VectorStoreIndex(nodes, show_progress=True)
20 frames
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in __init__(self, nodes, use_async, store_nodes_override, embed_model, insert_batch_size, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
72
73 self._insert_batch_size = insert_batch_size
---> 74 super().__init__(
75 nodes=nodes,
76 index_struct=index_struct,
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/base.py in __init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, service_context, **kwargs)
92 if index_struct is None:
93 nodes = nodes or []
---> 94 index_struct = self.build_index_from_nodes(
95 nodes + objects # type: ignore
96 )
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in build_index_from_nodes(self, nodes, **insert_kwargs)
305 )
306
--> 307 return self._build_index_from_nodes(nodes, **insert_kwargs)
308
309 def _insert(self, nodes: Sequence[BaseNode], **insert_kwargs: Any) -> None:
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in _build_index_from_nodes(self, nodes, **insert_kwargs)
277 run_async_tasks(tasks)
278 else:
--> 279 self._add_nodes_to_index(
280 index_struct,
281 nodes,
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in _add_nodes_to_index(self, index_struct, nodes, show_progress, **insert_kwargs)
230
231 for nodes_batch in iter_batch(nodes, self._insert_batch_size):
--> 232 nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
233 new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
234
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py in _get_node_with_embedding(self, nodes, show_progress)
138
139 """
--> 140 id_to_embed_map = embed_nodes(
141 nodes, self._embed_model, show_progress=show_progress
142 )
/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/utils.py in embed_nodes(nodes, embed_model, show_progress)
136 id_to_embed_map[node.node_id] = node.embedding
137
--> 138 new_embeddings = embed_model.get_text_embedding_batch(
139 texts_to_embed, show_progress=show_progress
140 )
/usr/local/lib/python3.10/dist-packages/llama_index/core/base/embeddings/base.py in get_text_embedding_batch(self, texts, show_progress, **kwargs)
253 payload={EventPayload.SERIALIZED: self.to_dict()},
254 ) as event:
--> 255 embeddings = self._get_text_embeddings(cur_batch)
256 result_embeddings.extend(embeddings)
257 event.on_end(
/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/openai/base.py in _get_text_embeddings(self, texts)
417 """
418 client = self._get_client()
--> 419 return get_embeddings(
420 client,
421 texts,
/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in wrapped_f(*args, **kw)
287 @functools.wraps(f)
288 def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
--> 289 return self(f, *args, **kw)
290
291 def retry_with(*args: t.Any, **kwargs: t.Any) -> WrappedFn:
/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
377 retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
378 while True:
--> 379 do = self.iter(retry_state=retry_state)
380 if isinstance(do, DoAttempt):
381 try:
/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in iter(self, retry_state)
323 retry_exc = self.retry_error_cls(fut)
324 if self.reraise:
--> 325 raise retry_exc.reraise()
326 raise retry_exc from fut.exception()
327
/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in reraise(self)
156 def reraise(self) -> t.NoReturn:
157 if self.last_attempt.failed:
--> 158 raise self.last_attempt.result()
159 raise self
160
/usr/lib/python3.10/concurrent/futures/_base.py in result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
452
453 self._condition.wait(timeout)
/usr/lib/python3.10/concurrent/futures/_base.py in __get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py in __call__(self, fn, *args, **kwargs)
380 if isinstance(do, DoAttempt):
381 try:
--> 382 result = fn(*args, **kwargs)
383 except BaseException: # noqa: B902
384 retry_state.set_exception(sys.exc_info()) # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/openai/base.py in get_embeddings(client, list_of_text, engine, **kwargs)
178 list_of_text = [text.replace("\n", " ") for text in list_of_text]
179
--> 180 data = client.embeddings.create(input=list_of_text, model=engine, **kwargs).data
181 return [d.embedding for d in data]
182
/usr/local/lib/python3.10/dist-packages/openai/resources/embeddings.py in create(self, input, model, dimensions, encoding_format, user, extra_headers, extra_query, extra_body, timeout)
111 return obj
112
--> 113 return self._post(
114 "/embeddings",
115 body=maybe_transform(params, embedding_create_params.EmbeddingCreateParams),
/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in post(self, path, cast_to, body, options, files, stream, stream_cls)
1206 method="post", url=path, json_data=body, files=to_httpx_files(files), **options
1207 )
-> 1208 return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
1209
1210 def patch(
/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in request(self, cast_to, options, remaining_retries, stream, stream_cls)
895 stream_cls: type[_StreamT] | None = None,
896 ) -> ResponseT | _StreamT:
--> 897 return self._request(
898 cast_to=cast_to,
899 options=options,
/usr/local/lib/python3.10/dist-packages/openai/_base_client.py in _request(self, cast_to, options, remaining_retries, stream, stream_cls)
986
987 log.debug("Re-raising status error")
--> 988 raise self._make_status_error_from_response(err.response) from None
989
990 return self._process_response(
BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 28094 tokens (28094 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
6 Answers
8xiog9wr1#
The error you're encountering is due to the maximum token limit of the models you're using (gpt-4-turbo-preview, gpt-4-0125-preview, and text-embedding-3-large). These models have a maximum context length of 8192 tokens, while you're trying to process 28094 tokens, which exceeds that limit. This was also encountered in a previous issue in the LlamaIndex repository. The solution is to handle tokenization in a way that does not exceed the model's maximum token limit.
To resolve this, you need to reduce the number of tokens in your requests. Here are some strategies you could adopt:
Since you're using the SentenceWindowNodeParser method, consider applying these strategies within that method's logic to preprocess the text before passing it to the model. Keep in mind that when implementing these strategies you should test their impact on result quality, since reducing the input size or summarizing content may affect the model's predictions or the resulting embeddings.
Unfortunately, I could not find the exact maximum token limits for the gpt-4-turbo-preview and gpt-4-0125-preview models in the LlamaIndex codebase. However, the error message indicates that the limit is 8192 tokens. You may want to verify this against the models' documentation or the OpenAI API documentation. I hope this helps! If you have any other questions or need further clarification, feel free to ask.
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and a 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
oxf4rvwz2#
The source of the problem appears to be the Excel files in the directory I passed to SimpleDirectoryReader. I uploaded one here: Graphics_Certification_Table_tcm943-214017.xls. To identify the problematic nodes, I printed every node/filename whose text length exceeded 8192, roughly as in the sketch below. It turns out some nodes' text contains binary-encoded content (see the image below).
I would have expected SimpleDirectoryReader to raise an exception before the embedding step, or to attempt to decode the content. Once I removed those files from the directory I was able to embed correctly, but those files still go unprocessed.
bt1cpqcv3#
@christopherhan, the maximum length for OpenAI's embedding models is 8192 tokens.
4dc9hkyq4#
The problem is that SimpleDirectoryReader cannot raise an error here, because it has no way of knowing which embedding model you will use or what its maximum context length is.
I'd suggest running something like SentenceSplitter after SentenceWindowNodeParser to make sure the maximum context length is respected; see the sketch below.
Also, you should update your llama-index package version, since you're still using ServiceContext.
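A minimal sketch of that suggestion, chaining the two parsers in an IngestionPipeline (chunk_size=1024 is an illustrative value, not one prescribed in this thread):
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter, SentenceWindowNodeParser

pipeline = IngestionPipeline(
    transformations=[
        SentenceWindowNodeParser.from_defaults(
            window_size=3,
            window_metadata_key="window",
            original_text_metadata_key="original_text",
        ),
        # Re-split any oversized nodes so their text fits the embedding limit.
        SentenceSplitter(chunk_size=1024),
    ]
)
nodes = pipeline.run(documents=common_docs)
index = VectorStoreIndex(nodes, show_progress=True)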
vkc1a9a25#
@christopherhan, the maximum length for OpenAI's embedding models is 8192 tokens.

I had assumed the error message referred to the gpt-4-* models, not the embedding model. Thanks for clearing up the ambiguity.
mlnl4t2r6#
We ran into the same problem. The cause was that the chunk size was too large and exceeded the embedding model's limit.
Our solution was to reduce the chunk size with a custom splitter, along the lines of the sketch below. @christopherhan
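The custom splitter itself isn't included in the answer; as a hedged stand-in, a plain SentenceSplitter with a reduced chunk size has the same effect (512 and 50 are illustrative values):
from llama_index.core.node_parser import SentenceSplitter

# Smaller chunks keep every node comfortably under the 8192-token embedding limit.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(common_docs, show_progress=True)
index = VectorStoreIndex(nodes, show_progress=True)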