llama_index [Bug]: MarvinMetadataExtractor not working properly, and the ai_model decorator

mlmc2os5  posted 2 months ago in Other

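Problem description: When using marvin (through llama-index) to extract relevant metadata from text chunks, I ran into a problem. First, the problem involves the ai_model decorator: following the notebook, it does not seem possible to decorate a class that inherits from the BaseModel in "llama_index.core.bridge.pydantic". I tried to work around this by using the BaseModel from pydantic directly, but that caused a new problem: the following code in the MarvinMetadataExtractor class appears to be wrong: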

from marvin import ai_model
if not issubclass(marvin_model, ai_model):
 raise ValueError("marvin_model must be a subclass of ai_model")

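This check starts at line 63. It appears to be wrong because ai_model is a function, not a class, so it cannot be used as the second argument to issubclass().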
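The failure can be reproduced without marvin installed. The following is a minimal sketch in which `fake_ai_model` is a hypothetical stand-in for a function-based decorator such as ai_model; it is not marvin's actual implementation, only an illustration of why `issubclass()` rejects a function as its second argument.

```python
def fake_ai_model(cls):
    """Hypothetical stand-in for a function-based decorator (not marvin's code)."""
    return cls


@fake_ai_model
class Example:
    pass


def check_is_subclass(candidate, target):
    """Return the TypeError message raised by issubclass, or None if it succeeds."""
    try:
        issubclass(candidate, target)
    except TypeError as exc:
        return str(exc)
    return None


# A function is not a class, so issubclass() raises TypeError here.
error_message = check_is_subclass(Example, fake_ai_model)
print(error_message)

# A real class as the second argument works as expected.
ok_message = check_is_subclass(Example, object)
print(ok_message)
```

The first call reports the same kind of TypeError as the second line of the traceback in this report, while the second call completes without an error.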

Version: llama-index-extractors-marvin="0.1.3"

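Steps to reproduce:

  1. Import the required libraries and modules;
  2. Set the environment variables;
  3. Define the LLM model string;
  4. Define the RiskExtractor class using the ai_model decorator;
  5. Create a MarvinMetadataExtractor instance.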

Relevant logs/traceback:
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
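The first error is Python's generic complaint when a new class is built from bases whose metaclasses are unrelated. The sketch below reproduces it with plain classes; `MetaA`, `MetaB`, `BaseA`, and `BaseB` are hypothetical names. Whether this is exactly what happens when ai_model wraps a class derived from the llama_index bridge BaseModel is an assumption, not something confirmed in this thread.

```python
class MetaA(type):
    """Hypothetical metaclass standing in for one BaseModel implementation."""


class MetaB(type):
    """Hypothetical, unrelated metaclass standing in for another implementation."""


class BaseA(metaclass=MetaA):
    pass


class BaseB(metaclass=MetaB):
    pass


def try_combine():
    """Try to derive one class from both bases; return the error text, if any."""
    try:
        # Python cannot pick a single metaclass compatible with MetaA and MetaB.
        type("Combined", (BaseA, BaseB), {})
    except TypeError as exc:
        return str(exc)
    return None


conflict_message = try_combine()
print(conflict_message)
```

If that assumption holds, defining the model with the same BaseModel that the decorator itself builds on would avoid the conflict, which is consistent with the asker's observation that switching to pydantic's BaseModel moved the failure to the issubclass check.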

lvjbypge  #1

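Hey @DioTeamGit! I'm here to help with the problem you've run into. I'm a bot that can help you solve bugs, answer questions, and become a contributor. Let me know how I can help!
To resolve the problems you hit with MarvinMetadataExtractor and the ai_model decorator, follow these steps: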

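  1. Make sure ai_model is applied correctly: the ai_model decorator should be applied to a class that inherits from Pydantic's BaseModel. Check that the decorator is imported and applied correctly.
  2. Check the ai_model decorator: ai_model is expected to be a function that modifies the class it decorates. Because it is a function and not a class, the check issubclass(marvin_model, ai_model) raises a TypeError, so the extractor needs a different way to verify that the class was decorated.
  3. Modify MarvinMetadataExtractor: update the MarvinMetadataExtractor class so that it handles the ai_model decorator correctly. Here is an example of how the class could be modified: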
from typing import Type, Optional, Any, Sequence, List, Dict, Iterable
from pydantic import BaseModel, Field
from llama_index.legacy.extractors.interface import BaseExtractor
from llama_index.legacy.schema import BaseNode, TextNode
from llama_index.legacy.utils import get_tqdm_iterable

class MarvinMetadataExtractor(BaseExtractor):
    marvin_model: Type[BaseModel] = Field(
        description="The Marvin model to use for extracting custom metadata"
    )
    llm_model_string: Optional[str] = Field(
        description="The LLM model string to use for extracting custom metadata"
    )

    def __init__(
        self,
        marvin_model: Type[BaseModel],
        llm_model_string: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
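        import marvin

        # NOTE: this assumes the ai_model decorator sets an "__ai_model__"
        # attribute on the classes it wraps. That is not confirmed in this
        # thread, so verify it against the installed marvin version.
        if not hasattr(marvin_model, "__ai_model__"):
            raise ValueError("marvin_model must be decorated with @ai_model")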

        if llm_model_string:
            marvin.settings.llm_model = llm_model_string

        super().__init__(
            marvin_model=marvin_model, llm_model_string=llm_model_string, **kwargs
        )

    @classmethod
    def class_name(cls) -> str:
        return "MarvinEntityExtractor"

    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict]:
        metadata_list: List[Dict] = []

        nodes_queue: Iterable[BaseNode] = get_tqdm_iterable(
            nodes, self.show_progress, "Extracting marvin metadata"
        )
        for node in nodes_queue:
            if self.is_text_node_only and not isinstance(node, TextNode):
                metadata_list.append({})
                continue

            # Call the decorated class on the text itself. Instantiating it
            # with no arguments first would fail for models with required fields.
            metadata = self.marvin_model(node.get_content())
            metadata_list.append({"marvin_metadata": metadata.dict()})
        return metadata_list
  4. Example usage: make sure you are using MarvinMetadataExtractor and the ai_model decorator correctly:
from pydantic import BaseModel, Field
import marvin
from marvin import ai_model

# Config.RISKS comes from the asker's own code and is not defined in this snippet.
@ai_model
class RiskExtractor(BaseModel):
    risk: str = Field(
        ..., 
        description=(
            "a description of all the chunk parts related to risks "
            f"that can be one of the following: {Config.RISKS}."
        )
    )

metadata_extractor = MarvinMetadataExtractor(
    marvin_model=RiskExtractor, llm_model_string="gpt-4o"
)

By making sure the ai_model decorator is applied correctly, and by modifying the MarvinMetadataExtractor class to check for the decorator, you should be able to resolve the problem you are facing 1 (https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/extractors/marvin_metadata_extractor.py) 2 (https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/extractors/llama-index-extractors-marvin/llama_index/extractors/marvin/base.py).
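The idea of checking for a decorator by looking for a marker attribute, instead of calling issubclass() on a function, can be sketched independently of marvin. Everything below is hypothetical: `mark_decorated`, `__is_marked__`, and `require_marked` are illustrative names, and whether marvin's ai_model leaves any such marker on the classes it wraps is an assumption to verify.

```python
def mark_decorated(cls):
    """Hypothetical decorator that tags the class it wraps with a marker."""
    cls.__is_marked__ = True
    return cls


def require_marked(model):
    """Validate that `model` is a class carrying the hypothetical marker."""
    if not isinstance(model, type):
        raise ValueError("model must be a class")
    if not getattr(model, "__is_marked__", False):
        raise ValueError("model must be decorated with @mark_decorated")
    return model


@mark_decorated
class Tagged:
    pass


class Plain:
    pass


def validation_error(model):
    """Return the ValueError message for `model`, or None if it validates."""
    try:
        require_marked(model)
    except ValueError as exc:
        return str(exc)
    return None


print(validation_error(Tagged))
print(validation_error(Plain))
```

A marker check like this only works if the decorator really sets the attribute. If it does not, an alternative is to check against a base class that the decorator is known to inject.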

  • To continue talking to Dosu, mention @dosu.
