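During the test-ingest-dest check, the warning above (also visible in the log excerpt below) is logged repeatedly. See here for an example. The goal of this issue is to eliminate this warning.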
2023-11-08 03:49:04,303 MainProcess DEBUG options: 'num_processes': 4, 'output_dir': '/home/runner/work/unstructured/unstructured/test_unstructured_ingest/structured-output/', 'strategy': 'fast', 'verbose': True, 'reprocess': True, 'input_path': 'example-docs/fake-memo.pdf', 'work_dir': '/home/runner/work/unstructured/unstructured/test_unstructured_ingest/workdir/', 'pdf_infer_table_structure': False, 'ocr_languages': None, 'encoding': None, 'skip_infer_table_types': None, 'additional_partition_args': None, 'fields_include': ['element_id', 'text', 'type', 'metadata', 'embeddings'], 'flatten_metadata': False, 'metadata_include': [], 'metadata_exclude': [], 'partition_by_api': False, 'partition_endpoint': 'https://api.unstructured.io/general/v0/general', 'api_key': None, 'download_dir': None, 're_download': False, 'preserve_downloads': False, 'download_only': False, 'max_docs': None, 'embedding_provider': None, 'embedding_api_key': None, 'embedding_model_name': None, 'chunk_elements': False, 'chunk_multipage_sections': False, 'chunk_combine_under_n_chars': 500, 'chunk_new_after_n_chars': 1500, 'raise_on_error': False, 'permissions_application_id': None, 'permissions_client_cred': None, 'permissions_tenant': None, 'max_retries': None, 'max_retry_time': None, 'file_glob': None, 'recursive': False***
2023-11-08 03:49:04,303 MainProcess DEBUG options: ***'key': '***', 'endpoint': '***', 'index': 'utic-test-ingest-fixtures-output-5afa4da2-7d62-4a0d-a2bf-9b625d3075d4'***
/home/runner/work/unstructured/unstructured/.venv/lib/python3.10/site-packages/dataclasses_json/core.py:187: RuntimeWarning: 'NoneType' object value of non-optional type download_dir detected when decoding CliReadConfig.
  warnings.warn(
/home/runner/work/unstructured/unstructured/.venv/lib/python3.10/site-packages/dataclasses_json/core.py:187: RuntimeWarning: 'NoneType' object value of non-optional type additional_partition_args detected when decoding CliPartitionConfig.
  warnings.warn(
1 answer
@rbiseck3 I'm guessing you can point out why this happens. I think the download_dir problem is here: https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/ingest/interfaces.py#L164 . I think this should be download_dir: Optional[str] = None rather than download_dir: str = "". Does that sound right to you?

additional_partition_args is defined here: https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/ingest/interfaces.py#L79 , and I believe it should likewise be declared as Optional[Dict[str, str]].

Can you confirm this for us?