unstructured: CCT measure-table-structure-accuracy command does not drop the index

zour9fqk posted 2 months ago in Other

Describe the bug

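When the CCT command measure-table-structure-accuracy-command finds no tables to process (i.e., the documents are malformed), it does not drop the extra index column, so assigning the aggregate headers fails with a length mismatch.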

To Reproduce

Run the following command:

PYTHONPATH=. python unstructured/ingest/evaluate.py measure-table-structure-accuracy-command --output_dir ground_truth_text_as_html --source_dir predicted_text_as_html --output_dir output_metrics

Expected behavior

The command executes without errors.

Screenshots

Error message:

  File "/Users/mallori/unstructured/unstructured/ingest/evaluate.py", line 276, in <module>
    main()
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/mallori/unstructured/unstructured/ingest/evaluate.py", line 236, in measure_table_structure_accuracy_command
    return measure_table_structure_accuracy(
  File "/Users/mallori/unstructured/unstructured/metrics/evaluate.py", line 375, in measure_table_structure_accuracy
    agg_df.columns = agg_headers
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 5915, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 823, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 230, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements
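The length mismatch is consistent with the bug description: pandas' DataFrame.reset_index() moves the old index into a new "index" column unless drop=True is passed, so a frame meant to have 5 columns ends up with 6. The following is a minimal pandas sketch of that mechanism only, assuming (not confirmed from the unstructured source) that the no-tables branch builds its aggregate frame from one placeholder row per metric and then calls reset_index(); the metric names, the header list and the build_agg_df helper are hypothetical:

```python
import pandas as pd

# Hypothetical metric names and aggregate headers, for illustration only;
# they are not taken from unstructured/metrics/evaluate.py.
metrics = ["metric_a", "metric_b"]
agg_headers = ["metric", "average", "sample_sd", "population_sd", "count"]

# One placeholder row per metric with empty statistics: 5 values per row.
rows = [[metric, None, None, None, 0] for metric in metrics]


def build_agg_df(drop_index: bool) -> pd.DataFrame:
    """Hypothetical helper: build the aggregate frame for the no-tables case."""
    # Without drop=True, reset_index() inserts the old RangeIndex
    # as an extra leading column named "index".
    return pd.DataFrame(rows).reset_index(drop=drop_index)


buggy = build_agg_df(drop_index=False)
print(buggy.shape)  # (2, 6): the extra "index" column is kept

try:
    buggy.columns = agg_headers  # 5 names for 6 columns
except ValueError as err:
    print(err)  # Length mismatch: Expected axis has 6 elements, new values have 5 elements

fixed = build_agg_df(drop_index=True)
fixed.columns = agg_headers  # 5 names for 5 columns: succeeds
print(list(fixed.columns))
```

Passing drop=True (or not calling reset_index() in that branch) keeps the column count equal to the header count; whether that is the right fix inside unstructured itself is an assumption.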

Environment Info

Python version:  3.9.13
unstructured version:  0.13.3
unstructured-inference version:  0.7.23
pytesseract version:  0.3.10
Torch version:  2.1.0
Detectron2 version:  0.6
PaddleOCR is not installed
Libmagic version:  ==> libmagic: stable 5.45

