Describe the bug
When the CCT command measure-table-structure-accuracy-command cannot find a table to process (i.e., the document is malformed), it does not drop the extra index.
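One plausible way an "extra index" can end up in the aggregate frame is a `reset_index()` call without `drop=True`. The snippet below is only a minimal sketch of that pandas behavior. It is not the actual code in `unstructured/metrics/evaluate.py`; the header names and the placeholder row are hypothetical and were chosen only to match the 6-versus-5 column counts in the error.

```python
import pandas as pd

# Hypothetical header list with five names (the error reports five new values).
agg_headers = ["metric", "average", "sample_sd", "population_sd", "count"]

# Hypothetical placeholder row for the "no table found" case: five values.
rows = [["table_level_acc", None, None, None, 0]]

# reset_index() keeps the old index as a new "index" column: 5 + 1 = 6 columns.
with_index = pd.DataFrame(rows).reset_index()

# reset_index(drop=True) discards the old index instead: still 5 columns.
without_index = pd.DataFrame(rows).reset_index(drop=True)

print(with_index.shape[1], without_index.shape[1])

# With five columns, assigning the five headers works.
without_index.columns = agg_headers
```

If the real code path does something similar, dropping the index (or never resetting it) in the no-table branch would keep the column count in line with the headers.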
To Reproduce
Run the following command:
PYTHONPATH=. python unstructured/ingest/evaluate.py measure-table-structure-accuracy-command --output_dir ground_truth_text_as_html --source_dir predicted_text_as_html --output_dir output_metrics
Expected behavior
The command runs successfully.
Screenshots
Error message:
  File "/Users/mallori/unstructured/unstructured/ingest/evaluate.py", line 276, in <module>
    main()
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/mallori/unstructured/unstructured/ingest/evaluate.py", line 236, in measure_table_structure_accuracy_command
    return measure_table_structure_accuracy(
  File "/Users/mallori/unstructured/unstructured/metrics/evaluate.py", line 375, in measure_table_structure_accuracy
    agg_df.columns = agg_headers
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 5915, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 823, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 230, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/Users/mallori/opt/anaconda3/lib/python3.9/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements
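For anyone reading the traceback: the final `ValueError` is what pandas raises whenever a list of column labels does not have exactly one label per existing column. A minimal sketch that triggers the same kind of error outside of `unstructured` follows. The frame contents and header names are made up for illustration, and `check_headers` is a hypothetical helper, not part of any library.

```python
import pandas as pd

# Made-up frame with six columns and a made-up list of five headers.
frame = pd.DataFrame([[0, "table_level_acc", None, None, None, 0]])
headers = ["metric", "average", "sample_sd", "population_sd", "count"]


def check_headers(df, names):
    """Hypothetical helper: report whether the labels fit the frame."""
    return len(df.columns) == len(names)


message = None
try:
    # Six columns, five labels: pandas rejects the assignment.
    frame.columns = headers
except ValueError as exc:
    message = str(exc)

print(check_headers(frame, headers))
print(message)
```

A check like this before the assignment would let the command report which input produced the unexpected shape instead of failing inside pandas.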
Environment Info
Python version: 3.9.13
unstructured version: 0.13.3
unstructured-inference version: 0.7.23
pytesseract version: 0.3.10
Torch version: 2.1.0
Detectron2 version: 0.6
PaddleOCR is not installed
Libmagic version: ==> libmagic: stable 5.45
Additional context
Add any other context about the problem here.
1 Answer
tjrkku2a1#
prediction_table_0_0.png.txt
ground_truth_table_0_0.png.txt