Software environment
- paddlepaddle:
- paddlepaddle-gpu: 2.5.2.post120
- paddlenlp: 2.8.0
- paddleocr: 2.6.1.3
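Duplicate issues
Error description
Parsing works when the file URL carries no signature; once the signed query parameters are appended, it fails with: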
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x779c0c463c90>
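One way to narrow this down, offered as a diagnostic sketch rather than a confirmed diagnosis, is to look at the first bytes of whatever the signed URL returns before it reaches PIL. The helper name `sniff_kind` is hypothetical and is not part of paddlenlp, paddleocr, or PIL.

```python
def sniff_kind(data: bytes) -> str:
    """Guess a coarse file kind from the leading magic bytes of a buffer."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # Anything else, e.g. an XML or HTML error body returned by the server.
    return "unknown"
```

If the bytes fetched from the signed URL come back as "pdf", then a PDF is being handed to `Image.open`, which does not read PDF files, and that would explain the `UnidentifiedImageError`. If they come back as "unknown", the response body itself (for example an error page from the storage service) is worth inspecting.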
Steps to reproduce & code
import paddlenlp, paddleocr
print("paddlenlp:"+paddlenlp.__version__)
print("paddleocr:"+paddleocr.__version__)
from pprint import pprint
from paddlenlp import Taskflow
schema = ["开票金额是多少?", "销方开户银行是什么?", "发票号码是什么?", "开票日期是哪天?"]
ie = Taskflow("information_extraction", schema=schema, model="uie-x-base")
pprint(ie({"doc": "https://xfhs-zongdui-dev.oss-cn-beijing.aliyuncs.com/2.pdf?Expires=1718704376&OSSAccessKeyId=TMP.3KhVx59XrNtt8WjorPeXMiPnHbQYGSs1WW4no7qEUnnjeEuZcYv5RbS1sYGCxr1gELgXYrNa4d76JBhWwemPj28MUovcxu&Signature=oXCHTXFoS4LD2lDtZ4Tu7lTFTAU%3D"}))
λ 969010514d8d /PaddleNLP/test python app.py
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
paddlenlp:2.8.0.post
paddleocr:2.6.1.3
[2024-06-18 00:53:28,019] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load '/root/.paddlenlp/taskflow/information_extraction/uie-x-base'.
Traceback (most recent call last):
File "/PaddleNLP/test/app.py", line 30, in <module>
pprint(ie({"doc": "https://xfhs-zongdui-dev.oss-cn-beijing.aliyuncs.com/2.pdf?Expires=1718704376&OSSAccessKeyId=TMP.3KhVx59XrNtt8WjorPeXMiPnHbQYGSs1WW4no7qEUnnjeEuZcYv5RbS1sYGCxr1gELgXYrNa4d76JBhWwemPj28MUovcxu&Signature=oXCHTXFoS4LD2lDtZ4Tu7lTFTAU%3D"}))
File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/taskflow.py", line 822, in __call__
results = self.task_instance(inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/task.py", line 526, in __call__
inputs = self._preprocess(*args)
File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/information_extraction.py", line 605, in _preprocess
inputs = self._check_input_text(inputs)
File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/information_extraction.py", line 634, in _check_input_text
data = self._parser_map[self._ocr_lang_choice].parse(
File "/usr/local/lib/python3.10/dist-packages/paddlenlp/utils/doc_parser.py", line 51, in parse
image = self.read_image(doc["doc"])
File "/usr/local/lib/python3.10/dist-packages/paddlenlp/utils/doc_parser.py", line 203, in read_image
_image = np.array(ImageOps.exif_transpose(Image.open(BytesIO(image_buff)).convert("RGB")))
File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3305, in open
raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x779c0c463c90>
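The traceback shows the document going through `read_image` in `doc_parser.py`, so the signed URL was handled as an image rather than as a PDF. A possible workaround, sketched here under assumptions, is to download the file first and pass a local path whose extension comes from the URL path alone, with the query string ignored. The helper names `suffix_from_url` and `download_to_temp` are hypothetical, and the sketch assumes that Taskflow parses a local `.pdf` path the same way it parses the unsigned URL that already works.

```python
import os
import posixpath
import tempfile
import urllib.request
from urllib.parse import unquote, urlsplit


def suffix_from_url(url: str) -> str:
    """Return the extension of the URL path, ignoring query and fragment."""
    path = unquote(urlsplit(url).path)
    return posixpath.splitext(path)[1].lower()


def download_to_temp(url: str, timeout: float = 30.0) -> str:
    """Download url into a temporary file that keeps the path's extension."""
    suffix = suffix_from_url(url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    # mkstemp returns an open file descriptor plus the path; write through it.
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return local_path
```

With this in place, the call in the report would become `ie({"doc": download_to_temp(signed_url)})`, where `signed_url` stands for the full signed address. The caller is responsible for deleting the temporary file afterwards.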
1 answer
z3yyvxxp #1
Steps to reproduce & code
Error message
Test file
2.pdf