Parallel calls to PaddleOCR result in OOM

luaexgnf  posted on 2022-11-05 in Other

Parallel calls to the '.ocr' method of a PaddleOCR object result in OOM (out of memory), even when the object is initialized only once.
Code to replicate:
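from paddleocr import PaddleOCR

# Initialized once, at module level
paddle_ocr = PaddleOCR(lang='en', use_gpu=False)

def do_ocr(image):
    paddle_lines = paddle_ocr.ocr(image)
    # Keep only the recognized text of each detected line
    paddle_lines = [line[1][0] for line in paddle_lines]
    print(paddle_lines)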

# Called via an HTTP endpoint
image=
do_ocr(image)

Here the memory usage of the process increases linearly with the number of parallel HTTP calls, even though paddle_ocr is initialized at the module level.
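To put numbers on "increases linearly", one option is to record the peak resident set size of the process for different worker counts. The following is only a sketch under assumptions: fake_ocr is a hypothetical stand-in for paddle_ocr.ocr (so the snippet runs without PaddleOCR installed), run_parallel and peak_rss are names made up for this example, and the resource module is only available on Unix.

```python
import resource  # Unix-only standard library module
from concurrent.futures import ThreadPoolExecutor


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def fake_ocr(image):
    # Hypothetical stand-in for paddle_ocr.ocr(image): allocates a scratch
    # buffer proportional to the input and returns its size.
    scratch = bytearray(len(image) * 4)
    return len(scratch)


def run_parallel(fn, jobs, workers):
    """Run fn over jobs with `workers` threads; return (results, peak RSS)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, jobs))
    return results, peak_rss()


if __name__ == "__main__":
    image = bytes(1_000_000)
    for workers in (1, 10, 40):
        _, peak = run_parallel(fake_ocr, [image] * 200, workers)
        print(f"workers={workers:>2} peak_rss={peak}")
```

Because ru_maxrss is a high-water mark that never decreases, a cleaner comparison would run each worker count in a fresh process. Replacing fake_ocr with the real do_ocr would then show whether the peak scales with the number of threads.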

Version: tested on both 2.1.0 and 2.2.0.

6tr1vspr  #1

Sorry, I cannot reproduce the results you mention above. Could you provide a Python script that reproduces your problem?

qpgpyjmq  #2

I am calling the code from a FastAPI endpoint and so cannot copy the exact code. However, I am able to reproduce the problem with Python's ThreadPoolExecutor as follows:

from concurrent.futures import ThreadPoolExecutor as Pool

import imageio
from paddleocr import PaddleOCR

paddle_ocr = PaddleOCR(lang='en', gpu=False, use_gpu=False)
executors = Pool(40)

def do_ocr(image):
    paddle_lines = paddle_ocr.ocr(image)
    paddle_lines = [line[1][0] for line in paddle_lines]
    print(paddle_lines)

# Load any image, for example:
image = imageio.imread('https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.3/doc/imgs_results/multi_lang/img_01.jpg')

jobs = [image for i in range(500)]
executors.map(do_ocr,jobs)
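
For comparison, here is a minimal sketch of the same pattern with the number of simultaneous OCR calls capped by a semaphore. Whether this avoids the OOM reported here is not confirmed anywhere in this thread; it is only one way to test whether memory tracks the number of concurrent calls. fake_ocr is a hypothetical stand-in for paddle_ocr.ocr, and MAX_CONCURRENT_OCR, do_ocr_limited and run_jobs are names and values assumed for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_OCR = 4  # assumed limit, tune for the host
_ocr_slots = threading.BoundedSemaphore(MAX_CONCURRENT_OCR)


def fake_ocr(image):
    # Hypothetical stand-in for paddle_ocr.ocr(image); mimics the
    # [box, (text, score)] line shape that the code above indexes into.
    return [[None, (f"text-{image}", 0.99)]]


def do_ocr_limited(image):
    # At most MAX_CONCURRENT_OCR threads are inside the OCR call at once;
    # the remaining threads block here until a slot frees up.
    with _ocr_slots:
        lines = fake_ocr(image)
    return [line[1][0] for line in lines]


def run_jobs(jobs, workers=40):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes the lazy iterator returned by map(), so any
        # exception raised in a worker is re-raised here.
        return list(pool.map(do_ocr_limited, jobs))
```

With the real model, fake_ocr would be replaced by paddle_ocr.ocr. Note that the original executors.map(do_ocr, jobs) never reads its results, so an exception raised inside a worker would go unnoticed; wrapping the call in list() as above surfaces it.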
