opencv Tesseract OCR having trouble detecting numbers

Posted by nfs0ujit on 2023-10-24 in Other


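I am trying to detect some numbers with tesseract in Python. Below you will find my starting image and what I can get it to. Here is the code I use to get it there.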

import pytesseract
import cv2
import numpy as np
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\choll\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"

image = cv2.imread(r'64normalwart.png')
lower = np.array([254, 254, 254])
upper = np.array([255, 255, 255])
image = cv2.inRange(image, lower, upper)
image = cv2.bitwise_not(image)
#Uses a language that should work with minecraft text, I have tried with and without, no luck 
text = pytesseract.image_to_string(image, lang='mc')
print(text)
cv2.imwrite("Wartthreshnew.jpg", image)
cv2.imshow("Image", image)
cv2.waitKey(0)

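In the end I get black numbers on a white background, which looks good, but tesseract still cannot detect the numbers. I also noticed that the numbers are quite jagged, but I don't know how to fix that. Does anyone have a recommendation for how I could get tesseract to recognize these numbers?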
Starting Image
What I end up with

opencv

Source: https://stackoverflow.com/questions/68562138/tesseract-ocr-having-trouble-detecting-numbers

3 Answers


gcuhipw9 (#1)

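Your problem is the page segmentation mode (PSM). Tesseract can segment an image in several different ways. When you don't choose a PSM yourself, it falls back to mode 3, fully automatic page segmentation, which may not suit your case. I just tried your image and it works perfectly with PSM 6.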

df = pytesseract.image_to_string(np.array(image),lang='eng', config='--psm 6')

These are all the PSMs currently available:

  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
            bypassing hacks that are Tesseract-specific.
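
Rather than guessing which mode fits a given image, one option is to run the same image through a few candidate modes and compare the outputs. This is only a minimal sketch, not part of the original answer: `try_psm_modes` and its `ocr` parameter are hypothetical names introduced here, and the default path assumes pytesseract and a Tesseract binary are installed.

```python
def try_psm_modes(image, modes=(6, 7, 8, 13), ocr=None, lang="eng"):
    """Run OCR once per page segmentation mode and return {mode: text}.

    `ocr` is any callable accepting (image, lang=..., config=...).
    It defaults to pytesseract.image_to_string, imported lazily so the
    helper itself can be exercised without Tesseract installed.
    """
    if ocr is None:
        import pytesseract  # assumption: pytesseract is available
        ocr = pytesseract.image_to_string

    results = {}
    for psm in modes:
        # Each mode is passed through the same --psm flag used above.
        config = f"--psm {psm}"
        results[psm] = ocr(image, lang=lang, config=config).strip()
    return results
```

With the thresholded image from the question loaded as `image`, printing `try_psm_modes(image)` would show which of the modes, if any, returns the expected digits.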


svmlkihl (#2)

Use pytesseract.image_to_string(img, config='--psm 8'), or try different settings to see whether the image gets recognized. A useful link: Pytesseract OCR multiple config options
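
Besides the PSM setting, another thing worth trying is the preprocessing itself, since the question mentions small, jagged digits. The sketch below is an assumption-laden illustration, not something from this thread: `prepare_for_ocr` and its parameters are hypothetical, it uses plain NumPy to mirror the `cv2.inRange` plus `cv2.bitwise_not` step from the question, and it then enlarges the image and adds a white margin, which may or may not help for this particular font.

```python
import numpy as np

def prepare_for_ocr(bgr, white_min=254, scale=3, border=10):
    """Return a black-on-white uint8 image built from near-white text pixels."""
    bgr = np.asarray(bgr)
    # A pixel counts as text when all three channels are >= white_min,
    # the same test as cv2.inRange with lower=[254, 254, 254].
    is_text = np.all(bgr >= white_min, axis=-1)
    binary = np.where(is_text, 0, 255).astype(np.uint8)
    # Integer nearest-neighbour upscaling: every pixel becomes a
    # scale x scale block, so pixel-font edges stay sharp.
    binary = np.repeat(np.repeat(binary, scale, axis=0), scale, axis=1)
    # White margin so that glyphs do not touch the image edge.
    return np.pad(binary, border, mode="constant", constant_values=255)
```

The result can be passed to `pytesseract.image_to_string` in place of the thresholded image. Saving it as PNG rather than JPEG avoids compression artifacts around the glyph edges.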


rbpvctlc (#3)

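I think tesseract blacklists digits by default, so I tried tessedit_char_whitelist to whitelist the characters I wanted, but it did not work. I then tried the config tessedit_char_unblacklist='0123456789' to remove them from the blacklist: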

pytesseract.image_to_string(img, lang='eng', config='--psm 6 --oem 3 -c tessedit_char_unblacklist=0123456789')
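
Whether a character whitelist is honored may depend on the Tesseract version and OCR engine in use, so a defensive approach is to build the config string in one place and also filter the returned text in Python. The following is a small sketch under those assumptions. `digits_config` and `keep_digits` are hypothetical helpers introduced here, not pytesseract functions.

```python
import re

def digits_config(psm=6, oem=3, param="tessedit_char_whitelist"):
    """Build a Tesseract config string restricting output to digits.

    `param` can be switched to "tessedit_char_unblacklist" to reproduce
    the config string used in the answer above.
    """
    return f"--psm {psm} --oem {oem} -c {param}=0123456789"

def keep_digits(text):
    """Return the digits found on each line of OCR output.

    Lines without any digit are dropped; this works regardless of
    whether Tesseract applied the whitelist.
    """
    return [re.sub(r"\D", "", line)
            for line in text.splitlines()
            if re.search(r"\d", line)]
```

For example, `keep_digits(pytesseract.image_to_string(img, lang='eng', config=digits_config()))` would return one digit string per recognized line, assuming pytesseract is imported and `img` is the preprocessed image.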

