我正在使用apple phone number recognition sample。通常它会在识别的文本周围创建一个红色的轮廓。我的似乎无法识别文本并创建红色轮廓,即使我使用了他们的代码。唯一的区别是我的视图控制器类被称为“TextScanViewController”,而他们的只是“ViewController”。我仔细检查并确保任何“ViewController”都被更改为“TextScanViewController”。我还遗漏了什么需要修改的地方吗?
下面是它应该看起来像什么(当我使用原来的苹果项目)相比,它正在做什么(应该有红色的轮廓,但没有显示他们,即使文本是完美的中心矩形)
应如下所示:
看起来像:
我正在使用5个不同的swift文件(预览视图、文本扫描视图控制器、视觉视图控制器、字符串实用程序、应用程序委托)
文本扫描视图控制器:
import UIKit
import AVFoundation
import Vision
class TextScanViewController: UIViewController {
// MARK: - UI objects
@IBOutlet weak var previewView: PreviewView!
@IBOutlet weak var cutoutView: UIView!
@IBOutlet weak var numberView: UILabel!
var maskLayer = CAShapeLayer()
// Device orientation. Updated whenever the orientation changes to a
// different supported orientation.
var currentOrientation = UIDeviceOrientation.portrait
// MARK: - Capture related objects
private let captureSession = AVCaptureSession()
let captureSessionQueue = DispatchQueue(label: "com.example.apple-samplecode.CaptureSessionQueue")
var captureDevice: AVCaptureDevice?
var videoDataOutput = AVCaptureVideoDataOutput()
let videoDataOutputQueue = DispatchQueue(label: "com.example.apple-samplecode.VideoDataOutputQueue")
// MARK: - Region of interest (ROI) and text orientation
// Region of video data output buffer that recognition should be run on.
// Gets recalculated once the bounds of the preview layer are known.
var regionOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
// Orientation of text to search for in the region of interest.
var textOrientation = CGImagePropertyOrientation.up
// MARK: - Coordinate transforms
var bufferAspectRatio: Double!
// Transform from UI orientation to buffer orientation.
var uiRotationTransform = CGAffineTransform.identity
// Transform bottom-left coordinates to top-left.
var bottomToTopTransform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
// Transform coordinates in ROI to global coordinates (still normalized).
var roiToGlobalTransform = CGAffineTransform.identity
// Vision -> AVF coordinate transform.
var visionToAVFTransform = CGAffineTransform.identity
// MARK: - View controller methods
override func viewDidLoad() {
super.viewDidLoad()
// Set up preview view.
previewView.session = captureSession
// Set up cutout view.
cutoutView.backgroundColor = UIColor.gray.withAlphaComponent(0.5)
maskLayer.backgroundColor = UIColor.clear.cgColor
maskLayer.fillRule = .evenOdd
cutoutView.layer.mask = maskLayer
// Starting the capture session is a blocking call. Perform setup using
// a dedicated serial dispatch queue to prevent blocking the main thread.
captureSessionQueue.async {
self.setupCamera()
// Calculate region of interest now that the camera is setup.
DispatchQueue.main.async {
// Figure out initial ROI.
self.calculateRegionOfInterest()
}
}
}
override func viewWillTransition(to size: CGSize, with coordinator: UIViewControllerTransitionCoordinator) {
super.viewWillTransition(to: size, with: coordinator)
// Only change the current orientation if the new one is landscape or
// portrait. You can't really do anything about flat or unknown.
let deviceOrientation = UIDevice.current.orientation
if deviceOrientation.isPortrait || deviceOrientation.isLandscape {
currentOrientation = deviceOrientation
}
// Handle device orientation in the preview layer.
if let videoPreviewLayerConnection = previewView.videoPreviewLayer.connection {
if let newVideoOrientation = AVCaptureVideoOrientation(deviceOrientation: deviceOrientation) {
videoPreviewLayerConnection.videoOrientation = newVideoOrientation
}
}
// Orientation changed: figure out new region of interest (ROI).
calculateRegionOfInterest()
}
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
updateCutout()
}
// MARK: - Setup
func calculateRegionOfInterest() {
// In landscape orientation the desired ROI is specified as the ratio of
// buffer width to height. When the UI is rotated to portrait, keep the
// vertical size the same (in buffer pixels). Also try to keep the
// horizontal size the same up to a maximum ratio.
let desiredHeightRatio = 0.15
let desiredWidthRatio = 0.6
let maxPortraitWidth = 0.8
// Figure out size of ROI.
let size: CGSize
if currentOrientation.isPortrait || currentOrientation == .unknown {
size = CGSize(width: min(desiredWidthRatio * bufferAspectRatio, maxPortraitWidth), height: desiredHeightRatio / bufferAspectRatio)
} else {
size = CGSize(width: desiredWidthRatio, height: desiredHeightRatio)
}
// Make it centered.
regionOfInterest.origin = CGPoint(x: (1 - size.width) / 2, y: (1 - size.height) / 2)
regionOfInterest.size = size
// ROI changed, update transform.
setupOrientationAndTransform()
// Update the cutout to match the new ROI.
DispatchQueue.main.async {
// Wait for the next run cycle before updating the cutout. This
// ensures that the preview layer already has its new orientation.
self.updateCutout()
}
}
func updateCutout() {
// Figure out where the cutout ends up in layer coordinates.
let roiRectTransform = bottomToTopTransform.concatenating(uiRotationTransform)
let cutout = previewView.videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: regionOfInterest.applying(roiRectTransform))
// Create the mask.
let path = UIBezierPath(rect: cutoutView.frame)
path.append(UIBezierPath(rect: cutout))
maskLayer.path = path.cgPath
// Move the number view down to under cutout.
var numFrame = cutout
numFrame.origin.y += numFrame.size.height
numberView.frame = numFrame
}
func setupOrientationAndTransform() {
// Recalculate the affine transform between Vision coordinates and AVF coordinates.
// Compensate for region of interest.
let roi = regionOfInterest
roiToGlobalTransform = CGAffineTransform(translationX: roi.origin.x, y: roi.origin.y).scaledBy(x: roi.width, y: roi.height)
// Compensate for orientation (buffers always come in the same orientation).
switch currentOrientation {
case .landscapeLeft:
textOrientation = CGImagePropertyOrientation.up
uiRotationTransform = CGAffineTransform.identity
case .landscapeRight:
textOrientation = CGImagePropertyOrientation.down
uiRotationTransform = CGAffineTransform(translationX: 1, y: 1).rotated(by: CGFloat.pi)
case .portraitUpsideDown:
textOrientation = CGImagePropertyOrientation.left
uiRotationTransform = CGAffineTransform(translationX: 1, y: 0).rotated(by: CGFloat.pi / 2)
default: // We default everything else to .portraitUp
textOrientation = CGImagePropertyOrientation.right
uiRotationTransform = CGAffineTransform(translationX: 0, y: 1).rotated(by: -CGFloat.pi / 2)
}
// Full Vision ROI to AVF transform.
visionToAVFTransform = roiToGlobalTransform.concatenating(bottomToTopTransform).concatenating(uiRotationTransform)
}
func setupCamera() {
guard let captureDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: AVMediaType.video, position: .back) else {
print("Could not create capture device.")
return
}
self.captureDevice = captureDevice
// NOTE:
// Requesting 4k buffers allows recognition of smaller text but will
// consume more power. Use the smallest buffer size necessary to keep
// down battery usage.
if captureDevice.supportsSessionPreset(.hd4K3840x2160) {
captureSession.sessionPreset = AVCaptureSession.Preset.hd4K3840x2160
bufferAspectRatio = 3840.0 / 2160.0
} else {
captureSession.sessionPreset = AVCaptureSession.Preset.hd1920x1080
bufferAspectRatio = 1920.0 / 1080.0
}
guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else {
print("Could not create device input.")
return
}
if captureSession.canAddInput(deviceInput) {
captureSession.addInput(deviceInput)
}
// Configure video data output.
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
if captureSession.canAddOutput(videoDataOutput) {
captureSession.addOutput(videoDataOutput)
// NOTE:
// There is a trade-off to be made here. Enabling stabilization will
// give temporally more stable results and should help the recognizer
// converge. But if it's enabled the VideoDataOutput buffers don't
// match what's displayed on screen, which makes drawing bounding
// boxes very hard. Disable it in this app to allow drawing detected
// bounding boxes on screen.
videoDataOutput.connection(with: AVMediaType.video)?.preferredVideoStabilizationMode = .off
} else {
print("Could not add VDO output")
return
}
// Set zoom and autofocus to help focus on very small text.
do {
try captureDevice.lockForConfiguration()
captureDevice.videoZoomFactor = 2
captureDevice.autoFocusRangeRestriction = .near
captureDevice.unlockForConfiguration()
} catch {
print("Could not set zoom level due to error: \(error)")
return
}
captureSession.startRunning()
}
// MARK: - UI drawing and interaction
func showString(string: String) {
// Found a definite number.
// Stop the camera synchronously to ensure that no further buffers are
// received. Then update the number view asynchronously.
captureSessionQueue.sync {
self.captureSession.stopRunning()
DispatchQueue.main.async {
self.numberView.text = string
self.numberView.isHidden = false
}
}
}
@IBAction func handleTap(_ sender: UITapGestureRecognizer) {
captureSessionQueue.async {
if !self.captureSession.isRunning {
self.captureSession.startRunning()
}
DispatchQueue.main.async {
self.numberView.isHidden = true
}
}
}
}
// MARK: - AVCaptureVideoDataOutputSampleBufferDelegate
extension TextScanViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// This is implemented in VisionViewController.
}
}
// MARK: - Utility extensions
extension AVCaptureVideoOrientation {
init?(deviceOrientation: UIDeviceOrientation) {
switch deviceOrientation {
case .portrait: self = .portrait
case .portraitUpsideDown: self = .portraitUpsideDown
case .landscapeLeft: self = .landscapeRight
case .landscapeRight: self = .landscapeLeft
default: return nil
}
}
}
预览视图:
import Foundation
import UIKit
import AVFoundation
class PreviewView: UIView {
var videoPreviewLayer: AVCaptureVideoPreviewLayer {
guard let layer = layer as? AVCaptureVideoPreviewLayer else {
fatalError("Expected `AVCaptureVideoPreviewLayer` type for layer. Check PreviewView.layerClass implementation.")
}
return layer
}
var session: AVCaptureSession? {
get {
return videoPreviewLayer.session
}
set {
videoPreviewLayer.session = newValue
}
}
// MARK: UIView
override class var layerClass: AnyClass {
return AVCaptureVideoPreviewLayer.self
}
}
视觉视图控制器:
import UIKit
import AVFoundation
import Vision
class VisionViewController: TextScanViewController {
var request: VNRecognizeTextRequest!
// Temporal string tracker
let numberTracker = StringTracker()
override func viewDidLoad() {
// Set up vision request before letting ViewController set up the camera
// so that it exists when the first buffer is received.
request = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
super.viewDidLoad()
}
// MARK: - Text recognition
// Vision recognition handler.
func recognizeTextHandler(request: VNRequest, error: Error?) {
var numbers = [String]()
var redBoxes = [CGRect]() // Shows all recognized text lines
var greenBoxes = [CGRect]() // Shows words that might be serials
guard let results = request.results as? [VNRecognizedTextObservation] else {
return
}
let maximumCandidates = 1
for visionResult in results {
guard let candidate = visionResult.topCandidates(maximumCandidates).first else { continue }
// Draw red boxes around any detected text, and green boxes around
// any detected phone numbers. The phone number may be a substring
// of the visionResult. If a substring, draw a green box around the
// number and a red box around the full string. If the number covers
// the full result only draw the green box.
var numberIsSubstring = true
if let result = candidate.string.extractPhoneNumber() {
let (range, number) = result
// Number may not cover full visionResult. Extract bounding box
// of substring.
if let box = try? candidate.boundingBox(for: range)?.boundingBox {
numbers.append(number)
greenBoxes.append(box)
numberIsSubstring = !(range.lowerBound == candidate.string.startIndex && range.upperBound == candidate.string.endIndex)
}
}
if numberIsSubstring {
redBoxes.append(visionResult.boundingBox)
}
}
// Log any found numbers.
numberTracker.logFrame(strings: numbers)
show(boxGroups: [(color: UIColor.red.cgColor, boxes: redBoxes), (color: UIColor.green.cgColor, boxes: greenBoxes)])
// Check if we have any temporally stable numbers.
if let sureNumber = numberTracker.getStableString() {
showString(string: sureNumber)
numberTracker.reset(string: sureNumber)
}
}
override func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
// Configure for running in real-time.
request.recognitionLevel = .fast
// Language correction won't help recognizing phone numbers. It also
// makes recognition slower.
request.usesLanguageCorrection = false
// Only run on the region of interest for maximum speed.
request.regionOfInterest = regionOfInterest
let requestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: textOrientation, options: [:])
do {
try requestHandler.perform([request])
} catch {
print(error)
}
}
}
// MARK: - Bounding box drawing
// Draw a box on screen. Must be called from main queue.
var boxLayer = [CAShapeLayer]()
func draw(rect: CGRect, color: CGColor) {
let layer = CAShapeLayer()
layer.opacity = 0.5
layer.borderColor = color
layer.borderWidth = 2
layer.frame = rect
boxLayer.append(layer)
previewView.videoPreviewLayer.insertSublayer(layer, at: 1)
}
// Remove all drawn boxes. Must be called on main queue.
func removeBoxes() {
for layer in boxLayer {
layer.removeFromSuperlayer()
}
boxLayer.removeAll()
}
typealias ColoredBoxGroup = (color: CGColor, boxes: [CGRect])
// Draws groups of colored boxes.
func show(boxGroups: [ColoredBoxGroup]) {
DispatchQueue.main.async {
let layer = self.previewView.videoPreviewLayer
self.removeBoxes()
for boxGroup in boxGroups {
let color = boxGroup.color
for box in boxGroup.boxes {
let rect = layer.layerRectConverted(fromMetadataOutputRect: box.applying(self.visionToAVFTransform))
self.draw(rect: rect, color: color)
}
}
}
}
}
字符串实用程序:
import Foundation
extension Character {
// Given a list of allowed characters, try to convert self to those in list
// if not already in it. This handles some common misclassifications for
// characters that are visually similar and can only be correctly recognized
// with more context and/or domain knowledge. Some examples (should be read
// in Menlo or some other font that has different symbols for all characters):
// 1 and l are the same character in Times New Roman
// I and l are the same character in Helvetica
// 0 and O are extremely similar in many fonts
// oO, wW, cC, sS, pP and others only differ by size in many fonts
func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
let conversionTable = [
"s": "S",
"S": "5",
"5": "S",
"o": "O",
"Q": "O",
"O": "0",
"0": "O",
"l": "I",
"I": "1",
"1": "I",
"B": "8",
"8": "B"
]
// Allow a maximum of two substitutions to handle 's' -> 'S' -> '5'.
let maxSubstitutions = 2
var current = String(self)
var counter = 0
while !allowedChars.contains(current) && counter < maxSubstitutions {
if let altChar = conversionTable[current] {
current = altChar
counter += 1
} else {
// Doesn't match anything in our table. Give up.
break
}
}
return current.first!
}
}
extension String {
// Extracts the first US-style phone number found in the string, returning
// the range of the number and the number itself as a tuple.
// Returns nil if no number is found.
func extractPhoneNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits.
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
}
class StringTracker {
var frameIndex: Int64 = 0
typealias StringObservation = (lastSeen: Int64, count: Int64)
// Dictionary of seen strings. Used to get stable recognition before
// displaying anything.
var seenStrings = [String: StringObservation]()
var bestCount = Int64(0)
var bestString = ""
func logFrame(strings: [String]) {
for string in strings {
if seenStrings[string] == nil {
seenStrings[string] = (lastSeen: Int64(0), count: Int64(-1))
}
seenStrings[string]?.lastSeen = frameIndex
seenStrings[string]?.count += 1
print("Seen \(string) \(seenStrings[string]?.count ?? 0) times")
}
var obsoleteStrings = [String]()
// Go through strings and prune any that have not been seen in while.
// Also find the (non-pruned) string with the greatest count.
for (string, obs) in seenStrings {
// Remove previously seen text after 30 frames (~1s).
if obs.lastSeen < frameIndex - 30 {
obsoleteStrings.append(string)
}
// Find the string with the greatest count.
let count = obs.count
if !obsoleteStrings.contains(string) && count > bestCount {
bestCount = Int64(count)
bestString = string
}
}
// Remove old strings.
for string in obsoleteStrings {
seenStrings.removeValue(forKey: string)
}
frameIndex += 1
}
func getStableString() -> String? {
// Require the recognizer to see the same string at least 10 times.
if bestCount >= 10 {
return bestString
} else {
return nil
}
}
func reset(string: String) {
seenStrings.removeValue(forKey: string)
bestCount = 0
bestString = ""
}
}
应用委托:
import UIKit
@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
var window: UIWindow?
}
1条答案
按热度按时间pkwftd7m1#
我在视图控制器上使用了错误的类. a a a a.而不是TextScanViewController,它应该被设置为Visionviewcontroller.... a.它跳过了整个类。我没有意识到类是如何继承的,它们有一个重要的顺序。我有很多东西要学,但学到了很多!:)