tensorflow: Google MLKit on iOS does not detect objects

brtdzjyr asked on 2023-02-05 in iOS

I'm new to Google MLKit and want to detect bills/receipts on both Android and iOS, using react-native-vision-camera together with Google MLKit.
The detection is done by Google MLKit and the results are then interpreted by react-native-vision-camera.
On Android in Java I have no problem, the bill is detected just fine:
[Image: bill detected on Android]
On iOS, with the same code (but in Objective-C instead of Java), the invoice is never detected:
[Image: nothing detected on iOS]

#import <Foundation/Foundation.h>
#import <VisionCamera/FrameProcessorPlugin.h>
#import <VisionCamera/Frame.h>
#import <MLKit.h>

@interface VisionScanObjectsFrameProcessorPlugin : NSObject
+ (MLKObjectDetector*) objectDetector;
@end

@implementation VisionScanObjectsFrameProcessorPlugin

+ (MLKObjectDetector*) objectDetector {
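  // Lazily create one shared detector instance and reuse it across frames.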
  static MLKObjectDetector* objectDetector = nil;
  if (objectDetector == nil) {
    NSString *path = [[NSBundle mainBundle] pathForResource:@"lite-model_object_detection_mobile_object_labeler_v1_1" ofType:@"tflite"];
    MLKLocalModel *localModel = [[MLKLocalModel alloc] initWithPath:path];

    MLKCustomObjectDetectorOptions *options = [[MLKCustomObjectDetectorOptions alloc] initWithLocalModel:localModel];
    options.detectorMode = MLKObjectDetectorModeSingleImage;
    options.shouldEnableClassification = YES;
    options.classificationConfidenceThreshold = @(0.5);
    options.maxPerObjectLabelCount = 3;

    objectDetector = [MLKObjectDetector objectDetectorWithOptions:options];
  }

  return objectDetector;
}

static inline id scanObjects(Frame* frame, NSArray* arguments) {
  MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:frame.buffer];
  image.orientation = frame.orientation; // <-- TODO: is mirrored?

  NSError* error = nil;
  NSArray<MLKObject*>* objects = [[VisionScanObjectsFrameProcessorPlugin objectDetector] resultsInImage:image error:&error];
  if (error != nil) {
    // Log detection failures instead of failing silently.
    NSLog(@"Object detection failed: %@", error.localizedDescription);
    return @[];
  }

  NSLog(@"Object detected : %ld", objects.count);

  NSMutableArray* results = [NSMutableArray arrayWithCapacity:objects.count];
  for (MLKObject* object in objects) {
    NSMutableArray* labels = [NSMutableArray arrayWithCapacity:object.labels.count];

    for (MLKObjectLabel* label in object.labels) {
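      // Keep only the label indices of interest for this use case (taken from the custom model's label map).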
      if (122 == label.index || 188 == label.index || 288 == label.index || 325 == label.index || 357 == label.index || 370 == label.index || 480 == label.index || 510 == label.index || 551 == label.index) {
        [labels addObject:@{
          @"index": [NSNumber numberWithFloat:label.index],
          @"label": label.text,
          @"confidence": [NSNumber numberWithFloat:label.confidence]
        }];
      }
    }

    if (labels.count != 0) {
      [results addObject:@{
        @"width": [NSNumber numberWithFloat:object.frame.size.width],
        @"height": [NSNumber numberWithFloat:object.frame.size.height],
        @"top": [NSNumber numberWithFloat:object.frame.origin.y],
        @"left": [NSNumber numberWithFloat:object.frame.origin.x],
        @"frameRotation": [NSNumber numberWithFloat:frame.orientation],
        @"labels": labels
      }];
    }
  }

  return results;
}

VISION_EXPORT_FRAME_PROCESSOR(scanObjects)

@end

I really think this code should work, because I don't get any crashes anymore (I did before I got it running ^^), but a document is never detected. :/
The NSLog(@"Object detected : %ld", objects.count); almost always prints 0. **Exceptionally it prints 1** when it detects my computer keyboard, but that is very, very rare.
Over the last 4 days I have tried many things (a different model, asynchronous detection, resizing before detection, and so on), but the result is always the same :/
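For reference, the asynchronous detection mentioned above would look roughly like the following Swift sketch. It assumes MLKit's completion-based ObjectDetector.process(_:completion:) API; detectObjectsAsync is a hypothetical helper, the detector is assumed to be configured with the same options as in the Objective-C code above, and module names may vary slightly across MLKit versions:

import MLKitObjectDetectionCustom
import MLKitVision
import UIKit

// Sketch: asynchronous detection with MLKit's completion-based API.
// `detector` is assumed to be configured like the Objective-C singleton above.
func detectObjectsAsync(in image: UIImage, with detector: ObjectDetector) {
  let visionImage = VisionImage(image: image)
  visionImage.orientation = image.imageOrientation

  detector.process(visionImage) { objects, error in
    if let error = error {
      print("Detection failed: \(error.localizedDescription)")
      return
    }
    print("Objects detected: \(objects?.count ?? 0)")
  }
}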


xdnvmnnf 1#

For Jaroslaw K.: I had to scale the frame down to make it work.
This is what the documentation of my model (https://tfhub.dev/tensorflow/efficientnet/lite0/classification/2) recommends:
The input image sizes are flexible for this module, but it is best to match the model training input, which for this model is height x width = 224 x 224 pixels. The input images are expected to have color values in the range [0, 1], following common image input conventions.

// Requires: import AVFoundation, CoreImage and UIKit (plus VisionCamera for Frame).
public static func resizeFrameToUiimage(frame: Frame) -> UIImage? {
  let targetSize = CGSize(width: 224.0, height: 224.0)

  // Convert the camera frame's pixel buffer into a UIImage we can draw with.
  let imageBuffer = CMSampleBufferGetImageBuffer(frame.buffer)!
  let ciimage = CIImage(cvPixelBuffer: imageBuffer)

  let context = CIContext(options: nil)
  let cgImage = context.createCGImage(ciimage, from: ciimage.extent)!
  let uiimage = UIImage(cgImage: cgImage)

  // Aspect-fit: use the smaller scale factor so the result fits within 224 x 224.
  let widthRatio  = targetSize.width  / uiimage.size.width
  let heightRatio = targetSize.height / uiimage.size.height

  var newSize: CGSize
  if widthRatio > heightRatio {
    newSize = CGSize(width: uiimage.size.width * heightRatio, height: uiimage.size.height * heightRatio)
  } else {
    newSize = CGSize(width: uiimage.size.width * widthRatio, height: uiimage.size.height * widthRatio)
  }

  let rect = CGRect(origin: .zero, size: newSize)
      
  UIGraphicsBeginImageContextWithOptions(newSize, false, 1.0)
  uiimage.draw(in: rect)

  let newImage = UIGraphicsGetImageFromCurrentImageContext()
  UIGraphicsEndImageContext()

  return newImage
}
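
The resized UIImage can then be handed to the detector in place of the raw frame buffer, roughly like this (a sketch assuming MLKit's synchronous Swift results(in:) API; scanObjectsDownscaled is a hypothetical helper living in the same type as resizeFrameToUiimage):

// Sketch: downscale first so the input roughly matches the model's 224 x 224 training size.
func scanObjectsDownscaled(frame: Frame, detector: ObjectDetector) -> [Object] {
  guard let resized = resizeFrameToUiimage(frame: frame) else { return [] }

  let visionImage = VisionImage(image: resized)
  visionImage.orientation = resized.imageOrientation

  // Synchronous variant; returns an empty list on failure.
  return (try? detector.results(in: visionImage)) ?? []
}

Note that the bounding boxes MLKit returns are then relative to the downscaled image, so they have to be scaled back up before being drawn over the full-resolution camera preview.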
