Chinese-CLIP zero-shot classification issue

Posted by q9rjltbz 9 months ago in Other


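Hello,
CN-CLIP is great work! While reproducing zero-shot inference on voc-2007-classification, I found that my final result does not match the reported number. The output of my run is below; I would appreciate it if you could take a look when you have time. Thanks.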

Params:
context_length: 52
datapath: ******/Chinese-CLIP/content/datasets/voc-2007-classification/test
dataset: voc-2007-classification
img_batch_size: 64
index:
label_file: ******/Chinese-CLIP/content/datasets/voc-2007-classification/label_cn.txt
num_workers: 4
precision: amp
resume: ******/Chinese-CLIP/content/pretrained_weights/clip_cn_vit-h-14.pt
save_dir: ******/Chinese-CLIP/eval_result//voc-2007-classification
text_model: RoBERTa-wwm-ext-large-chinese
vision_model: ViT-H-14
Loading vision model config from cn_clip/clip/model_configs/ViT-H-14.json
Loading text model config from cn_clip/clip/model_configs/RoBERTa-wwm-ext-large-chinese.json
Preparing zeroshot dataset.
224
Begin to load model checkpoint from ******/Chinese-CLIP/content/pretrained_weights/clip_cn_vit-h-14.pt.
=> loaded checkpoint ******/Chinese-CLIP/content/pretrained_weights/clip_cn_vit-h-14.pt (epoch 7 @ 40,000 steps)
Build zero-shot classifier.
Using classifier
100%|███████████████████████████████████████████████████████████| 20/20 [00:15<00:00, 1.28it/s]
100%|████████████████| 78/78 [04:04<00:00, 3.13s/it]
torch.Size([4952, 20])
Result:
zeroshot-top1: 0.09268982229402262
Finished.
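The test data is voc-2007-classification, downloaded via https://github.com/OFA-Sys/Chinese-CLIP/blob/master/zeroshot_dataset.md. The performance reported in the paper, by contrast, is:

[image: result reported in the paper]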
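
To help read the numbers in the log above, here is a minimal sketch of how a top-1 accuracy can be computed from a score matrix shaped like the logged `torch.Size([4952, 20])`, followed by a quick arithmetic check of the reported value. It is not the Chinese-CLIP evaluation code: the function `top1_accuracy` and the toy scores are hypothetical, and the sketch assumes exactly one integer label per image. Only `num_images`, `num_classes`, `img_batch_size` and `reported_top1` are taken from the log.

```python
import math


def top1_accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class index equals the label.

    scores: list of per-image score lists (one score per class).
    labels: list of integer class indices, one per image (assumed single-label).
    """
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return correct / len(labels)


# Toy example: 3 images, 4 classes (hypothetical scores, not real model output).
toy_scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
toy_labels = [1, 0, 3]
toy_acc = top1_accuracy(toy_scores, toy_labels)  # first two rows are correct

# Values copied from the log above.
num_images = 4952
num_classes = 20
img_batch_size = 64
reported_top1 = 0.09268982229402262

num_batches = math.ceil(num_images / img_batch_size)  # image batches needed
implied_correct = round(reported_top1 * num_images)   # images counted as correct
chance_level = 1 / num_classes                        # uniform random guessing

print(num_batches, implied_correct, chance_level)
```

Under these assumptions, the batch count agrees with the 78-step progress bar, the reported score corresponds to 459 of 4952 images, and uniform guessing over 20 classes would give 0.05, so the logged score is less than twice the chance level.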