我从here下载了完整的COCO数据集。我需要修改instances_train2017.json
文件以获得以下内容:
- 修改注解文件,使其将整个训练数据集平均划分为所有类的
1/3
- 意思是假设我有100个
class_1
的图像/注解,所以我希望修改后的注解文件在json文件中保存图像对象/dict的100/3
我已经写了这样的代码,但它需要太多的时间和结果/修改的文件是错误的:
import json
from collections import defaultdict
import random
# Path to your local COCO-format JSON annotation file
original_annotation_file = 'coco/annotations/instances_train2017.json'
output_annotation_file = 'evenly.json'
# Load the local COCO-format dataset from your JSON file
with open(original_annotation_file, 'r') as f:
coco_data = json.load(f)
class_counts = defaultdict(int)
target_class_counts = defaultdict(int)
# Calculate the target count for each class
for ann in coco_data['annotations']:
class_id = ann['category_id']
class_counts[class_id] += 1
for class_id, count in class_counts.items():
target_class_counts[class_id] = count // 3
# Create a list to hold the selected annotations
selected_annotations = []
# Iterate through the annotations and select the subset
for ann in coco_data['annotations']:
class_id = ann['category_id']
# Only include this annotation if we haven't reached the target count for this class
if class_counts[class_id] <= target_class_counts[class_id]:
selected_annotations.append(ann)
# Update the count for this class
class_counts[class_id] += 1
# Create a new COCO-format JSON data structure for the subset
subset_data = {
'info': coco_data['info'],
'licenses': coco_data['licenses'],
'categories': coco_data['categories'],
'images': coco_data['images'],
'annotations': selected_annotations
}
# Shuffle the selected annotations to mix up the classes if desired
random.shuffle(subset_data['annotations'])
# Write the subset data to a new JSON file
with open(output_annotation_file, 'w') as f:
json.dump(subset_data, f)
1条答案
按热度按时间k7fdbhmy1#
我相信如果在将示例附加到
selected_annotations
之后减少target_class_counts[class_id]
,就可以解决问题。所以替换下面的代码,它应该可以解决你的问题。但还是会很慢。我建议创建一个列表字典,其中键是
category_id
,值是标记为该键的annotations
数据现在,您可以迭代
data
并打乱每个列表,然后使用切片来获取您不想要的数据部分。