Extract annotations from COCO dataset annotation file

Posted by eqqqjvef on 2023-05-30 in Other


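I want to train on a subset of the COCO dataset. For the images, I have created a folder containing the first 30k images of the train2017 folder. Now I need the annotations for these 30k images (extracted from instances_train2017.json) in a separate JSON file so that I can train on it.
How can I do that?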

JSON

Source: https://stackoverflow.com/questions/69722538/extract-annotations-from-coco-dataset-annotation-file

2 Answers


93ze6v8z1#

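There is no easy way, because the annotations for all images are stored in one long JSON file. I am developing a Python package, pylabel, that helps with dataset preparation tasks, including this one.
I created a reproducible example in this notebook: https://github.com/pylabel-project/samples/blob/main/coco_extract_subset.ipynb. The notebook can also be opened directly in Google Colab.
The package generally works like this: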

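from pylabel import importer

# path_to_annotations: path to the COCO annotation file (e.g. instances_train2017.json)
# files: list of image file names to keep; both must be defined before this point
dataset = importer.ImportCoco(path_to_annotations)
# The annotations are now stored in a dataframe (dataset.df)
# that you can query and manipulate like any other pandas dataframe.
# Here we keep only the rows whose image is in the list of files.
dataset.df = dataset.df[dataset.df.img_filename.isin(files)].reset_index()
dataset.export.ExportToCoco()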
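
The snippet above uses a `files` list without showing where it comes from. One possible way to build it from the subset folder is sketched below. The helper name `list_image_filenames` and the folder name `train2017_subset` are hypothetical, and the sketch assumes the images are `.jpg` files whose names match the `file_name` entries in the annotation file.

```python
from pathlib import Path


def list_image_filenames(image_dir, pattern="*.jpg"):
    """Return the sorted file names (not full paths) of the images in image_dir."""
    return sorted(p.name for p in Path(image_dir).glob(pattern))


# Hypothetical folder holding the 30k-image subset:
# files = list_image_filenames("train2017_subset")
```

The result can then be passed to `isin(files)` as in the snippet above.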

Hope this works for you. Please let me know if you have any feedback.

Posted 2023-05-30

k5hmc34c2#

Preliminary note:

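A COCO dataset consists mainly of a JSON file that contains the paths to the images and the annotations for those images. So if you want to split your dataset, you don't need to move your images into separate folders; you need to split the records contained in the JSON file. Doing this from scratch is not trivial, because the records in the JSON file have internal dependencies. The good news is that there is a package called COCOHelper that makes this task easy!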
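
To illustrate the internal dependencies mentioned above, here is a minimal standard-library sketch of what a manual extraction might look like. It is not how COCOHelper works internally. It assumes the usual COCO detection layout, where each entry in `images` has `id` and `file_name`, and each entry in `annotations` points back to an image through `image_id`. The function names are hypothetical.

```python
import json


def extract_subset(coco, keep_filenames):
    """Return a new COCO dict restricted to the images named in keep_filenames."""
    keep = set(keep_filenames)
    # Keep only the image records for the chosen files.
    images = [img for img in coco["images"] if img["file_name"] in keep]
    image_ids = {img["id"] for img in images}
    # Annotations depend on images through image_id, so filter them to match.
    annotations = [a for a in coco["annotations"] if a["image_id"] in image_ids]
    # Copy every other top-level key (info, licenses, categories, ...) unchanged.
    subset = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
    subset["images"] = images
    subset["annotations"] = annotations
    return subset


def extract_subset_file(src_json, dst_json, keep_filenames):
    """Read src_json, filter it with extract_subset, and write the result to dst_json."""
    with open(src_json, "r", encoding="utf-8") as f:
        coco = json.load(f)
    subset = extract_subset(coco, keep_filenames)
    with open(dst_json, "w", encoding="utf-8") as f:
        json.dump(subset, f)
    return subset
```

Categories are kept whole here, so category ids stay identical to the original file even if some categories no longer appear in the subset.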

Quick solution:

You can use COCOHelper to split a COCO dataset into subsets, each with its own associated annotations. It is as simple as this:

ch = COCOHelper.load_json(annotations_file, img_dir=image_dir)
splitter = ProportionalDataSplitter(70, 10, 20)  # split dataset as 70-10-20% of images
ch_train, ch_val, ch_test = splitter.apply(ch)
ch_train.write_annotations_file(fname)

A complete working example:

Imports and path setup:

from pathlib import Path
from cocohelper import COCOHelper
from cocohelper.splitters.proportional import ProportionalDataSplitter

root_dir = Path('/data/robotics/oil_line_detection')
annotations_dir = root_dir / 'annotations'
annotations_file = annotations_dir / 'coco.json'
image_dir = ""

Create a COCOHelper object that represents your COCO dataset:

print(f"Loading dataset: {annotations_file}")
ch = COCOHelper.load_json(annotations_file, img_dir=image_dir)

Split the dataset (for example with a proportional data splitter, which splits the data randomly):

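splitter = ProportionalDataSplitter(70, 10, 20)  # split dataset as 70-10-20% of images
ch_train, ch_val, ch_test = splitter.apply(ch)
dest_dir = Path("./result")  # where to save the JSON files with annotations on the subset of images
dest_dir.mkdir(parents=True, exist_ok=True)  # make sure the output folder exists

# Use a separate loop variable so the full dataset `ch` is not overwritten
for subset, subset_name in zip([ch_train, ch_val, ch_test], ["train", "val", "test"]):
    print(f"Saving dataset: '{subset_name}'")
    fname = dest_dir / f"{subset_name}.json"
    subset.write_annotations_file(fname)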
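
For intuition about what a random proportional split does to the image ids, here is a small pure-Python sketch. It is an illustration only and is not COCOHelper's implementation. The function name `proportional_split`, the `seed` parameter, and the rounding scheme are assumptions made for this example.

```python
import random


def proportional_split(image_ids, percentages=(70, 10, 20), seed=0):
    """Shuffle image ids and cut them into consecutive groups by percentage."""
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    groups, start, cumulative = [], 0, 0
    for pct in percentages:
        cumulative += pct
        # Cut at the cumulative boundary so every id lands in exactly one group.
        end = round(len(ids) * cumulative / 100)
        groups.append(ids[start:end])
        start = end
    return groups
```

Each group of ids could then be used to pick the matching image and annotation records for its own JSON file.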

More examples and details here.

Posted 2023-05-30
