Is it possible to automatically infer class_weight from flow_from_directory in Keras?

qoefvg9y  posted on 2023-04-21  in Other

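I have an imbalanced multi-class dataset, and I want to use the class_weight argument of fit_generator to weight the classes according to the number of images in each class. I load the dataset from a directory with ImageDataGenerator.flow_from_directory.
Is it possible to infer the class_weight argument directly from the ImageDataGenerator object?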


68bkxrlz #1

I just figured out a way to do this.

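from collections import Counter
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)

# train_generator.classes holds one integer class index per image
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
# Weight each class by (size of the largest class) / (its own size)
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

model.fit_generator(...,
                    class_weight=class_weights)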

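train_generator.classes is the list of class indices, one per image. Counter(train_generator.classes) counts the number of images in each class.
Note that these weights may not be good for convergence, but you can use them as a starting point for other weighting schemes based on class frequency.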
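
As a minimal sketch of what this computes, the following uses a hand-made label list in place of train_generator.classes. The label values and class sizes are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for train_generator.classes: one integer label per image.
labels = [0] * 100 + [1] * 50 + [2] * 10

counter = Counter(labels)                  # number of images per class
max_val = float(max(counter.values()))     # size of the largest class
class_weights = {class_id: max_val / num_images
                 for class_id, num_images in counter.items()}

print(class_weights)  # {0: 1.0, 1: 2.0, 2: 10.0}
```

The largest class gets weight 1.0 and every rarer class gets a proportionally larger weight.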

This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868

kadbb459 #2

Alternatively, you can simply do the following:

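from sklearn.utils import class_weight
import numpy as np

# Newer scikit-learn releases accept classes and y only as keyword arguments
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)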

You can then set (as per the comment above):

model.fit_generator(..., class_weight=class_weights)

vyswwuz2 #3

I tried both solutions, and sklearn.utils.class_weight gave better accuracy. I don't know why, but the two do not produce the same class weights.
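
One way to see how the two schemes relate, sketched on made-up class sizes: the Counter-based answer uses max_count / count, while scikit-learn documents 'balanced' as n_samples / (n_classes * count). Both are inversely proportional to the class size, so they should differ only by a constant factor. Whether that difference in scale explains the accuracy gap reported here is an assumption, not something this thread establishes.

```python
from collections import Counter

# Hypothetical labels standing in for train_generator.classes.
labels = [0] * 100 + [1] * 50 + [2] * 10
counter = Counter(labels)

n_samples = len(labels)
n_classes = len(counter)
max_count = max(counter.values())

# Scheme 1: largest class size divided by each class size.
max_based = {c: max_count / float(n) for c, n in counter.items()}
# Scheme 2: the documented 'balanced' formula, written out by hand.
balanced = {c: n_samples / float(n_classes * n) for c, n in counter.items()}

# The per-class ratio between the two schemes is the same for every class.
ratios = {c: max_based[c] / balanced[c] for c in counter}
print(ratios)
```

Every ratio equals max_count * n_classes / n_samples, so the relative emphasis between classes is identical and only the overall magnitude of the weighted loss changes.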


kqqjbcuj #4

As suggested in the article here, a good way to assign class weights is to use:

(1 / class_count) * (total_count/2)

So, slightly modifying the approach suggested by Fábio Perez above:

from collections import Counter

counter = Counter(train_generator.classes)
total = float(sum(counter.values()))
# 1.0 (not 1) avoids integer division under Python 2
class_weight = {class_id: (1.0 / num_images) * (total / 2.0) for class_id, num_images in counter.items()}
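
The division by 2 in this formula appears to correspond to a two-class problem. A sketch of a multi-class variant, assuming one wants the same kind of scaling, divides by the number of classes instead, which gives the same values as n_samples / (n_classes * count). The labels below are made up for illustration.

```python
from collections import Counter

# Hypothetical labels standing in for train_generator.classes.
labels = [0] * 100 + [1] * 50 + [2] * 10
counter = Counter(labels)

total = float(sum(counter.values()))
n_classes = len(counter)

# Same formula as above, with the constant 2 replaced by the number of classes.
class_weight = {class_id: (1.0 / num_images) * (total / n_classes)
                for class_id, num_images in counter.items()}

# Sum of the weights over all samples; with this scaling it matches the sample count.
weighted_total = sum(class_weight[c] * n for c, n in counter.items())
print(class_weight, weighted_total)
```

With this scaling each class contributes the same total weight, and the weights summed over all samples stay equal to the number of samples, so the loss keeps roughly the magnitude it had without weighting.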

4bbkushb #5

The code suggested by Pasha Dembo works well. However, you should convert the result to a dictionary before passing it to model.fit_generator:

from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)

# compute_class_weight returns an array; Keras expects a dict of class index -> weight
train_class_weights = dict(enumerate(class_weights))
model.fit_generator(..., class_weight=train_class_weights)
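
To illustrate the conversion, here is a small sketch with a made-up weight list standing in for the array returned by compute_class_weight. dict(enumerate(...)) assumes the class indices are 0, 1, 2, ... in order, which should hold for the integer labels in train_generator.classes; pairing the weights with the sorted unique labels makes no such assumption.

```python
# Hypothetical values: weights ordered by sorted class label, as compute_class_weight returns them.
weights = [0.5, 1.0, 5.0]
# Stand-in for np.unique(train_generator.classes).
labels_seen = [0, 1, 2]

by_position = dict(enumerate(weights))       # keys come from the position in the array
by_label = dict(zip(labels_seen, weights))   # keys come from the labels themselves

print(by_position)  # {0: 0.5, 1: 1.0, 2: 5.0}
print(by_position == by_label)  # True
```

The two dictionaries only diverge when the labels are not the contiguous range starting at 0.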


qgzx9mmu #6

from sklearn.utils import class_weight
import numpy as np
class_weights = dict(zip(np.unique(traingen.classes),class_weight.compute_class_weight(
                        class_weight = 'balanced',
                        classes = np.unique(traingen.classes), 
                        y = traingen.classes)))

643ylb08 #7

April 2023 version. I ended up using this:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

unique_classes = np.unique(ds_train.classes)
# "If ‘balanced’, class weights will be given by n_samples / (n_classes * np.bincount(y))."
class_weights = compute_class_weight("balanced", classes=unique_classes, y=ds_train.classes)
class_weight = {class_id: weight for class_id, weight in zip(unique_classes, class_weights)}

model.fit(..., class_weight=class_weight)
