Keras: cannot train from flow_from_dataframe, unexpected number of classes found

0kjbasz6  posted on 2023-04-12  in Other

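I am training a model on a set of images whose labels are in a CSV file. To do this I use flow_from_dataframe from tf.keras and specify its parameters, but when it comes to class_mode it goes wrong and reports "Found 3662 validated image filenames belonging to 1 classes." for both sparse and categorical. This is a multi-class classification problem.
The labels were originally int, so I converted them to strings, and then I got this output.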

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

df_train=pd.read_csv(r"../input/train.csv",delimiter=',')
df_test=pd.read_csv(r"../input/test.csv",delimiter=',')
print(df_train.head())
print(df_test.head())
df_train['id_code']=df_train['id_code']+'.png'
df_train['diagnosis']=str(df_train['diagnosis'])
df_test['id_code']=df_test['id_code']+'.png'

""" output is
        id_code  diagnosis
0  000c1434d8d7          2
1  001639a390f0          4
2  0024cdab0c1e          1
3  002c21358ce6          0
4  005b95c28852          0
        id_code
0  0005cfc8afb6
1  003f0afdcd15
2  006efc72b638
3  00836aaacf06
4  009245722fa4
"""

train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

TRAINING_DIR='../input/train_images'

train_generator= train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col='diagnosis',
    batch_size=20,
    target_size=(1050,1050),
    class_mode='categorical'  # also tried 'sparse'
)

""" output is
Found 3662 validated image filenames belonging to 1 classes.
"""

I expected the output "Found 3662 validated image filenames belonging to 5 classes", but the actual output is "Found 3662 validated image filenames belonging to 1 classes".


uoifb46i1#

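The "sparse" class mode yields integer labels, and "categorical" yields one-hot encoded vectors for the class column. In both cases flow_from_dataframe expects the values in y_col to be strings, converted element by element. So I would try: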

df_train['diagnosis'] = df_train['diagnosis'].astype(str)

Then use the "sparse" class mode.

train_generator= train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col='diagnosis',
    batch_size=20,
    target_size=(1050,1050),
    class_mode='sparse'
)

Alternatively, you can use one-hot encoding, like this:

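# Keep the result, and encode only the label column. drop_first is omitted
# so that every class keeps its own indicator column.
df = pd.get_dummies(df, columns=['diagnosis'], prefix=['diagnosis'])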

Then use the "categorical" class_mode:

train_generator= train_datagen.flow_from_dataframe(
    dataframe=df_train,
    directory=TRAINING_DIR,
    x_col='id_code',
    y_col=df.columns[1:],
    batch_size=20,
    target_size=(1050,1050),
    class_mode='categorical'
)

4dbbbstv2#

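Thanks to @Simon Delecourt, I got the answer to this question.
Previously I was calling str() on the whole column, as shown in the question. What worked was using

df_train['diagnosis']=df_train['diagnosis'].astype(str)

when converting the data type of the diagnosis column to str.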
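
To see why the str(df_train['diagnosis']) call from the question leads to a single class, here is a minimal sketch using pandas only. The small DataFrame is a hypothetical stand-in for train.csv, not the real data, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the labels column of train.csv.
df = pd.DataFrame({"diagnosis": [2, 4, 1, 0, 0, 3]})

# str() on a whole Series returns ONE string: the printed form of the Series.
as_one_string = str(df["diagnosis"])

# Assigning that single string to a column broadcasts it to every row,
# so the column ends up with exactly one distinct value.
broken = df.copy()
broken["diagnosis"] = as_one_string
n_broken = broken["diagnosis"].nunique()

# astype(str) converts each element separately, so distinct labels survive.
fixed = df.copy()
fixed["diagnosis"] = fixed["diagnosis"].astype(str)
n_fixed = fixed["diagnosis"].nunique()

print(n_broken, n_fixed)
```

With the first variant every row carries the same long string, which is consistent with the generator reporting 1 class; with the second, each label becomes its own string such as '0' or '4'.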


sczxawaw3#

I'm not sure one-hot encoding helps here... When class_mode is set to "categorical", flow_from_dataframe() automatically one-hot encodes the target variable specified in y_col.
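
As a rough illustration of the two label formats discussed in this thread, the sketch below builds "sparse"-style integer targets and "categorical"-style one-hot targets from string labels using NumPy. It is not Keras's actual implementation; the label list is hypothetical, and ordering the classes by sorting the distinct strings is an assumption made for this example.

```python
import numpy as np

# Hypothetical string labels, as they would look in y_col after astype(str).
labels = ["2", "4", "1", "0", "0", "3"]

# Map each distinct label to an index (assumed ordering: sorted strings).
classes = sorted(set(labels))
class_indices = {name: i for i, name in enumerate(classes)}

# "sparse"-style targets: one integer index per sample.
sparse = np.array([class_indices[name] for name in labels])

# "categorical"-style targets: one one-hot row per sample.
one_hot = np.zeros((len(labels), len(classes)), dtype="float32")
one_hot[np.arange(len(labels)), sparse] = 1.0

print(class_indices)
print(sparse)
print(one_hot.shape)
```

Either format starts from the same single column of string labels, which is why encoding the column by hand beforehand should not be necessary for a plain multi-class problem.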
