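In a course I'm taking, the professor gave us two datasets, one of 301 late-type galaxies and one of 301 early-type galaxies, and we built a model in Keras that can tell them apart:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
input_img = Input(shape=(128,128,3))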
x = Conv2D(filters = 16, kernel_size= (3,3), strides = (1,1), activation='relu', padding = 'same')(input_img)
x = MaxPooling2D((2,2),padding = 'same')(x)
x = Conv2D(filters = 32, kernel_size= (3,3), strides = (1,1), activation='relu', padding = 'same')(x)
x = MaxPooling2D((2,2),padding = 'same')(x)
x = Conv2D(filters = 64, kernel_size= (3,3), strides = (1,1), activation='relu', padding = 'same')(x)
x = MaxPooling2D((2,2),padding = 'same')(x)
x = Flatten()(x)
x = Dense(32, activation = 'relu')(x)
x = Dropout(0.3)(x)
x = Dense(16, activation = 'relu')(x)
out = Dense(1, activation = 'sigmoid')(x)
model = Model(inputs = input_img, outputs = out)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
history = model.fit(X_train, Y_train, batch_size = 32, epochs = 20)
Because I prefer Julia to Python, I tried to build the same model in Flux.jl. Based on what I read in the Flux docs, this is what the Flux model looks like:
using Flux
model2 = Chain(
    Conv((3, 3), 3 => 16, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2,2), pad=SamePad()),
    Conv((3, 3), 16 => 32, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2,2), pad=SamePad()),
    Conv((3, 3), 32 => 64, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2,2), pad=SamePad()),
    Flux.flatten,
    Dense(16384 => 32, relu),
    Dense(32 => 16, relu),
    Dense(16 => 1),
    sigmoid
)
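To check where the 16384 in Dense(16384 => 32) comes from, here is a small plain-Python sketch (not part of the original question) that redoes the shape arithmetic shared by both models and counts their trainable parameters. It assumes every Conv/Conv2D and Dense layer has a bias, which is the default in both Keras and Flux; Dropout and pooling layers add no parameters.

```python
def same_pool_size(n: int, pool: int = 2) -> int:
    """Output length of a 2x2 max-pool with 'same' padding and stride = pool."""
    return -(-n // pool)  # ceil(n / pool)

def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a fully connected layer."""
    return n_in * n_out + n_out

side = 128
for _ in range(3):       # three MaxPool((2,2)) / MaxPooling2D((2,2)) layers
    side = same_pool_size(side)
flat = side * side * 64  # 64 channels after the last convolution

total = (conv_params(3, 3, 16) + conv_params(3, 16, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 32) + dense_params(32, 16) + dense_params(16, 1))
print(side, flat, total)  # 16 16384 548449
```

So both architectures should come out at the same 548,449 trainable parameters, and the Flatten output size matches the hard-coded 16384.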
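But when I train the models under what I believe are the same conditions, I get very different results. In Keras the final loss after 20 epochs is loss: 0.0267, while in Flux the loss after 30 epochs is 0.4082335f0. I don't know where this difference in loss comes from, since I use the same batch size in both models and the data processing is the same (I think). Python:
import numpy as np
X1 = np.load('/home/luis/Descargas/cosmo-late.npy')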
X2 = np.load('/home/luis/Descargas/cosmo-early.npy')
X = np.concatenate((X1,X2), axis = 0).astype(np.float32)/256.0
Y = np.zeros(X.shape[0])
Y[0:len(X1)] = 1
rand_ind = np.arange(0,X.shape[0])
np.random.shuffle(rand_ind)
X = X[rand_ind]
Y = Y[rand_ind]
X_train = X[50:]
Y_train = Y[50:]
X_test = X[0:50]
Y_test = Y[0:50]
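Both versions of the preprocessing rely on indexing X and Y with one shared shuffled index vector, so every image keeps its label, and then take the first 50 shuffled samples as the test set. The plain-Python sketch below illustrates that pattern; the tagged tuples are hypothetical stand-ins for the images, and the fixed seed is only there to make the example reproducible.

```python
import random

def shuffle_and_split(xs, ys, n_test=50, seed=0):
    """Shuffle samples and labels with one shared permutation, then split.

    The first n_test shuffled samples become the test set, the rest the
    training set, as in X[0:50] / X[50:] above.
    """
    assert len(xs) == len(ys)
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # one permutation used for both lists
    xs = [xs[i] for i in idx]
    ys = [ys[i] for i in idx]
    return xs[n_test:], ys[n_test:], xs[:n_test], ys[:n_test]

# Hypothetical stand-ins: 301 "late" samples (label 1) and 301 "early" samples
# (label 0), each tagged with its origin so the pairing can be checked afterwards.
late = [("late", i) for i in range(301)]
early = [("early", i) for i in range(301)]
X = late + early
Y = [1] * len(late) + [0] * len(early)

X_train, Y_train, X_test, Y_test = shuffle_and_split(X, Y)
print(len(X_train), len(X_test))  # 552 50
```

After the shuffle, every ("late", …) sample still carries label 1, which is exactly what goes wrong if X and Y are shuffled with two different permutations.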
Julia:
using NPZ, Random
X1 = npzread("./Descargas/cosmo-late.npy")
X2 = npzread("./Descargas/cosmo-early.npy")
X = cat(X1,X2,dims=1)
X = Float32.(X)./256
Y = zeros(1,size(X)[1])
Y[1,1:length(X1[:,1,1,1])] .= 1
ind = collect(1:length(Y[1,:]))
shuffle!(ind)
X = X[ind,:,:,:]
Y = Y[:,ind]
X_train = X[51:length(X[:,1,1,1]),:,:,:]
Y_train = Y[:,51:length(Y)]
X_test = X[1:50,:,:,:]
Y_test = Y[:,1:50]
X_train = permutedims(X_train, (2, 3, 4, 1))
X_test = permutedims(X_test, (2, 3, 4, 1))
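The loaded arrays are indexed as (sample, height, width, channel), the channels-last layout that Input(shape=(128,128,3)) expects in Keras, while Flux's convolution layers expect WHCN order (width, height, channels, batch). The sketch below only tracks shapes; the helper name permutedims_shape is hypothetical and mimics Julia's 1-based permutation argument.

```python
def permutedims_shape(shape, perm):
    """Shape after Julia's permutedims(A, perm); perm is 1-based, as in Julia."""
    return tuple(shape[p - 1] for p in perm)

nhwc = (552, 128, 128, 3)  # X_train as loaded: (N, H, W, C)
whcn = permutedims_shape(nhwc, (2, 3, 4, 1))
print(whcn)  # (128, 128, 3, 552)
```

Strictly, (2, 3, 4, 1) yields (H, W, C, N), so each image ends up transposed relative to WHCN; since the same permutation is applied to the training and test sets, this should not affect what the network can learn.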
The training in Julia looks like this:
import ChainRules
train_set = Flux.DataLoader((X_train, Y_train), batchsize=32)
loss(x, y) = Flux.logitbinarycrossentropy(x, y)
opt = Flux.setup(Adam(), model2)
loss_history = Float32[]
for epoch = 1:30
    Flux.train!(model2, train_set, opt) do m, x, y
        err = loss(m(x), y)
        ChainRules.ignore_derivatives() do
            push!(loss_history, err)
        end
        return err
    end
end
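As a sanity check on the claim that both runs use the same batch size, here is the batching arithmetic for the 552 training samples. It assumes Flux.DataLoader's default partial=true, which keeps the final, smaller batch; Keras's fit with batch_size=32 likewise runs ceil(552 / 32) steps per epoch.

```python
import math

n_train, batchsize, epochs = 552, 32, 30
batches_per_epoch = math.ceil(n_train / batchsize)  # partial last batch is kept
last_batch = n_train - (batches_per_epoch - 1) * batchsize
print(batches_per_epoch, last_batch, batches_per_epoch * epochs)  # 18 8 540
```

Because push! runs once per minibatch, loss_history ends up holding per-batch losses (540 of them after 30 epochs), and its last entry is the loss on the final 8-sample batch.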
Can anyone help me? I can't figure it out.
1 answer
Answer #1 (sq1bmfud):
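Following up on my comment about dropping the sigmoid when you use logitbinarycrossentropy: logitbinarycrossentropy expects raw scores (logits) and applies the sigmoid internally, so with sigmoid as the last layer of your Chain the sigmoid is effectively applied twice. I quickly tested this on some thrown-together data, and with your current implementation I also end up at a loss of around 0.5, whereas after removing the sigmoid I reach much lower values. Alternatively, you can keep the sigmoid and use binarycrossentropy instead, although that seems to be less numerically stable, so it is better to use logitbinarycrossentropy.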
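The roughly 0.5 plateau can be reproduced with a few lines of plain Python. This is a sketch, not Flux code: logit_bce re-implements the per-sample formula (1 - y) * z - logσ(z) behind Flux's logitbinarycrossentropy, which the Flux docs describe as mathematically equivalent to binarycrossentropy(σ(ŷ), y) but more numerically stable, and bce is a plain, unclipped binary cross-entropy on probabilities. With sigmoid as the last layer, the value passed to the loss is a probability p in (0, 1). A class-1 sample then cannot score below log(1 + e^-1) ≈ 0.3133, and a class-0 sample cannot score below log 2 ≈ 0.6931, so on a batch with equal numbers of each class the mean loss cannot drop below about 0.503, no matter how well the model separates the galaxies. Batches with more class-1 samples have a lower bound.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_sigmoid(z: float) -> float:
    """log(sigmoid(z)) computed without forming sigmoid(z) first."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def logit_bce(z: float, y: float) -> float:
    """Per-sample logit binary cross-entropy: (1 - y) * z - log(sigmoid(z))."""
    return (1 - y) * z - log_sigmoid(z)

def bce(p: float, y: float) -> float:
    """Plain binary cross-entropy on a probability p (no clipping)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# 1) On raw logits the two forms agree, which is why logit_bce must get logits.
for z in (-3.0, 0.0, 2.5):
    for y in (0.0, 1.0):
        assert math.isclose(logit_bce(z, y), bce(sigmoid(z), y), rel_tol=1e-9)

# 2) With sigmoid as the last layer the "logit" is p in (0, 1). Even a perfect
#    classifier (p -> 1 for class 1, p -> 0 for class 0) only reaches these limits.
best_pos = logit_bce(1.0, 1.0)     # limit p -> 1 for a class-1 sample
best_neg = logit_bce(0.0, 0.0)     # limit p -> 0 for a class-0 sample
floor = (best_pos + best_neg) / 2  # batch with equal numbers of each class
print(round(best_pos, 4), round(best_neg, 4), round(floor, 4))  # 0.3133 0.6931 0.5032

# 3) The probability form breaks down for confident logits; the logit form does not.
print(sigmoid(40.0) == 1.0)  # True, so bce(sigmoid(40.0), 0.0) would need log(0)
print(logit_bce(40.0, 0.0))  # 40.0
```

Removing sigmoid from the Chain lets z range over all real numbers, so the loss can approach 0; check 3 illustrates why computing the loss from logits is preferred over keeping the sigmoid and switching to binarycrossentropy.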