I'm currently debugging a GAN-based image-to-image translation model built on CycleGAN, or more specifically on DeepPhotoEnhancer. Looking at examples of how the training loop is written, some (e.g. the official TensorFlow tutorial) use a separate optimizer for each of the A-to-B and B-to-A generators, while others I've found in various GitHub repositories use a single optimizer for both generators.
My question: does it matter whether I use separate optimizers or a single optimizer for the two generator networks? Why or why not?
Example with one optimizer per generator network (from the official TensorFlow tutorial):
@tf.function
def train_step(real_x, real_y):
    # persistent is set to True because the tape is used more than
    # once to calculate the gradients.
    with tf.GradientTape(persistent=True) as tape:
        # Generator G translates X -> Y.
        # Generator F translates Y -> X.
        fake_y = generator_g(real_x, training=True)
        cycled_x = generator_f(fake_y, training=True)

        fake_x = generator_f(real_y, training=True)
        cycled_y = generator_g(fake_x, training=True)

        # same_x and same_y are used for identity loss.
        same_x = generator_f(real_x, training=True)
        same_y = generator_g(real_y, training=True)

        disc_real_x = discriminator_x(real_x, training=True)
        disc_real_y = discriminator_y(real_y, training=True)

        disc_fake_x = discriminator_x(fake_x, training=True)
        disc_fake_y = discriminator_y(fake_y, training=True)

        # Calculate the loss.
        gen_g_loss = generator_loss(disc_fake_y)
        gen_f_loss = generator_loss(disc_fake_x)

        total_cycle_loss = calc_cycle_loss(real_x, cycled_x) + calc_cycle_loss(real_y, cycled_y)

        # Total generator loss = adversarial loss + cycle loss.
        total_gen_g_loss = gen_g_loss + total_cycle_loss + identity_loss(real_y, same_y)
        total_gen_f_loss = gen_f_loss + total_cycle_loss + identity_loss(real_x, same_x)

        disc_x_loss = discriminator_loss(disc_real_x, disc_fake_x)
        disc_y_loss = discriminator_loss(disc_real_y, disc_fake_y)

    # Calculate the gradients for generator and discriminator.
    generator_g_gradients = tape.gradient(total_gen_g_loss, generator_g.trainable_variables)
    generator_f_gradients = tape.gradient(total_gen_f_loss, generator_f.trainable_variables)
    discriminator_x_gradients = tape.gradient(disc_x_loss, discriminator_x.trainable_variables)
    discriminator_y_gradients = tape.gradient(disc_y_loss, discriminator_y.trainable_variables)

    # Apply the gradients, one optimizer per network.
    generator_g_optimizer.apply_gradients(zip(generator_g_gradients, generator_g.trainable_variables))
    generator_f_optimizer.apply_gradients(zip(generator_f_gradients, generator_f.trainable_variables))
    discriminator_x_optimizer.apply_gradients(zip(discriminator_x_gradients, discriminator_x.trainable_variables))
    discriminator_y_optimizer.apply_gradients(zip(discriminator_y_gradients, discriminator_y.trainable_variables))
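For reference, the tutorial pairs this train step with one Adam optimizer per network; the setup looks roughly like this (hyperparameters as in the tutorial):

# One Adam optimizer per network.
generator_g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
generator_f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

discriminator_x_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_y_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)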
Example with a single optimizer for both generator networks (from LynnHo on GitHub):
@tf.function
def train_G(A, B):
    with tf.GradientTape() as t:
        A2B = G_A2B(A, training=True)
        B2A = G_B2A(B, training=True)
        A2B2A = G_B2A(A2B, training=True)
        B2A2B = G_A2B(B2A, training=True)
        A2A = G_B2A(A, training=True)
        B2B = G_A2B(B, training=True)

        A2B_d_logits = D_B(A2B, training=True)
        B2A_d_logits = D_A(B2A, training=True)

        A2B_g_loss = g_loss_fn(A2B_d_logits)
        B2A_g_loss = g_loss_fn(B2A_d_logits)
        A2B2A_cycle_loss = cycle_loss_fn(A, A2B2A)
        B2A2B_cycle_loss = cycle_loss_fn(B, B2A2B)
        A2A_id_loss = identity_loss_fn(A, A2A)
        B2B_id_loss = identity_loss_fn(B, B2B)

        G_loss = ((A2B_g_loss + B2A_g_loss)
                  + (A2B2A_cycle_loss + B2A2B_cycle_loss) * args.cycle_loss_weight
                  + (A2A_id_loss + B2B_id_loss) * args.identity_loss_weight)

    # One gradient call over the concatenated variable lists of both generators,
    # applied by a single shared optimizer.
    G_grad = t.gradient(G_loss, G_A2B.trainable_variables + G_B2A.trainable_variables)
    G_optimizer.apply_gradients(zip(G_grad, G_A2B.trainable_variables + G_B2A.trainable_variables))

    return A2B, B2A, {'A2B_g_loss': A2B_g_loss,
                      'B2A_g_loss': B2A_g_loss,
                      'A2B2A_cycle_loss': A2B2A_cycle_loss,
                      'B2A2B_cycle_loss': B2A2B_cycle_loss,
                      'A2A_id_loss': A2A_id_loss,
                      'B2B_id_loss': B2B_id_loss}
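For comparison, the single generator optimizer in that repo is created once and shared by both generators. A sketch of that setup (the schedule object G_lr_scheduler and args.beta_1 follow that repo's naming, which may differ in detail):

# One optimizer shared by G_A2B and G_B2A (sketch; the names here follow
# LynnHo's repo conventions and are assumptions, not verbatim code).
G_optimizer = tf.keras.optimizers.Adam(learning_rate=G_lr_scheduler, beta_1=args.beta_1)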
1 Answer
I'm not a TensorFlow user, but this should work the same way as in PyTorch. Judging by the official PyTorch CycleGAN repo, there is no difference between using a single optimizer and using two separate ones; a single optimizer simply makes the code shorter, as the sketch below shows.
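Here is a sketch of the single-optimizer pattern that repo uses (paraphrased rather than copied verbatim, with toy torch.nn.Linear modules standing in for the real generators, so the names and hyperparameters are illustrative only):

import itertools
import torch

# Toy stand-ins for the two generator networks.
netG_A = torch.nn.Linear(3, 3)
netG_B = torch.nn.Linear(3, 3)

# One Adam instance updates both generators' parameters. Each parameter
# still receives its own gradient and its own Adam moment estimates.
optimizer_G = torch.optim.Adam(
    itertools.chain(netG_A.parameters(), netG_B.parameters()),
    lr=2e-4, betas=(0.5, 0.999))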
The reason there is no difference is that even with a single optimizer, the parameters are updated independently: for optimizers like Adam or SGD, each parameter's update depends only on its own gradient and the state the optimizer tracks for that specific parameter (e.g. Adam's per-parameter moment estimates). Grouping the parameters under one optimizer or two therefore produces the same updates. Keeping this in mind should make it clear why there is no difference.
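To see this concretely, here is a minimal, self-contained TensorFlow sketch (my own illustration with toy scalar variables rather than real generators, not taken from the answer) comparing one shared Adam optimizer against two separate ones with identical hyperparameters:

import tensorflow as tf

# Two toy "networks", each a single trainable scalar.
v1a, v1b = tf.Variable(1.0), tf.Variable(2.0)  # updated by one shared optimizer
v2a, v2b = tf.Variable(1.0), tf.Variable(2.0)  # updated by two separate optimizers

def grads(va, vb):
    # Independent loss terms, so each variable's gradient depends only on itself.
    with tf.GradientTape() as tape:
        loss = va ** 2 + 3.0 * vb ** 2
    return tape.gradient(loss, [va, vb])

shared = tf.keras.optimizers.Adam(1e-2)
opt_a = tf.keras.optimizers.Adam(1e-2)
opt_b = tf.keras.optimizers.Adam(1e-2)

for _ in range(100):
    shared.apply_gradients(zip(grads(v1a, v1b), [v1a, v1b]))
    ga, gb = grads(v2a, v2b)
    opt_a.apply_gradients([(ga, v2a)])
    opt_b.apply_gradients([(gb, v2b)])

# Adam keeps separate moment estimates per variable, so both schemes
# produce identical parameter trajectories.
print(v1a.numpy(), v2a.numpy())
print(v1b.numpy(), v2b.numpy())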
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues?q=is%3Aissue+optimizer+is%3Aclosed
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/177
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/1381
Good luck!