Below is my custom LR scheduler, which subclasses `tensorflow.keras.optimizers.schedules.LearningRateSchedule`. I'm getting the error `TypeError: Cannot convert -0.5 to EagerTensor of dtype int64`, and I'm genuinely confused about what an EagerTensor has to do with the simple inverse-square-root computation returned by this custom class's `__call__`.
```python
import tensorflow

class lr_schedule(tensorflow.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, dim_embed, warmup_steps):
        self.dim_embed = dim_embed
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        return (self.dim_embed ** -0.5) * min((step ** -0.5), step * (self.warmup_steps ** -1.5))
```
Not specifically related to the error, but this is a custom LR scheduler that replicates the warmup schedule used in the "Attention Is All You Need" paper.
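For reference, the schedule from the paper is

$$\text{lrate} = d_{\text{model}}^{-0.5} \cdot \min\left(\text{step}^{-0.5},\; \text{step} \cdot \text{warmup\_steps}^{-1.5}\right)$$

which is exactly what the `return` line above computes.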
Edit: here is a short reproducible example:
```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.normal(size=(32, 512, 512))
batch_size = 32
_, H, W = x_train.shape  # x_train is 3-D; unpack the (sequence, feature) dims

# Build a padding mask per position
rows, cols = np.indices((H, W), sparse=True)
padding_mask_init = np.zeros((H, W, W), dtype=np.bool_)
padding_mask_init[rows, 1:, cols] = 1
padding_mask = padding_mask_init[:batch_size]

embed_dim = 512
dense_dim = 2048
num_heads = 2
shape = (batch_size, embed_dim, embed_dim)  # (32, 512, 512)

decoder_inputs = layers.Input(batch_input_shape=shape, dtype=tf.float16)
mha_1 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
mha_2 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
layernorm_1 = layers.LayerNormalization()

Z = decoder_inputs
Z = mha_1(query=Z, value=Z, key=Z, use_causal_mask=True, attention_mask=padding_mask)
Z = layernorm_1(Z + decoder_inputs)
Z = mha_2(query=Z, value=decoder_inputs, key=decoder_inputs, attention_mask=padding_mask)
outputs = layers.TimeDistributed(layers.Dense(embed_dim, activation="softmax"))(Z)

model = keras.Model(decoder_inputs, outputs)
model.compile(
    loss="mean_squared_error",
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=lr_schedule(embed_dim, 3000),
        beta_1=0.9, beta_2=0.98, epsilon=1.0e-9,
    ),
    metrics=["accuracy"],
)
# dataset / val_dataset are assumed to be defined elsewhere (not shown here)
history = model.fit(dataset, epochs=200, validation_data=val_dataset)
```
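For what it's worth, the failure can be reproduced without building the model at all; a minimal sketch (the explicit int64 constant is my assumption for what Keras passes as the step counter):

```python
import tensorflow as tf

schedule = lr_schedule(dim_embed=512, warmup_steps=3000)
# Passing an int64 step, as Keras does internally, triggers the TypeError:
# step ** -0.5 tries to coerce -0.5 to the tensor's int64 dtype and fails.
schedule(tf.constant(1, dtype=tf.int64))
```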
1 Answer
I only just ran into this yesterday. It's a type-coercion problem: the `step` value passed into `__call__` is an int64 tensor, so the arithmetic tries to coerce everything to int64. For your particular case, this might fix it:
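A minimal sketch of that cast (using `tf.cast` to float32 and `tf.math.minimum` in place of Python's built-in `min` is my assumption, not necessarily the answerer's exact code):

```python
import tensorflow as tf

class lr_schedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, dim_embed, warmup_steps):
        self.dim_embed = dim_embed
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        # Cast the int64 step counter to a float tensor before taking any
        # fractional powers, so -0.5 no longer has to be coerced to int64.
        step = tf.cast(step, tf.float32)
        return (self.dim_embed ** -0.5) * tf.math.minimum(
            step ** -0.5, step * (self.warmup_steps ** -1.5)
        )
```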