I have implemented a Transformer encoder in Keras using the template provided by Francois Chollet here. After I train the model, I save it with model.save, but when I load it again for inference, I find that the weights appear to be random again, and as a result my model has lost all of its inference ability.
I have seen similar questions on SO and GitHub and applied the following suggestions, but I still run into the same problem:
1. Using the @tf.keras.utils.register_keras_serializable() decorator on the class.
2. Making sure **kwargs is in the __init__ call.
3. Making sure the custom layer has get_config and from_config methods.
4. Using custom_object_scope to load the model.
Below is a minimal reproducible example of the issue. How do I change it so that the model weights are saved correctly?
import numpy as np
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras import layers
from keras.models import load_model
from keras.utils import custom_object_scope


@tf.keras.utils.register_keras_serializable()
class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(dense_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = mask[:, tf.newaxis, :]
        attention_output = self.attention(
            inputs, inputs, attention_mask=mask)
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "dense_dim": self.dense_dim,
        })
        return config

    @classmethod
    def from_config(cls, config):
        return cls(**config)


# Create simple model:
encoder = TransformerEncoder(embed_dim=2, dense_dim=2, num_heads=1)
inputs = keras.Input(shape=(2, 2), batch_size=None, name="test_inputs")
x = encoder(inputs)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="linear")(x)
model = keras.Model(inputs, outputs)

# Fit the model and save it:
np.random.seed(42)
X = np.random.rand(10, 2, 2)
y = np.ones(10)
model.compile(optimizer=keras.optimizers.Adam(), loss="mean_squared_error")
model.fit(X, y, epochs=2, batch_size=1)
model.save("./test_model")

# Load the saved model:
with custom_object_scope({
    'TransformerEncoder': TransformerEncoder
}):
    loaded_model = load_model("./test_model")

print(model.weights[0].numpy())
print(loaded_model.weights[0].numpy())
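For reference, comparing only the first weight tensor can understate the problem; a prediction-level check makes the symptom explicit. This snippet is my addition for illustration and assumes the script above has just run, so model, loaded_model, and X are still in scope:

# If the weights round-tripped correctly, both models should produce
# identical predictions; on the buggy round trip they will not.
preds_before = model.predict(X)
preds_after = loaded_model.predict(X)
print("predictions match:", np.allclose(preds_before, preds_after))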
2 Answers

Answer 1:
The weights are being saved (after loading the model you can restore them with load_weights). The problem is that you create new layers in __init__; you need to recreate them from their config, for example:
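The answer's example code did not survive the page scrape; what follows is a minimal sketch of the approach it describes, reusing the imports from the question. The optional constructor arguments and extra config keys (attention, dense_proj, layernorm_1, layernorm_2) are names I chose for illustration, and layers.serialize/layers.deserialize are one plausible way to round-trip the sublayers; this is not necessarily the original answer's exact code:

@tf.keras.utils.register_keras_serializable()
class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads,
                 attention=None, dense_proj=None,
                 layernorm_1=None, layernorm_2=None, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        # On first construction the sublayer arguments are None and fresh
        # layers are built; on loading, the sublayers deserialized from the
        # saved config are passed back in and reused instead.
        self.attention = attention or layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.dense_proj = dense_proj or keras.Sequential(
            [
                layers.Dense(dense_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layernorm_1 or layers.LayerNormalization()
        self.layernorm_2 = layernorm_2 or layers.LayerNormalization()

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "dense_dim": self.dense_dim,
            "num_heads": self.num_heads,
            # Save the sublayers themselves, not just the hyperparameters.
            "attention": layers.serialize(self.attention),
            "dense_proj": layers.serialize(self.dense_proj),
            "layernorm_1": layers.serialize(self.layernorm_1),
            "layernorm_2": layers.serialize(self.layernorm_2),
        })
        return config

    @classmethod
    def from_config(cls, config):
        # Rebuild the sublayers from their saved configs before __init__.
        for key in ("attention", "dense_proj", "layernorm_1", "layernorm_2"):
            config[key] = layers.deserialize(config[key])
        return cls(**config)

# call() is unchanged from the question.

The answer's first sentence also implies a workaround that needs no class changes: point load_weights at the checkpoint stored inside the saved directory (the path below assumes the standard SavedModel layout produced by the question's script):

loaded_model.load_weights("./test_model/variables/variables")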
Answer 2:
The secret is in how it works. You can try it with model.get_weights(), but I demonstrate with layer.get_weights() because the effect is easier to see; a sketch of the experiment appears after this list.
Sample: a custom layer with random initial values; the small random numbers change when you run it a couple of times.
Output: the same layer, initialized once, carries its values through successive call() invocations.
Sample: re-creating the layer every time tells it to reset its initial values.
Output: the model.call() results differ rather than carrying over.
Sample: we include the layer-initialization requirements, so it starts from the same initial values for every action.
Output: the same results are reproduced every time.
Sample: the implementation. (It keeps being edited/deleted, but this is how it is implemented; I even added a GIF video.)
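The code samples and outputs this answer refers to were lost when the page was scraped. Below is a minimal sketch of the experiment it seems to describe, using a plain Dense layer of my own choosing rather than the answer's custom layer:

import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(0)

def make_layer():
    # Build immediately so the (randomly initialized) kernel exists.
    layer = layers.Dense(4)
    layer.build((None, 4))
    return layer

# Created once: get_weights() returns the same values on every call,
# because the layer keeps the variables it was initialized with.
layer = make_layer()
print(layer.get_weights()[0][0])
print(layer.get_weights()[0][0])  # identical to the line above

# Re-created each time: the initializer runs again, so the values change.
print(make_layer().get_weights()[0][0])

# With a deterministic initializer the same values are reproduced every
# time, even across re-creation -- the answer's reproducible case.
fixed = layers.Dense(4, kernel_initializer=tf.keras.initializers.Ones())
fixed.build((None, 4))
print(fixed.get_weights()[0][0])

In the question's terms: sublayers re-created in __init__ at load time behave like make_layer() here, so the loaded model starts from fresh random values instead of the trained ones.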