I'm trying to write a custom loss function with a custom gradient. Although I haven't implemented the gradient yet, TensorFlow is having trouble handling the output of the loss function (is it because of the shape?). Here is the error message: tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [1,3], In[1]: [64,2] [Op:MatMul]
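For reference, the shapes in the error correspond to a plain matrix multiplication whose inner dimensions do not agree; a minimal reproduction, independent of the model below, is:

```python
import tensorflow as tf

a = tf.zeros((1, 3))   # matches In[0]: [1,3] from the error message
b = tf.zeros((64, 2))  # matches In[1]: [64,2]
try:
    tf.matmul(a, b)    # inner dimensions 3 and 64 are incompatible
except tf.errors.InvalidArgumentError as e:
    print("raised:", type(e).__name__)
```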
Here is the (incomplete) training loop:
def main():
    inputs = tf.keras.Input(shape=(2,))
    x1 = tf.keras.layers.Dense(64, activation="relu")(inputs)
    x2 = tf.keras.layers.Dense(64, activation="relu")(x1)
    outputs = tf.keras.layers.Dense(2)(x2)
    model = tf.keras.Model(inputs=inputs, outputs=outputs, name="pulse_model")

    # Input: lists of 2 floats
    # Output: lists of 2 complex numbers
    data = gen_hadamard_data(10)
    # Arbitrary batch size
    data = data.batch(batch_size=3)

    epochs = 2
    for epoch in range(epochs):
        print(f"\nStart of Epoch {epoch}")
        for step, (x_batch_train, y_batch_train) in enumerate(data):
            print(f"{x_batch_train=}")
            print(f"{y_batch_train=}")
            with tf.GradientTape() as tape:
                logits = model(tf.constant(x_batch_train.numpy().tolist()), training=True)
                loss_fn = make_fidelity_cost(x_batch_train)
                loss_value = loss_fn(logits, y_batch_train)
            grads = tape.gradient(loss_value, model.trainable_weights)
            print(f"Prediction: {logits}")
            print(f"Loss value: {loss_value}")
            print(f"Gradients: {grads}")
Here is the loss function:
def make_fidelity_cost(initial_states, backend=FakeArmonk()):
    @tf.custom_gradient
    def fidelity_cost(y_pred, y_actual):
        fidelity_list = []
        for in_state, pred, actual in zip(initial_states.numpy(),
                                          y_pred.numpy(),
                                          y_actual.numpy()):
            init_state = [np.cos(in_state[0] / 2),
                          np.exp(in_state[1] * 1.j) * np.sin(in_state[0] / 2)]
            job = run_gaussian(duration=16,
                               amp=pred[0],
                               sigma=pred[1],
                               init_state=init_state,
                               backend=backend)
            result = job.result()
            sv = result.get_statevector()
            actual_sv = Statevector(actual.tolist() + [0])
            # This is the actual calculation that gets returned as the loss
            # state_fidelity returns a scalar
            fidelity_list.append(state_fidelity(sv, actual_sv))

        def grad(upstream):
            # Don't know what I need to do here quite yet
            print(f"{upstream=}")
            return upstream, upstream

        return tf.Variable([fidelity_list]), grad

    return fidelity_cost
A few notes:
- I posted about this before, but realized that version was practically unreadable, so I've stripped it down to the essentials of what is happening.
- The main loss output comes from state_fidelity, whose output here is a scalar. Each scalar is appended to a list, and that list is what gets wrapped and returned via tf.constant. Although the code in the for loop uses some less common libraries, the only line that really matters is where the state fidelity is appended to the fidelity list.
- I'm not sure what matrix sizes the function expects, so if someone could explain that as well, I'd appreciate it.
1 Answer
I tried to follow the code you provided, and noticed that the real and imaginary parts of the states can be handled separately and vectorized.
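A minimal sketch of that idea (my own illustration, not your run_gaussian pipeline): keep each complex statevector as two real tensors, so the pure-state fidelity |⟨a|b⟩|² can be computed for a whole batch at once while staying differentiable in TensorFlow.

```python
import tensorflow as tf

def batched_fidelity(a_re, a_im, b_re, b_im):
    """|<a|b>|^2 for a batch of pure states, using only real tensors.

    <a|b> = sum_k conj(a_k) * b_k, split into real and imaginary parts.
    All inputs have shape (batch, dim); the result has shape (batch,).
    """
    re = tf.reduce_sum(a_re * b_re + a_im * b_im, axis=1)  # Re <a|b>
    im = tf.reduce_sum(a_re * b_im - a_im * b_re, axis=1)  # Im <a|b>
    return re ** 2 + im ** 2

# Identical states -> fidelity 1; orthogonal states -> fidelity 0
a_re = tf.constant([[1.0, 0.0], [1.0, 0.0]])
a_im = tf.zeros_like(a_re)
b_re = tf.constant([[1.0, 0.0], [0.0, 1.0]])
b_im = tf.zeros_like(b_re)
print(batched_fidelity(a_re, a_im, b_re, b_im).numpy())  # [1. 0.]
```

Because every operation here is a plain TensorFlow op, the tape can differentiate through it, which sidesteps the need for tf.custom_gradient entirely if the simulator step can be expressed this way.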