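I'm new to LSTMs and have run into a problem. I'm trying to predict one variable from 7 features over 4 time steps. I'm using PyTorch.
Data
From my initial DataFrame (traindf), I created a tensor for each feature and for the target (Y):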
featureX_train = torch.tensor(traindf.featureX[:test].values).view(-1, 4, 1)
Y_train = torch.tensor(traindf.Y[:test].values).view(-1, 4, 1)
...
featureX_test = torch.tensor(traindf.featureX[test:].values).view(-1, 4, 1)
Y_test = torch.tensor(traindf.Y[test:].values).view(-1, 4, 1)
I concatenated all the feature tensors into a single X_train and a single X_test; all tensors are float32.
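A minimal sketch of what that concatenation step could look like, assuming each per-feature tensor has shape (N, 4, 1) as produced by the `.view(-1, 4, 1)` calls above. The random `features` list is a hypothetical stand-in for the seven real feature tensors:

```python
import torch

# Hypothetical stand-ins for the seven per-feature tensors, each of shape (N, 4, 1).
n_windows = 10
features = [torch.randn(n_windows, 4, 1) for _ in range(7)]

# Concatenate along the last (feature) axis to get shape (N, 4, 7), cast to float32.
X_train = torch.cat(features, dim=2).to(torch.float32)
print(X_train.shape, X_train.dtype)
```

The same call on the test-split tensors would give X_test.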
Finally, I built the training and test datasets:
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
A preview of my data:
print(train_dataset[0])
print(test_dataset[0])
(tensor([[ 7909.0000, 8094.0000, 9119.0000, 8666.0000, 17599.0000, 13657.0000,
10158.0000],
[ 7909.0000, 8073.0000, 9119.0000, 8636.0000, 17609.0000, 13975.0000,
10109.0000],
[ 7939.5000, 8083.5000, 9166.5000, 8659.5000, 18124.5000, 13971.0000,
10142.0000],
[ 7951.0000, 8064.0000, 9201.0000, 8663.0000, 17985.0000, 13967.0000,
10076.0000]]), tensor([[41.],
[41.],
[41.],
[41.]]))
(tensor([[ 8411.0000, 8530.0000, 9439.0000, 9101.0000, 17368.0000, 14174.0000,
11111.0000],
[ 8460.0000, 8651.5000, 9579.5000, 9355.5000, 17402.0000, 14509.0000,
11474.5000],
[ 8436.0000, 8617.0000, 9579.0000, 9343.0000, 17318.0000, 14288.0000,
11404.0000],
[ 8519.0000, 8655.0000, 9580.0000, 9348.0000, 17566.0000, 14640.0000,
11404.0000]]), tensor([[59.],
[59.],
[59.],
[59.]]))
Applying the LSTM model
My LSTM model:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        # x = self.linear(x[:, -1, :])
        x = self.linear(x)
        return x

model = LSTMModel(input_size=7, hidden_size=32, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
model.train()
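A side note on shapes, separate from the nan issue: `nn.LSTM` defaults to `batch_first=False`, so a batch of shape (32, 4, 7) is read as 32 time steps of 4 sequences rather than 32 sequences of 4 time steps. Below is a minimal sketch of a batch-first variant, assuming the intended layout is (batch, time steps, features). The class name `LSTMModelBatchFirst` and the random `dummy` batch are hypothetical:

```python
import torch
from torch import nn

class LSTMModelBatchFirst(nn.Module):  # hypothetical name, not from the question
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True makes the LSTM read input as (batch, seq_len, features).
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)      # (batch, seq_len, hidden_size)
        return self.linear(out)    # (batch, seq_len, output_size)

model_bf = LSTMModelBatchFirst(input_size=7, hidden_size=32, output_size=1)
dummy = torch.randn(32, 4, 7)      # one batch shaped like the DataLoader output
y = model_bf(dummy)
print(y.shape)
```

The output keeps one prediction per time step, which matches the (batch, 4, 1) shape of Y used with the loss.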
When I try:
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
print(loss)
I get what I assume is a correct loss: tensor(1318.9419, grad_fn=<MseLossBackward0>)
However, when I run:
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    # Now apply backward pass
    loss.backward()
    optimizer.step()
print(loss)
I get: tensor(nan, grad_fn=<MseLossBackward0>)
Attempting normalization
I tried standardizing the data:
mean = X.mean()
std = X.std()
X_normalized = (X - mean) / std
Y_pred = model(X_normalized)
But it produced the same result. Why do I get 'nan' once I apply loss.backward() in a loop like this? How can I fix it? Thanks in advance!
1 Answer
My X_train contained a few NaN values. I solved the problem by removing the matrices that contained NaN values.
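A minimal sketch of one way to do that filtering, assuming X_train has shape (N, 4, 7) and Y_train has shape (N, 4, 1) as in the question. The random tensors and the injected NaN are hypothetical stand-ins for the real data, and the names `bad_x`, `bad_y`, `keep`, `X_clean` and `Y_clean` are illustrative:

```python
import torch

# Hypothetical stand-in data: 6 windows of 4 time steps x 7 features.
X_train = torch.randn(6, 4, 7)
Y_train = torch.randn(6, 4, 1)
X_train[2, 1, 3] = float("nan")  # inject one NaN to mimic a bad window

# Flag every window that has at least one NaN in its features or its target.
bad_x = torch.isnan(X_train).flatten(start_dim=1).any(dim=1)
bad_y = torch.isnan(Y_train).flatten(start_dim=1).any(dim=1)
keep = ~(bad_x | bad_y)

# Keep only the clean windows, dropping X and Y together so they stay aligned.
X_clean, Y_clean = X_train[keep], Y_train[keep]
print(f"dropped {int((~keep).sum())} of {len(keep)} windows")
```

This would also explain why the loss only turned to nan once loss.backward() and optimizer.step() were added: a batch containing a NaN yields NaN gradients, the optimizer step writes NaN into the weights, and every later batch then produces nan. Without the update step, only the batches that actually contain a NaN are affected. Standardizing with X.mean() and X.std() cannot help either, because both return NaN as soon as a single element of X is NaN.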