PyTorch LSTM: MSELoss computed in a for loop returns NaN on the backward pass

Asked by 0pizxfdo on 2022-12-23

I am new to LSTMs and have run into a problem. I am trying to predict a variable from 7 features over a window of 4 time steps, using PyTorch.

Data

From my initial DataFrame (traindf), I created a tensor for each feature and for the target (Y):

featureX_train = torch.tensor(traindf.featureX[:test].values).view(-1, 4, 1)
Y_train = torch.tensor(traindf.Y[:test].values).view(-1, 4, 1)
...
featureX_test = torch.tensor(traindf.featureX[test:].values).view(-1, 4, 1)
Y_test = torch.tensor(traindf.Y[test:].values).view(-1, 4, 1)
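
As a sanity check (a minimal sketch, assuming the tensors above), each per-feature tensor should come out as (num_samples, 4, 1):

print(featureX_train.shape)   # e.g. torch.Size([N, 4, 1])
print(featureX_train.dtype)   # should be a float dtype, e.g. torch.float32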

I concatenated all of the feature tensors into a single X_train and a single X_test; all tensors are float32. The code looked roughly like this:

# Stack the per-feature (N, 4, 1) tensors along the last axis -> (N, 4, 7);
# the feature names here are placeholders
X_train = torch.cat([feature1_train, feature2_train, ..., featureX_train], dim=2)
X_test = torch.cat([feature1_test, feature2_test, ..., featureX_test], dim=2)
Finally, I got a training and a test dataset:

from torch.utils.data import TensorDataset

train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
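
Before training, a quick NaN check on the assembled tensors can catch bad inputs early; a minimal sketch, reusing the tensors above:

print(torch.isnan(X_train).any())   # tensor(True) here would explain a NaN loss
print(torch.isnan(Y_train).any())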

A preview of my data:

print(train_dataset[0])
print(test_dataset[0])
(tensor([[ 7909.0000,  8094.0000,  9119.0000,  8666.0000, 17599.0000, 13657.0000,
         10158.0000],
        [ 7909.0000,  8073.0000,  9119.0000,  8636.0000, 17609.0000, 13975.0000,
         10109.0000],
        [ 7939.5000,  8083.5000,  9166.5000,  8659.5000, 18124.5000, 13971.0000,
         10142.0000],
        [ 7951.0000,  8064.0000,  9201.0000,  8663.0000, 17985.0000, 13967.0000,
         10076.0000]]), tensor([[41.],
        [41.],
        [41.],
        [41.]]))
(tensor([[ 8411.0000,  8530.0000,  9439.0000,  9101.0000, 17368.0000, 14174.0000,
         11111.0000],
        [ 8460.0000,  8651.5000,  9579.5000,  9355.5000, 17402.0000, 14509.0000,
         11474.5000],
        [ 8436.0000,  8617.0000,  9579.0000,  9343.0000, 17318.0000, 14288.0000,
         11404.0000],
        [ 8519.0000,  8655.0000,  9580.0000,  9348.0000, 17566.0000, 14640.0000,
         11404.0000]]), tensor([[59.],
        [59.],
        [59.],
        [59.]]))

Applying the LSTM model

My LSTM model:

import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        x, _ = self.lstm(x)
        # x = self.linear(x[:, -1, :])
        x = self.linear(x)
        return x

model = LSTMModel(input_size=7, hidden_size=32, output_size=1)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
  
model.train()
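
One shape detail worth flagging (an observation, not part of the original post): by default nn.LSTM expects input of shape (seq_len, batch, input_size), while the batches here are (batch, 4, 7); passing batch_first=True makes the layer read them as intended. A minimal sketch:

import torch
import torch.nn as nn

class LSTMModelBatchFirst(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True: inputs are (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)      # (batch, seq_len, hidden_size)
        return self.linear(x)    # (batch, seq_len, output_size)

model_bf = LSTMModelBatchFirst(input_size=7, hidden_size=32, output_size=1)
print(model_bf(torch.randn(2, 4, 7)).shape)   # torch.Size([2, 4, 1])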

When I try:

for X, Y in train_loader:
    optimizer.zero_grad()
    
    Y_pred = model(X)
    
    loss = loss_fn(Y_pred, Y)

print(loss)

I get (correctly, I assume): Loss: tensor(1318.9419, grad_fn=&lt;MseLossBackward0&gt;)
But when I run:

for X, Y in train_loader:
    optimizer.zero_grad()
    
    Y_pred = model(X)

    loss = loss_fn(Y_pred, Y)
    
    # Now apply backward pass
    loss.backward()
    
    optimizer.step()

print(loss)

I get: tensor(nan, grad_fn=&lt;MseLossBackward0&gt;)
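
For anyone debugging the same symptom, a hedged diagnostic sketch (not from the original post): anomaly detection makes autograd raise at the op that first produced a NaN, and gradient clipping rules out exploding gradients as the cause.

import torch

torch.autograd.set_detect_anomaly(True)   # raises at the op that produced the NaN

for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    loss = loss_fn(Y_pred, Y)
    loss.backward()
    # Clip the gradient norm before the optimizer step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()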

Trying normalization

I tried normalizing the data:

mean = X.mean()
std = X.std()
X_normalized = (X - mean) / std

Y_pred = model(X_normalized)

But it produced the same result. Why do I get nan after applying loss.backward() in a loop like this, and how can I fix it? Thanks in advance!
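
As an aside, normalizing each batch inside the loop with a single mean/std over all features is atypical; a sketch of per-feature normalization using statistics computed once on the training set (names X_train_norm/X_test_norm are mine):

# Per-feature mean/std over samples and timesteps (dim 2 is the feature axis).
# NB: if X_train contains any NaN, mean/std become NaN and poison every input.
mean = X_train.mean(dim=(0, 1), keepdim=True)
std = X_train.std(dim=(0, 1), keepdim=True)

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std   # reuse the training statistics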

s4chpxco1#

My X_train contained a few NaN values. I solved the problem by removing the samples (matrices) that contained NaN values:

# Per-sample mask: True if any timestep/feature in the sample is NaN
mask = torch.isnan(X_train).any(dim=1).any(dim=1)
X_train = X_train[~mask]

# Do the same for Y_train as it needs to be the same size
Y_train = Y_train[~mask]

# Create the TensorDataset for the training set
train_dataset = TensorDataset(X_train, Y_train)
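
A variant of the same fix, applied one step earlier in the pipeline (a sketch, assuming pandas and that the 4-step windows are built after this call):

# Drop DataFrame rows containing NaN before any tensors are created
traindf = traindf.dropna()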
