PyTorch - correct way to compute the loss on the GPU?

Asked by cld4siwp on 2023-01-26

What is the correct way to handle loss values with PyTorch on CUDA? For example:
1. Should I store the loss value on the GPU?
2. How do I move the loss value to the GPU?
3. How do I update the loss value on the GPU?

Inside __init__():

  self.device = torch.device('cuda')
  self.model = self.model.to(self.device)
  self.total_loss = torch.Tensor([0]).to(self.device)

For each batch:

  self.loss1 = torch.Tensor(y_true - y_pred)
  self.loss2 = 0.5  # some other loss
  self.total_loss = self.loss1 + self.loss2
  self.total_loss.backward()

b4wnujal1#

TL;DR: your loss is most likely on the GPU already anyway.

You need to manually put all the data used to compute the loss onto the GPU, which mainly means the model inputs and the ground-truth outputs. Typically you load these with a DataLoader and then move them to the GPU, as demonstrated in the PyTorch tutorials, and as sketched right below.
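For illustration, here is a minimal sketch of that loop; the names (train_loader, criterion, optimizer) and the dummy data are placeholders made up for the example, not anything from the question:

  import torch

  # Sketch only: the dataset, loss and optimizer below are placeholders.
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  model = torch.nn.Linear(2, 1).to(device)          # parameters now live on `device`
  criterion = torch.nn.MSELoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

  # Dummy batches standing in for a real DataLoader
  train_loader = [(torch.zeros(4, 2), torch.ones(4, 1)) for _ in range(3)]

  for x, y_true in train_loader:
      x, y_true = x.to(device), y_true.to(device)   # move each batch to the GPU
      y_pred = model(x)                             # output comes out on `device`
      loss = criterion(y_pred, y_true)              # so the loss is on `device` too
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()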
Now, for your case, let's look at what happens when you do not move a tensor to the GPU, and also at which tensors end up on the GPU anyway.

  import torch
  # this is just for demo
  model = torch.nn.Linear(2, 1)
  x = torch.zeros((1, 2))
  y_true = torch.ones((1, 1))
  # Here the interesting stuff starts...
  # We cannot query the model for its device directly,
  # but in this case we can check the weight matrix.
  model.weight.device  # prints device(type="cpu") => model is on CPU
  x.device  # => CPU
  y_true.device  # => CPU
  # let's move the model to GPU
  model.to("cuda")
  model.weight.device  # device(type="cuda", index=0) => GPU
  # what happens if we now input x into the model?
  x.device  # still CPU
  model(x)  # throws error: Expected all tensors to be on the same device...
  # so we clearly need to move x to GPU too
  x = x.to("cuda")  # note that model.to modifies the model, but Tensor.to returns a new tensor; the old tensor remains on CPU
  x.device  # GPU
  y_pred = model(x)  # no error this time
  # now on which device is y_pred sitting?
  y_pred.device  # GPU
  # perhaps unsurprisingly, we computed y_pred using an input that
  # is on GPU and a model on GPU and we got a tensor on GPU
  # now you can figure what happens when you compute a loss using
  # a model on GPU and inputs on GPU.

Back to your example:

  # I continue to use the model from above that is already on GPU
  # I will also continue to use y_pred
  total_loss = torch.tensor([0]).to("cuda")  # actually this doesn't matter
  loss1 = y_true - y_pred  # whoops, this throws an error: y_true is still on CPU
  y_true = y_true.to("cuda")
  loss1 = y_true - y_pred  # you subtract two tensors, you get a tensor, no need to create a new one
  loss1.device  # GPU
  # what happens if we waste resources and create a new tensor?
  torch.tensor(y_true - y_pred).device  # we get a warning, but the result is still on GPU
  loss2 = 0.5  # clearly not a tensor, but could be
  total_loss = loss1 + loss2  # here we set total_loss to a new value, so no need to initialise it above
  total_loss.device  # GPU

So there is no need to move your loss to the GPU: if you handle your data correctly, it ends up there anyway.
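Applied back to the structure in the question, a corrected per-batch sketch might look like the following. Note that I reduce loss1 to a scalar with a mean, because .backward() needs a scalar, while y_true - y_pred in the question is a per-element tensor; the shapes and other names are illustrative:

  import torch

  # Sketch only: shapes are made up, and 0.5 stands in for the question's "some other loss".
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  model = torch.nn.Linear(2, 1).to(device)     # model on the GPU (if one is available)

  x = torch.zeros(4, 2).to(device)             # batch input, moved to the GPU
  y_true = torch.ones(4, 1).to(device)         # ground truth, moved to the GPU

  y_pred = model(x)                            # GPU tensor
  loss1 = torch.mean((y_true - y_pred) ** 2)   # reduce to a scalar so .backward() works
  loss2 = 0.5                                  # plain Python number, fine to add to a tensor
  total_loss = loss1 + loss2                   # still a GPU tensor; no need to pre-allocate it
  total_loss.backward()

  total_loss.device                            # cuda:0 (or cpu if no GPU is available)
  total_loss.item()                            # .item() copies the value back to the CPU for logging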

