PyTorch - what is the correct way to compute loss on the GPU?

Asked by cld4siwp on 2023-01-26

What is the correct way to handle loss values with PyTorch on CUDA? For example:
1. Should I store the loss value on the GPU?
2. How do I move the loss value to the GPU?
3. How do I update the loss value on the GPU?

Inside __init__():

self.device = torch.device('cuda')
self.model = self.model.to(self.device)  # note: a bare `device` would be undefined here
self.total_loss = torch.Tensor([0]).to(self.device)

And for each batch:

self.loss1 = torch.Tensor(y_true - y_pred)
self.loss2 = 0.5 # some other loss
self.total_loss = self.loss1 + self.loss2
self.total_loss.backward()

b4wnujal 1#

TL;DR: Your loss is probably on the GPU anyway.

You need to manually move all the data used to compute the loss onto the GPU, mainly the model inputs and the ground-truth outputs. Usually you load these with a DataLoader and then move them to the GPU, as demonstrated in the PyTorch tutorial.
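For instance, here is a minimal sketch of that pattern (the model, data, and hyperparameters are made up purely for illustration):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(2, 1).to(device)  # move the model's parameters once
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

dataset = torch.utils.data.TensorDataset(torch.randn(8, 2), torch.randn(8, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=4)

for x, y_true in loader:
    # move every batch to the same device as the model
    x, y_true = x.to(device), y_true.to(device)
    y_pred = model(x)
    loss = criterion(y_pred, y_true)  # computed from GPU tensors => lives on GPU
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()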
Now, for your case, let's look at what happens when tensors are not moved to the GPU, and which tensors end up on the GPU already.

import torch

# this is just for demo
model = torch.nn.Linear(2, 1)
x = torch.zeros((1, 2))
y_true = torch.ones((1, 1))

# Here the interesting stuff starts...
# We cannot query the model for its device directly,
# but in this case we can check the weight matrix.
model.weight.device  # prints device(type="cpu") => model is on CPU
x.device  # => CPU
y_true.device  # => CPU

# lets move the model to GPU
model.to("cuda")
model.weight.device  # device(type="cuda", index=0) => GPU

# what happens if we now input x into the model?
x.device  # still CPU
model(x)  # throws error: Expected all tensors to be on the same device...

# so we clearly need to move x to GPU too
x = x.to("cuda")  # note that model.to modifies the model, but Tensor.to returns a new tensor. The old tensor remains on CPU.
x.device  # GPU
y_pred = model(x)  # no error this time

# now on which device is y_pred sitting?
y_pred.device  # GPU
# perhaps unsurprisingly, we computed y_pred using an input that
# is on GPU and a model on GPU and we got a tensor on GPU

# now you can figure what happens when you compute a loss using 
# a model on GPU and inputs on GPU.

Back to your example:

# I continue to use the model from above that is already on GPU
# I will also continue to use y_pred
total_loss = torch.tensor([0]).to("cuda")  # actually this doesn't matter

loss1 = y_true - y_pred  # whoops, y_true is still on CPU => same device error as before
y_true = y_true.to("cuda")
loss1 = y_true - y_pred  # you subtract two tensors, you get a tensor, no need to create a new one
loss1.device  # GPU

# what happens if we waste resources and create a new tensor?
torch.tensor(y_true - y_pred).device  # we get a warning, but the result is still on GPU

loss2 = 0.5  # clearly not a tensor, but could be
total_loss = loss1 + loss2  # Here, we set total_loss to a new value, so no need to initialise it above
total_loss.device  # GPU

So there is no need to move your loss to the GPU; if you handle the model and the data correctly, it is already sitting there.
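As for storing and updating a running loss across batches (your questions 1 and 3): keep the loss tensor on the GPU for backward(), and only pull the scalar value to the CPU for bookkeeping. A minimal sketch, with a made-up stand-in loss:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(2, 1).to(device)

running_loss = 0.0  # a plain Python float on the CPU is fine for logging
for _ in range(3):
    x = torch.randn(4, 2, device=device)       # stand-in batch
    y_true = torch.randn(4, 1, device=device)  # stand-in targets
    loss = ((model(x) - y_true) ** 2).mean()   # loss tensor lives on the GPU
    loss.backward()
    # .item() copies the scalar to the CPU and detaches it from the graph,
    # so the accumulator does not keep the computation graph alive
    running_loss += loss.item()
print(running_loss)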
