Pytorch-Scarf package RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

Posted by bnl4lu3b on 2023-06-23 in Other


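RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
I get this error when I run the example notebook from the pytorch-scarf GitHub repo.
Here is the code: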

batch_size = 128
epochs = 1000
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

model = SCARF(
    input_dim=train_ds.shape[1],
    emb_dim=16,
    corruption_rate=0.6,
).to(device)
optimizer = Adam(model.parameters(), lr=0.001)
ntxent_loss = NTXent()

loss_history = []

for epoch in range(1, epochs + 1):
    epoch_loss = train_epoch(model, ntxent_loss, train_loader, optimizer, device, epoch)
    loss_history.append(epoch_loss)

Here is the exact error:

RuntimeError                              Traceback (most recent call last)
Cell In [7], line 7
      4 loss_history = []
      6 for epoch in range(1, epochs + 1):
----> 7     epoch_loss = train_epoch(model, ntxent_loss, train_loader, optimizer, device, epoch)
      8     loss_history.append(epoch_loss)

File ~/pytorch-scarf/example/../example/utils.py:23, in train_epoch(model, criterion, train_loader, optimizer, device, epoch)
     20 emb_anchor, emb_positive = model(anchor, positive)
     22 # compute loss
---> 23 loss = criterion(emb_anchor, emb_positive)
     24 loss.backward()
     26 # update model weights

File /opt/tljh/user/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/pytorch-scarf/example/../scarf/loss.py:39, in NTXent.forward(self, z_i, z_j)
     37 mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=torch.bool)).float()
     38 numerator = torch.exp(positives / self.temperature)
---> 39 denominator = mask * torch.exp(similarity / self.temperature)
     41 all_losses = -torch.log(numerator / torch.sum(denominator, dim=1))
     42 loss = torch.sum(all_losses) / (2 * batch_size)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

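I do not get this error when I run the code on a CPU-only machine. Because of how the data is created, I cannot confirm what tensor type it is (maybe that is the problem). I have confirmed that emb_anchor and emb_positive are both on cuda before they are passed into criterion() (as suggested in this post as a possible solution).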
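One way to narrow this kind of error down is to print the device of every tensor that appears on the failing line, including tensors created inside forward(). The snippet below is only a minimal sketch: device_report is a hypothetical helper that is not part of pytorch-scarf, and the 4x4 mask stands in for the one built in scarf/loss.py.

```python
import torch


def device_report(**tensors: torch.Tensor) -> dict:
    """Map each tensor's name to the device it lives on, e.g. 'cpu' or 'cuda:0'.

    Hypothetical helper for debugging; not part of pytorch-scarf.
    """
    return {name: str(t.device) for name, t in tensors.items()}


# A tensor built by a factory function such as torch.eye is placed on the
# default device (normally the CPU) unless a device is passed explicitly,
# no matter where the model or its inputs live.
mask = (~torch.eye(4, dtype=torch.bool)).float()
print(device_report(mask=mask))  # {'mask': 'cpu'} with default settings
```

Calling the same helper on the operands of the line marked in the traceback (for example mask and similarity) would show which of them is still on the CPU.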

pytorch

Source: https://stackoverflow.com/questions/76434294/pytorch-scarf-package-runtimeerror-expected-all-tensors-to-be-on-the-same-devic

1 Answer

laik7k3q1#

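The problem is in the scarf/loss.py file. Replace the following line:

mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=torch.bool)).float()

with

mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=torch.bool)).float().to(z_i.device)

The author forgot to move the mask tensor to z_i.device. torch.eye creates its result on the CPU by default, so multiplying the mask by the CUDA tensor similarity raises the error. On a CPU-only machine every tensor is already on the CPU, which is why the error does not appear there.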
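For illustration, here is a minimal, self-contained sketch of an NT-Xent loss with a device-safe mask. It is not the repo's code: the lines that build similarity and positives are not visible in the traceback, so the cosine-similarity version below is an assumption, and the function name ntxent_loss_sketch is hypothetical. It passes device=z_i.device directly to torch.eye, which has the same effect as the .to(z_i.device) call above but skips the intermediate CPU tensor.

```python
import torch
import torch.nn.functional as F


def ntxent_loss_sketch(z_i: torch.Tensor, z_j: torch.Tensor,
                       temperature: float = 1.0) -> torch.Tensor:
    """NT-Xent loss whose mask is created on the same device as the inputs."""
    batch_size = z_i.size(0)

    # ASSUMPTION: pairwise cosine similarity over the concatenated embeddings;
    # this part of scarf/loss.py is not shown in the traceback.
    z = torch.cat([z_i, z_j], dim=0)
    similarity = F.cosine_similarity(z.unsqueeze(1), z.unsqueeze(0), dim=2)

    # ASSUMPTION: positive pairs sit on the diagonals offset by +/- batch_size.
    sim_ij = torch.diag(similarity, batch_size)
    sim_ji = torch.diag(similarity, -batch_size)
    positives = torch.cat([sim_ij, sim_ji], dim=0)

    # The fix: build the mask on z_i's device instead of the default CPU.
    mask = (~torch.eye(batch_size * 2, dtype=torch.bool, device=z_i.device)).float()

    # These lines follow the ones visible in the traceback.
    numerator = torch.exp(positives / temperature)
    denominator = mask * torch.exp(similarity / temperature)
    all_losses = -torch.log(numerator / torch.sum(denominator, dim=1))
    return torch.sum(all_losses) / (2 * batch_size)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(8, 16, device=device)
    b = torch.randn(8, 16, device=device)
    print(ntxent_loss_sketch(a, b).device)
```

Because the mask takes its device from the input, the same function runs unchanged on CPU-only and CUDA machines.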

Answered 2023-06-23
