When I run this code on a server with a GPU, I keep getting the following error:
RuntimeError: CUDA out of memory. Tried to allocate 10.99 GiB (GPU 0; 10.76 GiB total capacity; 707.86 MiB already allocated; 2.61 GiB free; 726.00 MiB reserved in total by PyTorch)
I added a garbage-collection call and tried reducing the batch size drastically (from 10000 to 10). The error has now changed to:
(main.py:2595652): Gdk-CRITICAL**: 11:16:04.013: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
2022-06-07 11:16:05.909522: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "main.py", line 194, in <module>
    psm = psm.cuda()
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 637, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 637, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
This is part of the PSM class. I am including it because the error points to the line psm = psm.cuda():
    import torch
    import torch.nn as nn

    # Encoder and Decoder are defined elsewhere in the project (not shown).
    class PSM(nn.Module):
        def __init__(self, n_classes, k, fr, num_feat_map=64, p=0.3, shar_channels=3):
            super(PSM, self).__init__()
            self.shar_channels = shar_channels
            self.num_feat_map = num_feat_map
            self.encoder = Encoder(k, fr, num_feat_map, p, shar_channels)
            self.decoder = Decoder(n_classes, p)

        # Redundant: nn.Module.__call__ already dispatches to forward().
        def __call__(self, x):
            return self.forward(x)

        def forward(self, x):
            encodes = []
            outputs = []
            # Encode and decode each input in x separately.
            for device in x:
                encode = self.encoder(device)
                outputs.append(self.decoder(encode.cuda()))
                encodes.append(encode)
            # Add shared channel
            shared_encode = torch.mean(torch.stack(encodes), 2).permute(1, 0, 2).cuda()
            outputs.append(self.decoder(shared_encode))
            # Average the per-input outputs and the shared-channel output.
            return torch.mean(torch.stack(outputs), 0)
1 Answer
This worked for me:
1. I ran nvidia-smi in a terminal to see which GPU was less busy.
2. I then added torch.cuda.set_device(1) to my code, which worked because device 1 was less busy.
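The first error message in the question reports only 2.61 GiB free on GPU 0 while PyTorch itself had reserved just 726.00 MiB, which suggests other processes were holding most of that card. The manual check above can be automated. The following is only a minimal sketch, not a definitive implementation: the helper names (parse_free_memory, least_busy_gpu, query_free_memory, use_least_busy_gpu) are hypothetical, and it assumes nvidia-smi is on the PATH and that its GPU indices match the indices PyTorch sees.

```python
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, memory.free' CSV rows into {gpu_index: free_MiB}."""
    free = {}
    for row in csv_text.strip().splitlines():
        index, mem = row.split(",")
        free[int(index)] = int(mem)  # int() tolerates surrounding spaces
    return free


def least_busy_gpu(free):
    """Return the index of the GPU with the most free memory."""
    return max(free, key=free.get)


def query_free_memory():
    """Ask nvidia-smi for the free memory of every GPU, in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.free",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_free_memory(result.stdout)


def use_least_busy_gpu():
    """Make the GPU with the most free memory the current CUDA device."""
    import torch  # imported lazily so the helpers above work without PyTorch

    index = least_busy_gpu(query_free_memory())
    torch.cuda.set_device(index)  # later .cuda() calls target this device
    return index
```

Calling use_least_busy_gpu() once at the top of main.py, before psm = psm.cuda(), would then replace the hard-coded torch.cuda.set_device(1). One caveat: nvidia-smi lists GPUs in PCI bus order, whereas CUDA's default ordering is fastest-first, so the indices can differ unless the environment variable CUDA_DEVICE_ORDER=PCI_BUS_ID is set before CUDA is initialized.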