PyTorch: problem using a TorchScript model: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

uelo1irk · posted 11 months ago in Other

I trained an ALBEF-based model in Python and, for overall efficiency, decided to run inference in C++. I saved the model in Python with torch.jit.trace and load the resulting .pt file in C++. However, I hit the error in the title during model inference.
First, the C++ code:

if (torch::cuda::is_available()) {
    n_model = torch::jit::load("/home/lzh/Storage4/lzh/deepmodel/model_scripted.pt", torch::kCUDA);
    std::cout << torch::cuda::device_count() << std::endl;
} else {
    std::cerr << "No CUDA devices available, cannot move model to GPU." << std::endl;
}

// Move the inputs to the GPU. Note: Tensor::to() returns a new tensor,
// so the result has to be assigned back.
torch::Tensor inputs = torch::from_blob(fre, {1, 4, 300, 201}, torch::kFloat).to(torch::kCUDA);
std::cout << inputs.device() << std::endl;
textInput.input_ids = textInput.input_ids.to(torch::kCUDA);
textInput.attention_mask = textInput.attention_mask.to(torch::kCUDA);
torch::Tensor out_tensor = n_model.forward({inputs, textInput.input_ids, textInput.attention_mask}).toTensor();

The error is:

The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/model_somatic.py", line 14, in forward
    cls_head = self.cls_head
    ALBEF = self.ALBEF
    _0 = (ALBEF).forward(image, input_ids, attention_mask, )
          ~~~~~~~~~~~~~~ <--- HERE
    return (cls_head).forward(_0, )
class ALBEF(Module):
  File "code/__torch__/models/model_somatic.py", line 35, in forward
    _5 = torch.ones([_3, int(_4)], dtype=4, layout=None, device=torch.device("cpu"), pin_memory=False)
    encoder_attention_mask = torch.to(_5, dtype=4, layout=0, device=torch.device("cpu"))
    _6 = (text_encoder).forward(input_ids, attention_mask, _1, encoder_attention_mask, )
          ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _7 = torch.slice(_6, 0, 0, 9223372036854775807)
    input = torch.slice(torch.select(_7, 1, 0), 1, 0, 9223372036854775807)
  File "code/__torch__/models/xbert.py", line 19, in forward
    cls = self.cls
    bert0 = self.bert
    _0 = (bert0).forward(input_ids, attention_mask, argument_3, encoder_attention_mask, )
          ~~~~~~~~~~~~~~ <--- HERE
    _1 = (cls).forward(weight, _0, )
    return _0
  File "code/__torch__/models/xbert.py", line 50, in forward
    _8 = torch.to(encoder_extended_attention_mask, 6)
    attention_mask1 = torch.mul(torch.rsub(_8, 1.), CONSTANTS.c3)
    _9 = (embeddings).forward(input_ids, input, )
          ~~~~~~~~~~~~~~~~~~~ <--- HERE
    _10 = (encoder).forward(_9, attention_mask0, argument_3, attention_mask1, )
    return _10
  File "code/__torch__/models/xbert.py", line 78, in forward
    input0 = torch.slice(_12, 1, 0, _11)
    _13 = (word_embeddings).forward(input_ids, )
    _14 = (token_type_embeddings).forward(input, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    embeddings = torch.add(_13, _14)
    _15 = (position_embeddings).forward(input0, )
  File "code/__torch__/torch/nn/modules/sparse/___torch_mangle_164.py", line 10, in forward
    input: Tensor) -> Tensor:
    weight = self.weight
    return torch.embedding(weight, input)
           ~~~~~~~~~~~~~~~ <--- HERE

Traceback of TorchScript, original code (most recent call last):
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/functional.py(2044): embedding
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/sparse.py(158): forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/lzh/ALBEF/models/xbert.py(207): forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/lzh/ALBEF/models/xbert.py(1046): forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/lzh/ALBEF/models/xbert.py(1400): forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/lzh/ALBEF/models/model_somatic.py(47): forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/lzh/ALBEF/models/model_somatic.py(90): forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/home/lzh/miniconda3/envs/albef/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
/home/lzh/ALBEF/checkpoint.py(46): main
/home/lzh/ALBEF/checkpoint.py(76): <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)


Strangely, I also run into the same error when I load the saved file back in Python:

# Trace and save the model (the model and the example inputs were on the CPU here)
image = torch.rand(16, 4, 300, 201)
text1 = torch.rand(16, 25).long()
text2 = torch.rand(16, 25).long()

traced_script_module = torch.jit.trace(model, (image, text1, text2))
traced_script_module.save('model_scripted.pt')

# Load the traced model onto the GPU and run it with CUDA inputs
device = torch.device("cuda:0")
text = torch.ones((1, 25))
text = text.long().to(device)
image = torch.ones((1, 4, 300, 201)).to(device)
model = torch.jit.load('model_scripted.pt', map_location=torch.device('cuda'))
model.eval()
for param in model.parameters():
    if param.device.type == 'cuda':
        print('cuda')
print(image.device)
print(text.device)
out = model(image, text, text)


The parameters print cuda and the inputs print cuda:0, yet the error is the same as in C++. I already load the model onto the GPU using the method mentioned in the link, but it still doesn't work. What should I do? This problem has been bothering me for a long time.
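
For reference, map_location moves the parameters and buffers of a TorchScript module, but it does not rewrite device arguments that torch.jit.trace recorded in the graph at trace time, which is consistent with the device=torch.device("cpu") calls still visible in the serialized code of the traceback above. A minimal sketch of how to check this on the loaded module, assuming the same model_scripted.pt as in the snippet (exact output varies by PyTorch version):

import torch

m = torch.jit.load('model_scripted.pt', map_location='cuda')

# Parameters do follow map_location, which is why they print cuda:0 ...
print(next(m.parameters()).device)

# ... but the recorded TorchScript source keeps whatever device was seen at
# trace time. Look for device=torch.device("cpu") in the printed code.
print(m.code)
# Submodules can be inspected the same way, e.g. print(m.ALBEF.code)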


mepcadol #1

I solved this by first making sure the model code does not specify a device when creating tensors, and then by putting the model and the example inputs on CUDA before tracing and saving it:

device = torch.device("cuda")

# Move the model and the example inputs to the GPU *before* tracing,
# so the traced graph records CUDA rather than CPU devices.
model.to(device)
image = torch.rand(1, 4, 300, 201).to(device)
text1 = torch.rand(1, 25).long().to(device)
text2 = torch.rand(1, 25).long().to(device)
traced_script_module = torch.jit.trace(model, (image, text1, text2))
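
On the first point ("the model code should not specify a device when creating tensors"), a minimal sketch of what that looks like inside a forward(); the Head module and file name below are made up for illustration. Note that torch.jit.trace still records the concrete device of the example inputs, so even with device=x.device the trace has to be produced on the device it will run on, as in the snippet above (torch.jit.script, by contrast, keeps device=x.device dynamic):

import torch
import torch.nn as nn

class Head(nn.Module):
    """Toy stand-in for a submodule that creates tensors inside forward()."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        # Bad: pins the new tensor to the CPU no matter where x lives.
        # mask = torch.ones(x.size(0), 8, device=torch.device("cpu"))

        # Better: derive the device from the input. Under torch.jit.trace the
        # concrete device of the example input still gets baked in, so trace
        # on the device you intend to run on.
        mask = torch.ones(x.size(0), 8, device=x.device)
        return self.proj(x) * mask

device = torch.device("cuda")
head = Head().to(device).eval()
example = torch.rand(1, 8, device=device)
traced = torch.jit.trace(head, example)   # traced on CUDA, so CUDA is recorded
traced.save("head_traced.pt")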

