pytorch_lightning Default process group is not initialized

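I ran into this error while learning to train nanodet on my own data.

nanodet uses pytorch_lightning, which comes with multi-GPU distributed training by default.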

The following is reposted from: https://blog.csdn.net/m0_37568067/article/details/109785209

Cause: a non-distributed training run is using settings meant for distributed training.

There are two ways to fix it:

1. Add the following to tools/train.py:

import torch.distributed as dist

dist.init_process_group('gloo', init_method='file:///temp/somefile', rank=0, world_size=1)

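2. In the files under configs/base/models, the first line is norm_cfg = dict(type='SyncBN', requires_grad=True). 'SyncBN' relies on distributed training, so using it in single-GPU, non-distributed training raises the error above.

Changing it to type='BN' fixes the problem.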
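If the config is loaded as nested dicts, the swap from 'SyncBN' to 'BN' can also be applied programmatically rather than by editing each file. The following is only a sketch: the helper name replace_syncbn and the sample model dict are hypothetical and do not come from any library.

```python
from copy import deepcopy


def replace_syncbn(cfg):
    """Return a copy of cfg with every type='SyncBN' entry changed to 'BN'.

    Hypothetical helper: assumes the config is made of nested dicts and lists.
    """
    cfg = deepcopy(cfg)  # leave the caller's config untouched

    def _walk(node):
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Hypothetical config fragment, modelled on the norm_cfg line quoted above.
norm_cfg = dict(type="SyncBN", requires_grad=True)
model = dict(backbone=dict(norm_cfg=norm_cfg), head=dict(norm_cfg=norm_cfg))

single_gpu_model = replace_syncbn(model)
```

The original dict is copied first, so the multi-GPU config stays available for distributed runs.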
————————————————
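The first fix can be made a little safer by creating the process group only when none exists yet, so a real distributed launch is left alone. The following is a minimal sketch under stated assumptions: the function name ensure_single_process_group is hypothetical, and the dist parameter exists only so that a stand-in object can be passed in place of torch.distributed.

```python
import tempfile
from pathlib import Path


def ensure_single_process_group(dist=None, backend="gloo"):
    """Create a one-process group if none exists; return True if one was created.

    Hypothetical helper. `dist` defaults to torch.distributed.
    """
    if dist is None:
        import torch.distributed as dist  # imported lazily, only when needed

    # Nothing to do if distributed support is missing or a group already exists.
    if not dist.is_available() or dist.is_initialized():
        return False

    # Use a fresh file in a new temporary directory for the file:// rendezvous.
    init_file = Path(tempfile.mkdtemp()) / "pg_init"
    dist.init_process_group(
        backend,
        init_method=init_file.as_uri(),  # e.g. file:///tmp/xyz/pg_init
        rank=0,
        world_size=1,
    )
    return True
```

Calling ensure_single_process_group() once near the top of tools/train.py, before the trainer is built, would then take the place of the unconditional init_process_group call.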

The code where my error occurred:

The error was raised during validation, at the point where all the results are collected.

nanodet/trainer/task.py

all_results = gather_results(results)

My fix was to change it to:

all_results = results

This skips gathering the results through dist, which is fine when training in a single process.

On my machine:
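A less invasive variant of this change keeps the gather for real distributed runs and skips it only when no process group exists. The following is a sketch, not nanodet's own code: maybe_gather and _torch_dist_ready are hypothetical names, and gather_fn stands in for the gather_results function called in nanodet/trainer/task.py.

```python
def _torch_dist_ready():
    """Return True only if torch.distributed is usable and initialized."""
    try:
        import torch.distributed as dist
    except ImportError:
        return False
    return dist.is_available() and dist.is_initialized()


def maybe_gather(results, gather_fn, dist_ready=_torch_dist_ready):
    """Gather results across processes only when a process group exists.

    Hypothetical helper: `gather_fn` is whatever function does the
    cross-process gather; `dist_ready` can be overridden for testing.
    """
    if dist_ready():
        return gather_fn(results)
    # Single-process run: the local results are already the full results.
    return results
```

In task.py this would read all_results = maybe_gather(results, gather_results), so the same code works with one GPU or several.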

F:\XXX\detect\nanodet\nanodet-main

