Distributed package doesn't have NCCL built in
问题描述:
python在windows环境下dist.init_process_group(backend, rank, world_size)处报错‘RuntimeError: Distributed package doesn’t have NCCL built in’,具体信息如下:
File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 531, in init_process_group
timeout=timeout)
File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 625, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in
原因分析:
windows不支持NCCL backend
解决方案:
在dist.init_process_group语句之前添加backend=‘gloo’,也就是在windows中使用GLOO替代NCCL。
————————————————
版权声明:本文为CSDN博主「StarCap」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/StarCap/article/details/120070425
insightface训练代码:
try:
world_size =1# int(os.environ["WORLD_SIZE"])
rank =0# int(os.environ["RANK"])
# distributed.init_process_group("nccl")
distributed.init_process_group("gloo")
except KeyError:
world_size = 1
rank = 0
distributed.init_process_group(
backend="nccl",
init_method="tcp://127.0.0.1:12584",
rank=rank,
world_size=world_size,
)
版权说明 : 本文为转载文章, 版权归原作者所有 版权申明
原文链接 : https://blog.csdn.net/jacke121/article/details/124652511
内容来源于网络,如有侵权,请联系作者删除!