How do I correctly reset GPU memory in PyTorch?

hrysbysz, posted on 2023-04-06 in Other

Cross-posted from: https://discuss.pytorch.org/t/how-to-reset-gpu-memory-usage-when-figuring-out-max-batch-size/171348 .
I have a function that performs a binary search to find the maximum batch size a model can support on a given GPU. Here is a sample of the logs:

Batch size 1 succeeded. Increasing to 2...
Batch size 2 succeeded. Increasing to 4...
Batch size 4 succeeded. Increasing to 8...
Batch size 8 succeeded. Increasing to 16...
Batch size 16 succeeded. Increasing to 32...
Batch size 32 succeeded. Increasing to 64...
Batch size 64 succeeded. Increasing to 128...
Batch size 128 succeeded. Increasing to 256...
Batch size 256 succeeded. Increasing to 512...
Batch size 512 failed. Binary searching...
# We start with the bounds as (256 - 50) to 512 to detect the bug
# detailed later in this post
Batch size 359 failed. New bounds: [206, 359]
Batch size 282 failed. New bounds: [206, 282]
Batch size 244 failed. New bounds: [206, 244]
Batch size 225 failed. New bounds: [206, 225]
Batch size 215 failed. New bounds: [206, 215]
Batch size 210 failed. New bounds: [206, 210]
Batch size 208 failed. New bounds: [206, 208]
Batch size 207 failed. New bounds: [206, 207]

However, note something odd in these logs: batch size 256 succeeded at first, yet during the binary search smaller batch sizes subsequently fail.
I think this points to some kind of bug on my end where GPU memory is not being reclaimed properly.
Right now, before each call to the function that runs the forward/backward pass, I call torch.cuda.empty_cache(), but I suspect that is not enough.
What else should I do to reset the GPU memory state?
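For context, this is the kind of fuller per-attempt cleanup I imagine adding (a sketch only, not code I am running yet; the calls are standard torch.cuda/gc APIs, and reset_gpu_state is just a name I made up):

import gc
import torch

def reset_gpu_state(optimizer):
    # Drop gradient buffers entirely (rather than zeroing them) so their memory can be freed.
    optimizer.zero_grad(set_to_none=True)
    # Wait for all queued kernels to finish before freeing or measuring memory.
    torch.cuda.synchronize()
    # Collect Python references that may still be pinning CUDA tensors.
    gc.collect()
    # Return cached blocks to the driver so the next attempt starts from a clean allocator.
    torch.cuda.empty_cache()
    # Reset the peak-memory counters and report what is still allocated.
    torch.cuda.reset_peak_memory_stats()
    print(f"Still allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")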
Reference code:

import torch

def binary_search_batch_size(cfg: Settings):
    def _is_cuda_oom(e: RuntimeError):
        """Returns True if the error is a CUDA out-of-memory error."""
        return 'CUDA out of memory' in str(e)

    cfg = Settings.parse_obj(cfg)
    model = ModelWrapper(model=cfg.model.model(), out_dim=cfg.model.output_dim).cuda()
    optimizer = cfg.optimizer.create_optimizer(model)

    def run():
        with torch.cuda.amp.autocast(enabled=True):
            torch.cuda.empty_cache()
            run_batch(model, optimizer, shape)

    batch_size = 1
    max_batch_size = 1
    while True:
        try:
            batch_size = max_batch_size
            shape = (batch_size, 3, 224, 224)
            run()
            max_batch_size *= 2
            print(f"Batch size {batch_size} succeeded. Increasing to {max_batch_size}...")
        except RuntimeError as e:
            if not _is_cuda_oom(e):
                raise e
            print(f"Batch size {batch_size} failed. Binary searching...")
            # the extra 50 acts as a sanity check to make sure we haven't regressed somehow
            low = batch_size // 2 - 50
            high = batch_size
            while low + 1 < high:
                batch_size = (low + high) // 2
                shape = (batch_size, 3, 224, 224)
                try:
                    run()
                    low = batch_size
                    print(f"Batch size {batch_size} succeeded. New bounds: [{low}, {high}]")
                except RuntimeError as e:
                    if not _is_cuda_oom(e):
                        raise e
                    high = batch_size
                    print(f"Batch size {batch_size} failed. New bounds: [{low}, {high}]")
            max_batch_size = low
            break
    return max_batch_size

def run_batch(model, optimizer, shape):
    batch = dict(
        aug1=torch.randint(0, 256, shape, dtype=torch.uint8).cuda(),
        aug2=torch.randint(0, 256, shape, dtype=torch.uint8).cuda(),
    )
    
    image_logits1, image_logits2 = model(batch)
    loss_val = loss(image_logits1, image_logits2, 0)
    optimizer.zero_grad()
    loss_val.backward()
    optimizer.step()

mwecs4sa 1#

I'm not sure, but it looks like your code is starting a new tf session each time. If so, you should clear the data from each session before starting the next one. The code below does that.

from numba import cuda

def clear_GPU(gpu_index):
    # Select the device whose context we want to tear down...
    cuda.select_device(gpu_index)
    # ...and close it, releasing everything held by that context.
    cuda.close()

Install numba ("pip install numba")... the last time I tried, conda gave me issues, so use pip. This is really just a convenience; the numba folks have taken the trouble to implement some of the low-level CUDA methods properly and avoid side effects, so I imagine you could do the same yourself if you have the time.
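For example, roughly how it could be wired in (a sketch only; the helper is repeated here for completeness, and run_attempt is a made-up placeholder for whatever builds your model and runs one step):

import torch
from numba import cuda

def clear_GPU(gpu_index):
    cuda.select_device(gpu_index)
    cuda.close()

def run_attempt(batch_size):
    # Placeholder: stands in for building the model and running one forward/backward pass.
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    return float(x.sum())

run_attempt(256)
# Per the suggestion above: close the CUDA context on device 0 to release what is resident there.
# Note this is a heavy hammer; after the context is closed, PyTorch may not be able to keep using
# the device in the same process, so it is safest at the very end of a run (or with each attempt
# running in its own subprocess).
clear_GPU(0)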
