Paddle: using ParallelExecutor with a reader op and resetting the reader when a pass finishes causes insufficient GPU memory

zxlwwiss  posted on 2021-11-30  in Java

Code snippet:

data_file_handle = fluid.layers.open_files(
    filenames=file_node_list,
    shapes=[[-1] + dshape, (-1, 1)],
    lod_levels=[0, 0],
    dtypes=["float32", "int64"],
    thread_num=args.gpus,
    pass_num=1)
...
for pass_id in range(pass_count):
    while True:
        try:
            loss, = exe.run([avg_loss.name])
        except fluid.core.EOFException as eof:
            # The reader has no more data: the current pass is finished.
            break
    # Reset the reader once per pass so the next pass can read the files again.
    data_file_handle.reset()
    ....

Log output:
W0714 02:28:37.348963   153 system_allocator.cc:102] Cannot malloc 19595.8 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0.9
W0714 02:28:37.349356   153 malloc.cc:119] Cannot allocate 102760448 bytes in GPU 2, available 3108044800 bytes
W0714 02:28:37.349367   153 malloc.cc:121] total 24032378880
W0714 02:28:37.349370   153 malloc.cc:122] GpuMinChunkSize 256
W0714 02:28:37.386611   153 system_allocator.cc:102] Cannot malloc 19595.8 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0.9
W0714 02:28:37.386898   153 malloc.cc:119] Cannot allocate 102760448 bytes in GPU 6, available 3108044800 bytes
W0714 02:28:37.386906   153 malloc.cc:121] total 24032378880
W0714 02:28:37.386910   153 malloc.cc:122] GpuMinChunkSize 256
W0714 02:28:37.422869   153 system_allocator.cc:102] Cannot malloc 19595.8 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0.9
W0714 02:28:37.423153   153 malloc.cc:119] Cannot allocate 102760448 bytes in GPU 1, available 3108044800 bytes
W0714 02:28:37.423161   153 malloc.cc:121] total 24032378880
W0714 02:28:37.423164   153 malloc.cc:122] GpuMinChunkSize 256
W0714 02:28:37.456121   153 system_allocator.cc:102] Cannot malloc 19595.8 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0.9
W0714 02:28:37.456400   153 malloc.cc:119] Cannot allocate 102760448 bytes in GPU 0, available 3108044800 bytes
W0714 02:28:37.456409   153 malloc.cc:121] total 24032378880
W0714 02:28:37.456413   153 malloc.cc:122] GpuMinChunkSize 256
W0714 02:28:37.489141   153 system_allocator.cc:102] Cannot malloc 19595.8 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0.9
W0714 02:28:37.489362   153 malloc.cc:119] Cannot allocate 102760448 bytes in GPU 5, available 3108044800 bytes
W0714 02:28:37.489369   153 malloc.cc:121] total 24032378880
W0714 02:28:37.489372   153 malloc.cc:122] GpuMinChunkSize 256
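
To make the numbers in the log easier to compare, here is a minimal sketch that parses the first three warning lines and converts the byte counts to MiB. The regular expressions and the `summarize` helper are hypothetical and written only for this log format; they are not part of Paddle. The log does not say whether its "MB" means MiB, so that value is left unconverted.

```python
import re

MIB = 1024 * 1024

# First three warning lines, copied verbatim from the log above.
LOG_LINES = [
    "W0714 02:28:37.348963   153 system_allocator.cc:102] Cannot malloc 19595.8 MB GPU memory. "
    "Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. "
    "Current value is 0.9",
    "W0714 02:28:37.349356   153 malloc.cc:119] Cannot allocate 102760448 bytes in GPU 2, "
    "available 3108044800 bytes",
    "W0714 02:28:37.349367   153 malloc.cc:121] total 24032378880",
]


def summarize(lines):
    """Hypothetical helper: pull the sizes out of the warning lines."""
    out = {}
    for line in lines:
        m = re.search(r"Cannot malloc ([\d.]+) MB", line)
        if m:
            out["chunk_mb"] = float(m.group(1))  # unit as printed in the log
        m = re.search(r"Cannot allocate (\d+) bytes in GPU (\d+), available (\d+) bytes", line)
        if m:
            out["request_mib"] = int(m.group(1)) / MIB
            out["gpu"] = int(m.group(2))
            out["available_mib"] = int(m.group(3)) / MIB
        m = re.search(r"\] total (\d+)", line)
        if m:
            out["total_mib"] = int(m.group(1)) / MIB
    return out


summary = summarize(LOG_LINES)
for key, value in summary.items():
    print(key, value)
```

Read this way, the failing request on GPU 2 is 98 MiB while about 2964 MiB is reported as available, and the 19595.8 MB figure is far larger than both. One possible reading (an assumption, not confirmed in this thread) is that the allocator is trying to reserve another large chunk instead of serving the request from memory it already holds.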

aamkag61  #1

Is this code meant to call reset after every batch has run?
