Tensorflow：GPU设备之间的内存增长不能不同|如何将多GPU与tensorflow 结合使用

jaql4c8m 于 2022-12-27 发布在其他

关注(0)|答案(2)|浏览(236)

我正尝试在集群中的GPU节点上运行keras代码。GPU节点每个节点有4个GPU。我确保GPU节点中的所有4个GPU都可供我使用。我运行下面的代码让tensorflow使用GPU：

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

输出中列出了可用的4个GPU。但是，我在运行代码时得到了以下错误：

Traceback (most recent call last):
  File "/BayesOptimization.py", line 20, in <module>
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/framework/config.py", line 439, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1368, in list_logical_devices
    self.ensure_initialized()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 511, in ensure_initialized
    config_str = self.config.SerializeToString()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1015, in config
    gpu_options = self._compute_gpu_options()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1074, in _compute_gpu_options
    raise ValueError("Memory growth cannot differ between GPU devices")
ValueError: Memory growth cannot differ between GPU devices

代码不应该列出所有可用的gpu并为每个gpu设置内存增长吗？
我目前使用的是tensorflow库和python 3.97：

tensorflow                2.4.1           gpu_py39h8236f22_0
tensorflow-base           2.4.1           gpu_py39h29c2da4_0
tensorflow-estimator      2.4.1              pyheb71bc4_0
tensorflow-gpu            2.4.1                h30adc30_0

知道是什么问题以及如何解决吗？提前感谢！

tensorflow

来源：https://stackoverflow.com/questions/71319195/tensorflow-memory-growth-cannot-differ-between-gpu-devices-how-to-use-multi-g

2条答案

按热度按时间

pbwdgjma1#

只尝试：os.environ[“CUDA_VISIBLE_DEVICES”]=“0”，而不是tf.config.experimental.set_memory_growth。这对我来说很有效。

赞(0）回复(0）举报 2022-12-27

vddsk6oq2#

在多GPU设备的情况下，内存增长应在所有可用GPU中保持不变。请为所有GPU设置为true或保持为false。

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

Tensorflow GPU文档

赞(0）回复(0）举报 2022-12-27

我来回答

Tensorflow：GPU设备之间的内存增长不能不同|如何将多GPU与tensorflow 结合使用

2条答案

相关问题

热门标签

最新问答