TensorFlow: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation

Posted by w8rqjzmb 4 months ago in Other


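Issue type

Feature request

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

tf 2.16

Custom code

Yes

OS platform and distribution

Linux Ubuntu 22.04

Mobile device

No response

Python version

3.9.19

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.4/8.9.7.29

GPU model and memory

NVIDIA GeForce RTX 3090

Current behavior?

When TF determinism is enabled, a runtime exception is thrown at MaxPooling2D().

Standalone code to reproduce the issue

When TF determinism was enabled, a runtime exception was thrown at MaxPooling2D().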
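The report itself contains no code. As a rough illustration only, a minimal sketch that might exercise the same path could look like the following. The toy model, the random data, and the helper names `build_model` and `try_fit` are all hypothetical and are not taken from the reporter's `train_model.py`. The error is specific to XLA on GPU, so on a CPU-only machine both calls would be expected to finish without it.

```python
import numpy as np
import tensorflow as tf

# Assumption: determinism is requested globally, as described in the report.
tf.keras.utils.set_random_seed(0)
tf.config.experimental.enable_op_determinism()


def build_model():
    # Hypothetical tiny classifier; only the MaxPooling2D layer matters here.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(8, 8, 1)),
        tf.keras.layers.Conv2D(2, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(2),
    ])


def try_fit(jit_compile):
    """Train for one epoch; return the error message, or None on success."""
    model = build_model()
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        jit_compile=jit_compile,
    )
    x = np.random.rand(4, 8, 8, 1).astype("float32")
    y = np.array([0, 1, 0, 1])
    try:
        model.fit(x, y, batch_size=4, epochs=1, shuffle=False, verbose=0)
    except tf.errors.UnimplementedError as err:
        return err.message
    return None


if __name__ == "__main__":
    print("jit_compile=True :", try_fit(True))
    print("jit_compile=False:", try_fit(False))
```

Returning the message instead of re-raising makes it easy to compare the XLA and non-XLA runs side by side on the same machine.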

Relevant log output

Traceback (most recent call last):
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-3dda39ff370e>", line 1, in <module>
    runfile('/mnt/projects/Projects/Test_Classification/train_model.py', wdir='/mnt/projects/Projects/Test_Classification')
  File "/opt/pycharm-community-2024.1/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/opt/pycharm-community-2024.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/mnt/projects/Projects/Test_Classification/train_model.py", line 956, in <module>
    history = model.fit(x_train, y_train,
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/ws/miniconda3/envs/tf216/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node gradient_tape/functional_1_1/max_pooling2d_4_1/MaxPool2d/MaxPoolGrad defined at (most recent call last):
<stack traces unavailable>
GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
	 [[{{node gradient_tape/functional_1_1/max_pooling2d_4_1/MaxPool2d/MaxPoolGrad}}]]
	tf2xla conversion failed while converting __inference_one_step_on_data_13588[]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions.
	 [[StatefulPartitionedCall]] [Op:__inference_one_step_on_iterator_14045]


Source: https://github.com/tensorflow/tensorflow/issues/69417

8 answers

8wigbo561#

@wx0608,

Could you please share reproducible code or a Colab snippet that supports your statement, so that the issue is easier to understand? Thank you!

4 months ago

nukf8bse2#

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

5 months ago

njthzxwz3#

I am using DeepLabV3+ with a recent backbone network, and I ran into the same error when trying to get reproducible results. I also added this code:

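import os
import random

import numpy as np
import tensorflow as tf

SEED = 123
# Seeds Python's random module, NumPy, and TensorFlow in one call.
tf.keras.utils.set_random_seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
tf.random.set_seed(SEED)
np.random.seed(SEED)
os.environ['TF_DETERMINISTIC_OPS'] = '1'
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
# Ops without a deterministic implementation now raise UnimplementedError.
tf.config.experimental.enable_op_determinism()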

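I learned about TF_CUDNN_DETERMINISTIC from here. When set to 'true' or '1', it selects deterministic gradient algorithms for tf.nn.max_pool*d and tf.keras.layers.MaxPool*D, but I still get this error: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.

I previously hit the same problem with the upsampling layer, where we should use bilinear interpolation instead of nearest-neighbor interpolation, but I don't know what to change for the max pooling layer. Please leave any information that might help.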
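For reference, the upsampling change described above can be sketched as follows. This is only an illustration under assumptions: the helper name `upsample_block`, the input shape, and the use of `tf.keras.layers.UpSampling2D` are not taken from the commenter's DeepLabV3+ code.

```python
import tensorflow as tf


def upsample_block(x, size=2, interpolation="bilinear"):
    # Hypothetical helper: bilinear instead of the default "nearest",
    # which is the substitution the comment above describes.
    return tf.keras.layers.UpSampling2D(size=size, interpolation=interpolation)(x)


# Small fixed input so the forward and backward pass can be checked by hand.
x = tf.random.stateless_uniform((1, 4, 4, 3), seed=(1, 2))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = upsample_block(x)
    loss = tf.reduce_sum(y)
grad = tape.gradient(loss, x)

print(y.shape, grad.shape)
```

Taking the gradient matters here because the determinism errors in this thread are raised by gradient ops, not by the forward pass.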

5 months ago

w7t8yxp54#

wx0608, have you found anything?

5 months ago

bnl4lu3b5#

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

5 months ago

carvr3hs6#

Same here. I guess this is expected, since the enable_op_determinism() documentation states:

Certain ops will raise an `UnimplementedError` because they do not yet have a
  deterministic implementation. Additionally, due to bugs, some ops might be
  nondeterministic and not raise an `UnimplementedError`. If you encounter such
  ops, please [file an issue](https://github.com/tensorflow/tensorflow/issues).

Could you please provide a deterministic implementation of MaxPool2D?
The AveragePooling2D layer does not have that problem. My setting:

tf.config.experimental.enable_op_determinism()

which I use to make the code deterministic, causes:

tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
         [[{{node gradient_tape/test-Actor/enc_max_2/MaxPool/MaxPoolGrad}}]] [Op:__inference__train_10443]

with this TensorFlow code:

def encoder_block(x, filters, pool_size):
    x = conv_block(x, filters)
    p = KL.MaxPooling2D(pool_size, name=self._unique_name('enc_max'))(x)
    return x, p

tensorflow: 2.11.0
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__9_20:32:38_PDT_2021
Cuda compilation tools, release 11.3, V11.3.122
Build cuda_11.3.r11.3/compiler.30059648_0
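Since the comment above notes that AveragePooling2D does not raise the error, one possible stopgap is to make the pooling type switchable. The sketch below is an assumption-laden illustration: `make_pool` and its `deterministic_xla` flag are hypothetical names, and swapping max pooling for average pooling changes what the network computes, so accuracy may differ.

```python
import tensorflow as tf


def make_pool(pool_size, deterministic_xla=False, name=None):
    """Return a 2D pooling layer.

    deterministic_xla is a hypothetical flag: when True, average pooling is
    used in place of max pooling, following the observation in this thread.
    """
    if deterministic_xla:
        return tf.keras.layers.AveragePooling2D(pool_size, name=name)
    return tf.keras.layers.MaxPooling2D(pool_size, name=name)


# Usage: both variants halve the spatial dimensions of a feature map.
features = tf.ones((1, 8, 8, 4))
pooled_max = make_pool(2)(features)
pooled_avg = make_pool(2, deterministic_xla=True)(features)
print(pooled_max.shape, pooled_avg.shape)
```

Keeping the choice behind one function means the original MaxPooling2D can be restored in a single place once a deterministic XLA implementation exists.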

5 months ago

wyyhbhjk7#

It looks like the MaxPooling2D error goes away when XLA is disabled:

@tf.function(jit_compile=False)
def _train():
    ...  # training step body not shown in the original comment
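A fuller sketch of this workaround, under assumptions, is shown below. The toy model, the data, and the names `make_model`, `run_once`, and `compile_without_xla` are hypothetical. For code that trains through `model.fit`, as in the original traceback, the corresponding switch would presumably be the `jit_compile` argument of `Model.compile`, which Keras 3 sets to "auto" by default.

```python
import numpy as np
import tensorflow as tf

tf.config.experimental.enable_op_determinism()


def make_model():
    # Hypothetical toy network containing the layer that triggers the error.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(8, 8, 1)),
        tf.keras.layers.Conv2D(2, 3, padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1),
    ])


def compile_without_xla(model):
    # Keras path: ask compile() not to use XLA for the train step.
    model.compile(optimizer="sgd", loss="mse", jit_compile=False)
    return model


def run_once(seed=123):
    """Compute one loss and its gradients with XLA disabled for the step."""
    tf.keras.utils.set_random_seed(seed)
    model = make_model()
    x = tf.constant(np.random.rand(4, 8, 8, 1), dtype=tf.float32)
    y = tf.constant(np.random.rand(4, 1), dtype=tf.float32)

    @tf.function(jit_compile=False)  # keep this step out of XLA
    def _train():
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x, training=True) - y))
        grads = tape.gradient(loss, model.trainable_variables)
        return loss, grads

    loss, grads = _train()
    return float(loss), [g.numpy() for g in grads]
```

Calling `run_once()` twice with the same seed and comparing the returned values is a quick way to check that the step is reproducible once XLA is out of the picture.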

5 months ago

rsl1atfo8#

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

5 months ago
