Crash when running tensorflow.python.ops.gen_nn_ops.max_pool_grad_with_argmax

Posted by qkf9rpyu 6 months ago in Python


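Issue type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

TensorFlow version

2.11.0

Custom code

Yes

OS platform and distribution

No response

Mobile device

22.04

Python version

3.9

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

Cuda compilation tools, release 11.5, V11.5.119

GPU model and memory

No response

Current behavior?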

When max_pool_grad_with_argmax is given an integer tensor containing negative values, it crashes.
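For context, here is a minimal sketch of the range a valid argmax entry would have to fall in. It assumes an NHWC input and the flattening documented for tf.nn.max_pool_with_argmax: (y * width + x) * channels + c, with the batch term included only when include_batch_in_index is True. The helpers argmax_bounds and out_of_range are hypothetical names in pure Python, not part of TensorFlow.

```python
def argmax_bounds(input_shape, include_batch_in_index=False):
    """Return (low, high) so that valid flattened indices satisfy low <= i < high.

    Assumption: NHWC layout and the flattening documented for
    tf.nn.max_pool_with_argmax. With the batch included this is only a
    coarse global bound over the whole tensor.
    """
    batch, height, width, channels = input_shape
    per_image = height * width * channels
    high = batch * per_image if include_batch_in_index else per_image
    return 0, high


def out_of_range(argmax_values, input_shape, include_batch_in_index=False):
    """List the entries of a flat iterable of indices that fall outside the bound."""
    low, high = argmax_bounds(input_shape, include_batch_in_index)
    return [i for i in argmax_values if not (low <= i < high)]


# Shapes from this report: input [2, 3, 3, 1], include_batch_in_index=False.
print(argmax_bounds([2, 3, 3, 1]))                   # (0, 9)
print(out_of_range([0, 8, -1, 120], [2, 3, 3, 1]))   # [-1, 120]
```

Under that assumption, the repro below draws argmax from [-256, 257), so most of its entries would fall outside [0, 9).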

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np
from tensorflow.python.ops import gen_nn_ops
try:
  try:
    with tf.device('/CPU'):
      arg_0_tensor = tf.constant(-105687333925307, shape=[2, 3, 3, 1], dtype=tf.float32,)
      arg_0 = tf.identity(arg_0_tensor)
      arg_1_tensor = tf.random.uniform([2, 2, 2, 1], dtype=tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_2_tensor = tf.random.uniform([2, 2, 2, 1], minval=-256, maxval=257, dtype=tf.int64)
      arg_2 = tf.identity(arg_2_tensor)
      ksize_0 = 1
      ksize_1 = 2
      ksize_2 = 2
      ksize_3 = 1
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides_0 = 1
      strides_1 = 1
      strides_2 = 1
      strides_3 = 1
      strides = [strides_0,strides_1,strides_2,strides_3,]
      padding = "VALID"
      include_batch_in_index = False
      out = gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
  try:
    with tf.device('/GPU:0'):
      arg_0 = tf.identity(arg_0_tensor)
      arg_0 = tf.cast(arg_0, tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_1 = tf.cast(arg_1, tf.float32)
      arg_2 = tf.identity(arg_2_tensor)
      arg_2 = tf.cast(arg_2, tf.int64)
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides = [strides_0,strides_1,strides_2,strides_3,]
      gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
except Exception as e:
  print("Error:"+str(e))

6 answers


rryofs0p1#

Hi @nimashiri!
Thanks for reporting this bug in gen_nn_ops.max_pool_grad_with_argmax.
@SuryanarayanaY!
Could you take a look at this issue? A gist covering 2.10, 2.11, and nightly is attached for reference.
Thank you!

6 months ago

cl25kdpy2#

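Hi @nimashiri,

I observed this behavior on a CPU runtime. In the runtime logs, the first entry is an error: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, followed by a fatal check failure: F tensorflow/core/kernels/maxpooling_op.cc:1076] Check failed: grad_out_index >= output_start && grad_out_index < output_end Invalid output gradient index: 120, 0, 18.

Judging by the error, it tries to detect a GPU, so I ran the same code on a GPU runtime and observed no crash there in either 2.11 or nightly.

We need to check why this op works only on GPU. Could you share your thoughts?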
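The check expression in the log line above compares an index against a [start, end) window. A rough pure-Python sketch of that kind of gradient routing follows. It is an illustration under assumptions (NHWC layout, include_batch_in_index=False, a per-image offset added before the check, and the three logged numbers read as index, start, end), not the TensorFlow kernel, and route_grad is a hypothetical name. It raises ValueError where the real op aborts.

```python
def route_grad(grad, argmax, input_shape):
    """Scatter-add pooled gradients back to flattened input positions.

    grad and argmax are lists with one flat list per image; argmax holds
    per-image flattened indices (batch not included). Returns a flat list
    with batch * height * width * channels entries.
    """
    batch, height, width, channels = input_shape
    per_image = height * width * channels
    start, end = 0, batch * per_image
    out = [0.0] * end
    for b in range(batch):
        for g, idx in zip(grad[b], argmax[b]):
            flat = idx + b * per_image  # shift the index into image b's slice
            if not (start <= flat < end):
                # Same comparison as the check expression quoted in the log.
                raise ValueError(
                    f"Invalid output gradient index: {flat}, {start}, {end}"
                )
            out[flat] += g
    return out


# Valid indices: the gradient 3.0 for image 1 lands at 8 + 9 = 17.
routed = route_grad([[1.0, 2.0], [3.0]], [[0, 4], [8]], [2, 3, 3, 1])
print(routed[17])  # 3.0
```

With the report's input shape [2, 3, 3, 1], an index of 120 in the first image would fail this comparison against the window [0, 18), as would any negative index that remains below zero after the offset.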

6 months ago

dwbf0jvd3#

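Hi @nimashiri,

I observed the above behavior on a CPU runtime. In the runtime logs, I first saw an error: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, followed by a fatal check failure: F tensorflow/core/kernels/maxpooling_op.cc:1076] Check failed: grad_out_index >= output_start && grad_out_index < output_end Invalid output gradient index: 120, 0, 18. Judging by the error, it tries to detect a GPU, so I ran the same code on a GPU runtime and observed no crash there in either 2.11 or nightly.

We need to check why this op works only on GPU. Could you share your thoughts?

Unfortunately, I don't have any ideas on this.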

6 months ago

5ssjco0h4#

This:

import tensorflow as tf
import numpy as np
from tensorflow.python.ops import gen_nn_ops
try:
  try:
    with tf.device('/CPU'):
      arg_0_tensor = tf.random.uniform([2, 3, 3, 1], dtype=tf.float32)
      arg_0 = tf.identity(arg_0_tensor)
      arg_1_tensor = tf.random.uniform([2, 2, 2, 1], dtype=tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_2_tensor = tf.random.uniform([2, 2, 2, 1], minval=-256, maxval=257, dtype=tf.int64)
      arg_2 = tf.identity(arg_2_tensor)
      ksize_0 = 1
      ksize_1 = 2
      ksize_2 = 2
      ksize_3 = 1
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides_0 = 1
      strides_1 = 1
      strides_2 = 1
      strides_3 = 1
      strides = [strides_0,strides_1,strides_2,strides_3,]
      padding = "VALID"
      include_batch_in_index = False
      out = gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
  try:
    with tf.device('/GPU:0'):
      arg_0 = tf.identity(arg_0_tensor)
      arg_0 = tf.cast(arg_0, tf.float32)
      arg_1 = tf.identity(arg_1_tensor)
      arg_1 = tf.cast(arg_1, tf.float32)
      arg_2 = tf.identity(arg_2_tensor)
      arg_2 = tf.cast(arg_2, tf.int64)
      ksize = [ksize_0,ksize_1,ksize_2,ksize_3,]
      strides = [strides_0,strides_1,strides_2,strides_3,]
      gen_nn_ops.max_pool_grad_with_argmax(arg_0,arg_1,arg_2,ksize=ksize,strides=strides,padding=padding,include_batch_in_index=include_batch_in_index,)
  except Exception as e:
    print("Error:"+str(e))
except Exception as e:
  print("Error:"+str(e))

6 months ago

ki1q1bka5#

@learning-to-play

6 months ago

tktrz96b6#

This is still an issue in tf-nightly (2.15.0-dev20231003). A screenshot is attached for reference.
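A side note for anyone re-testing on newer builds: the F-prefixed log line quoted earlier in the thread is a fatal check failure, which aborts the process rather than raising a Python exception, so the try/except blocks in the repro would not be expected to catch it. One way to keep the calling interpreter alive is to run the repro in a child process. The sketch below uses only the standard library; run_isolated and describe are hypothetical helper names, and os.abort() stands in for the actual repro script.

```python
import subprocess
import sys


def run_isolated(code, timeout=300):
    """Run a Python snippet in a child interpreter and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(proc):
    """Summarize how the child process exited."""
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # POSIX only: a negative return code means the child died from that signal.
        return f"killed by signal {-proc.returncode}"
    return f"exited with code {proc.returncode}"


# Stand-in for the repro: os.abort() terminates the child abnormally.
result = run_isolated("import os; os.abort()")
print(describe(result))
```

To try it on the real case, the repro source would be passed as the code string (or the argument list swapped for a script path), and the child's stderr inspected for the check-failure message.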