TensorFlow oddity: GpuLaunchKernel error at prediction time, but training runs fine

2guxujil · asked 2022-12-27

I am training a 3D U-Net to do multi-class (4 classes) semantic segmentation. Training with model.fit() runs just fine with no errors, and I can see that the model is learning. However, when I try to run model.predict() I get the following error:

85/85 - 56s
2022-12-22 18:26:24.265485: F tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc:165] Non-OK-status: GpuLaunchKernel( concat_variable_kernel<T, IntType, true>, config.block_count, config.thread_per_block, smem_usage, gpu_device.stream(), input_ptrs, output_scan, static_cast<IntType>(output->dimension(0)), static_cast<IntType>(output->dimension(1)), output->data()) status: Internal: invalid configuration argument
/cm/local/apps/slurm/var/spool/job5510720/slurm_script: line 14:  1945 Aborted

Here is a simplified version of my code:

import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, Conv3DTranspose, Concatenate

def unet(input_shape,filters,kernel,model_name):

    strides_1 = (1,1,1)
    strides_2 = (2,2,2)
    ins = Input(shape=input_shape,name='input_1')

    encode1a = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='encode1a', strides=strides_1)(ins)
    encode1b = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='encode1b', strides=strides_1)(encode1a)
    pool1 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool1')(encode1b)

    encode2a = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='encode2a', strides=strides_1)(pool1)
    encode2b = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='encode2b', strides=strides_1)(encode2a)
    pool2 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool2')(encode2b)

    encode3a = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='encode3a', strides=strides_1)(pool2)
    encode3b = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='encode3b', strides=strides_1)(encode3a)
    pool3 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool3')(encode3b)

    # Bottleneck
    #--------------------------
    bottom_a = Conv3D(filters=8*filters, kernel_size=kernel, activation='relu', padding='same')(pool3)
    bottom_b = Conv3D(filters=8*filters, kernel_size=kernel, activation='relu', padding='same')(bottom_a)

    # Decoding 
    #--------------------------
    up2   = Concatenate(axis=4)([Conv3DTranspose(filters=4*filters, kernel_size=(2,2,2), strides=strides_2, padding='same')(bottom_b), encode3b])
    decode2a = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same',name='decode1a')(up2)
    decode2b = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same',name='decode1b')(decode2a)

    up3   = Concatenate(axis=4)([Conv3DTranspose(filters=2*filters, kernel_size=(2,2,2), strides=strides_2, padding='same')(decode2b), encode2b])
    decode1a = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same',name='decode2a')(up3)
    decode1b = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same',name='decode2b')(decode1a)

    up4   = Concatenate(axis=4)([Conv3DTranspose(filters=filters, kernel_size=(2,2,2), strides=strides_2, padding='same')(decode1b), encode1b])
    decode0a = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same',name='decode3a')(up4)
    decode0b = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same',name='decode3b')(decode0a)

    # Output: 1x1x1 convolution giving a per-voxel softmax over the 4 classes
    flatten = Conv3D(filters=4, kernel_size=(1,1,1), activation='softmax')(decode0b)
    model = Model(inputs=ins, outputs=flatten, name=model_name)
    return model

FILTERS = 32
KERNEL = (3,3,3)
MODEL_NAME = 'multi-unet-test'
LR = 3e-3

strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
with strategy.scope():
    model = unet((None,None,None,1), FILTERS, KERNEL, model_name=MODEL_NAME)
    model.compile(optimizer=Adam(learning_rate=LR), loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=['accuracy'])
model.summary()

X_train, Y_train = load_dataset_all(FILE_DEN,FILE_MSK,SUBGRID)
# this is a function for loading input and mask fields
# outputs shapes of [256,128,128,128,4]

history = model.fit(X_train, Y_train, batch_size = 4, epochs = 50, verbose = 2, shuffle = True, validation_split = 0.2)
model.save(MODEL_NAME)

# Load and predict 
# this is actually in another script but I'm putting this all in one go:
model = load_model(MODEL_NAME)
model.compile(loss=model.loss,optimizer=model.optimizer,metrics=['accuracy'])
# load test data:
X_test = load_dataset()
Y_test = model.predict(X_test, batch_size = 4, verbose = 2)

After googling around and looking at other Stack Overflow questions, people seem to suggest two fixes: adjust the batch size so that the number of samples is divisible by it, and switch to a different TF/CUDA version. Originally my X_test had shape [343,128,128,128,4], but I chopped off 3 samples to make it [340,128,128,128,4], so that it is divisible by my batch size of 4.
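For reference, that trimming step amounts to something like this (a minimal sketch, assuming X_test is a NumPy array):

import numpy as np

BATCH_SIZE = 4
# Keep only a multiple of the batch size: 343 -> 340 samples
n_keep = (X_test.shape[0] // BATCH_SIZE) * BATCH_SIZE
X_test = X_test[:n_keep]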
My first test used tf 2.4.1 and CUDA 11.6; I then tried the same code on Colab with tf 2.9.2 and CUDA 11.2 and got the same error, so I doubt the version combination is the problem.
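A quick way to double-check which CUDA/cuDNN versions a TensorFlow binary was built against (assuming a reasonably recent TF release) is tf.sysconfig.get_build_info():

import tensorflow as tf

# Prints the TF version plus the CUDA/cuDNN versions the binary was built with
info = tf.sysconfig.get_build_info()
print(tf.__version__, info.get('cuda_version'), info.get('cudnn_version'))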
Any advice or help would be greatly appreciated, and I am happy to provide any other information.
Thank you!


zvms9eto · Answer #1

I ran into the same problem and have since resolved it. After I made a few changes, the error message turned into "Split on GPU requires input size < max int32", so I am not entirely sure what the actual cause was. I can only give you the list of changes I made; maybe one of them will help:

  • Changed the dtype of the inputs and labels to bool (I had unintentionally been using float before; see the sketch after this list)
  • I use a batch size of 1 anyway
  • I reduced the number of filters in the conv layers
  • I am using tf 2.6 / cudnn 8.2.1 / cudatoolkit 11.3.1
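A minimal sketch of that dtype change, assuming the inputs and binary masks are NumPy arrays:

import numpy as np

# Cast arrays that were accidentally float32 down to bool; for binary masks
# this is lossless and shrinks them by a factor of four in memory.
X_train = X_train.astype(np.bool_)
Y_train = Y_train.astype(np.bool_)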

In general I could not make sense of the error message ("invalid configuration argument"), but I suspect it might be a memory problem? My model is even smaller than yours, but our arrays are huge (my inputs are 128x128x128 and my labels are 512x512x512).
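If memory or per-launch size really is the culprit, one possible workaround (a sketch, not tested on your setup) is to run prediction in small chunks rather than one big call:

import numpy as np

# Predict in small chunks so each GPU launch, and the final concat inside
# predict(), stays well below the problematic size.
chunks = [model.predict(X_test[i:i + 4], verbose=0)
          for i in range(0, X_test.shape[0], 4)]
Y_test = np.concatenate(chunks, axis=0)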
Hope this helps.
