I am training a 3D U-Net to do multi-class (4 classes) semantic segmentation. Training with model.fit() runs just fine with no errors and I can see that the model is learning. However, when I try to run model.predict() I get the following error:
85/85 - 56s
2022-12-22 18:26:24.265485: F tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc:165] Non-OK-status: GpuLaunchKernel( concat_variable_kernel<T, IntType, true>, config.block_count, config.thread_per_block, smem_usage, gpu_device.stream(), input_ptrs, output_scan, static_cast<IntType>(output->dimension(0)), static_cast<IntType>(output->dimension(1)), output->data()) status: Internal: invalid configuration argument
/cm/local/apps/slurm/var/spool/job5510720/slurm_script: line 14: 1945 Aborted
Below is a simplified version of my code:
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, Conv3DTranspose, UpSampling3D, Concatenate
def unet(input_shape, filters, kernel, model_name):
    strides_1 = (1, 1, 1)
    strides_2 = (2, 2, 2)
    ins = Input(shape=input_shape, name='input_1')
    # Encoding
    #--------------------------
    encode1a = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='encode1a', strides=strides_1)(ins)
    encode1b = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='encode1b', strides=strides_1)(encode1a)
    pool1 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool1')(encode1b)
    encode2a = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='encode2a', strides=strides_1)(pool1)
    encode2b = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='encode2b', strides=strides_1)(encode2a)
    pool2 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool2')(encode2b)
    encode3a = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='encode3a', strides=strides_1)(pool2)
    encode3b = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='encode3b', strides=strides_1)(encode3a)
    pool3 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool3')(encode3b)
    # Bottleneck
    #--------------------------
    bottom_a = Conv3D(filters=8*filters, kernel_size=kernel, activation='relu', padding='same')(pool3)
    bottom_b = Conv3D(filters=8*filters, kernel_size=kernel, activation='relu', padding='same')(bottom_a)
    # Decoding
    #--------------------------
    up2 = Concatenate(axis=4)([Conv3DTranspose(filters=4*filters, kernel_size=(2, 2, 2), strides=strides_2, padding='same')(bottom_b), encode3b])
    decode2a = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='decode1a')(up2)
    decode2b = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='decode1b')(decode2a)
    up3 = Concatenate(axis=4)([Conv3DTranspose(filters=2*filters, kernel_size=(2, 2, 2), strides=strides_2, padding='same')(decode2b), encode2b])
    decode1a = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='decode2a')(up3)
    decode1b = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='decode2b')(decode1a)
    up4 = Concatenate(axis=4)([Conv3DTranspose(filters=filters, kernel_size=(2, 2, 2), strides=strides_2, padding='same')(decode1b), encode1b])
    decode0a = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='decode3a')(up4)
    decode0b = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='decode3b')(decode0a)
    # Output: 1x1x1 convolution with softmax over the 4 classes
    flatten = Conv3D(filters=4, kernel_size=(1, 1, 1), activation='softmax')(decode0b)
    model = Model(inputs=ins, outputs=flatten, name=model_name)
    return model
FILTERS = 32
KERNEL = (3,3,3)
MODEL_NAME = 'multi-unet-test'
LR = 3e-3
strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
with strategy.scope():
    model = unet((None, None, None, 1), FILTERS, KERNEL, model_name=MODEL_NAME)
    model.compile(optimizer=Adam(learning_rate=LR),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    model.summary()
X_train, Y_train = load_dataset_all(FILE_DEN,FILE_MSK,SUBGRID)
# this is a function for loading input and mask fields
# outputs shapes of [256,128,128,128,4]
history = model.fit(X_train, Y_train, batch_size = 4, epochs = 50, verbose = 2, shuffle = True, validation_split = 0.2)
model.save(MODEL_NAME)
# Load and predict
# this is actually in another script but I'm putting this all in one go:
model = load_model(MODEL_NAME)
model.compile(loss=model.loss,optimizer=model.optimizer,metrics=['accuracy'])
# load test data:
X_test = load_dataset()
Y_test = model.predict(X_test, batch_size = 4, verbose = 2)
After Googling around and looking at other questions on Stack Overflow, people seem to suggest two fixes: adjusting the batch size so that the number of samples is divisible by it, and switching to a different TF/CUDA version. Originally my X_test had shape [343,128,128,128,4], but I chopped off 3 samples to make it [340,128,128,128,4] so that it is divisible by my batch size of 4.
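For reference, the trimming itself was just a slice (a minimal sketch, assuming X_test is a NumPy array; BATCH_SIZE stands in for my batch size of 4):

import numpy as np

BATCH_SIZE = 4
# round the sample count down to the nearest multiple of the batch size
n_keep = (X_test.shape[0] // BATCH_SIZE) * BATCH_SIZE  # 343 -> 340
X_test = X_test[:n_keep]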
My first test used TF 2.4.1 with CUDA 11.6; I tried the same code on Colab with TF 2.9.2 and CUDA 11.2 and got the same error, so I doubt the version is the problem.
Any advice or help would be greatly appreciated, and let me know if there is any other information I can provide.
Thank you!!!
1 Answer
I ran into the same problem and have since solved it. After making some changes the error message turned into "Split on GPU requires input size < max int32", so I'm not entirely sure what the root cause was. I just want to give you the list of changes I made; maybe one of them will help:
In general I couldn't make sense of the error message ("invalid configuration argument"), but I think it might be a memory problem? My model is even smaller than yours, but our arrays are big (my inputs are 128x128x128 and my labels are 512x512x512).
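One thing that may support the memory/int32 theory (my own back-of-the-envelope check, not something I have confirmed in the TF source): the full stacked prediction for 340 samples has 340 × 128 × 128 × 128 × 4 ≈ 2.85 × 10^9 elements, which exceeds the int32 maximum of 2,147,483,647, so any GPU kernel that indexes that tensor with 32-bit integers would break. If that is what is happening, predicting in smaller chunks and concatenating on the host should sidestep it. A hypothetical sketch (CHUNK is an arbitrary size I made up; tune it to your memory budget):

import numpy as np

CHUNK = 32  # samples per predict() call; keeps each chunk's output well under the int32 element limit
preds = []
for start in range(0, X_test.shape[0], CHUNK):
    preds.append(model.predict(X_test[start:start + CHUNK], batch_size=4))
Y_pred = np.concatenate(preds, axis=0)  # stitch the chunk outputs together on the CPU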
Hope this helps.