Paddle model throws an error, please help

omtl5h9j · Posted on 2022-04-21 in Java
Follow (0) | Answers (17) | Views (353)

陈明威:
if is_classify:
    pyreader = fluid.layers.py_reader(
        capacity=50,
        shapes=[[-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1],
                [-1, args.paragram_conut, args.paragram_max_len, 1],
                [-1, 1], [-1, 1]],
        dtypes=['int64', 'int64', 'int64', 'int64',
                'float32', 'int64', 'int64', 'int64'],
        lod_levels=[0, 0, 0, 0, 0, 1, 0, 0],
        name=task_name + "_" + pyreader_name,
        use_double_buffer=True)

陈明威:
This is the official code; the only thing I added is the third-from-last entry, [-1, args.paragram_conut, args.paragram_max_len, 1].
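For reference, the three parallel lists passed to py_reader have to stay index-aligned, and the added paragraph slot is the only one declared as a LoD tensor. A small pure-Python sanity check over the values above (the concrete sizes are placeholders for the argparse values, not the poster's real settings):

```python
# Placeholder sizes standing in for args.max_seq_len etc.
max_seq_len, paragram_conut, paragram_max_len = 256, 8, 128

shapes = [[-1, max_seq_len, 1], [-1, max_seq_len, 1],
          [-1, max_seq_len, 1], [-1, max_seq_len, 1],
          [-1, max_seq_len, 1],
          [-1, paragram_conut, paragram_max_len, 1],
          [-1, 1], [-1, 1]]
dtypes = ['int64', 'int64', 'int64', 'int64',
          'float32', 'int64', 'int64', 'int64']
lod_levels = [0, 0, 0, 0, 0, 1, 0, 0]

# Every slot needs exactly one shape, one dtype and one lod level.
assert len(shapes) == len(dtypes) == len(lod_levels) == 8

# The added paragraph slot is the only one declared with lod_level=1.
lod_slots = [i for i, lvl in enumerate(lod_levels) if lvl > 0]
print(lod_slots)  # [5]
```

Note that slot 5 is declared as a LoD tensor while the batching code below pads it into a dense ndarray; that kind of mismatch between declaration and fed data is one plausible source of corrupted ids.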

陈明威:
(src_ids, sent_ids, pos_ids, task_ids, input_mask, paragraph, labels,
 qids) = fluid.layers.read_file(pyreader)

陈明威:
The data-processing part is:

def _pad_batch_records(self, batch_records, paragram_conut):
    """change data type to model"""
    batch_token_ids = [record.token_ids for record in batch_records]
    batch_text_type_ids = [record.text_type_ids for record in batch_records]
    batch_position_ids = [record.position_ids for record in batch_records]
    batch_labels = [record.label_id for record in batch_records]
    batch_contents_ids = [record.contens_ids for record in batch_records]
    if self.is_classify:
        batch_labels = np.array(batch_labels).astype("int64").reshape([-1, 1])
    elif self.is_regression:
        batch_labels = np.array(batch_labels).astype("float32").reshape([-1, 1])

    if batch_records[0].qid or batch_records[0].qid == 0:
        batch_qids = [record.qid for record in batch_records]
        batch_qids = np.array(batch_qids).astype("int64").reshape([-1, 1])
    else:
        batch_qids = np.array([]).astype("int64").reshape([-1, 1])

    # padding
    padded_token_ids, input_mask = pad_batch_data(
        batch_token_ids, pad_idx=self.pad_id, return_input_mask=True)
    padded_text_type_ids = pad_batch_data(
        batch_text_type_ids, pad_idx=self.pad_id)
    padded_position_ids = pad_batch_data(
        batch_position_ids, pad_idx=self.pad_id)
    padded_contents_ids = pad_batch_content_data(
        batch_contents_ids, paragram_conut=paragram_conut, pad_idx=self.pad_id)
    padded_task_ids = np.ones_like(padded_token_ids, dtype="int64") * self.task_id

    # debug: the paragraph ids still look fine at this point
    # print(np.max(padded_contents_ids), np.min(padded_contents_ids))
    if padded_contents_ids.shape[0] < 16:
        print(padded_contents_ids.shape)

    label_tensor = fluid.LoDTensor()
    label_tensor.set(padded_contents_ids, fluid.CPUPlace())
    return_list = [
        padded_token_ids, padded_text_type_ids, padded_position_ids,
        padded_task_ids, input_mask, padded_contents_ids,
        batch_labels, batch_qids
    ]
    return return_list
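The helper pad_batch_content_data is never shown in the thread. Purely as an illustration, here is a numpy sketch of what a two-level padder matching the declared shape [-1, paragram_conut, paragram_max_len, 1] might look like (the behavior is assumed, this is not the poster's actual code):

```python
import numpy as np

def pad_batch_content_data(batch_contents_ids, paragram_conut, pad_idx=0):
    """Hypothetical sketch: pad a ragged batch of per-sample paragraph
    lists into a dense [batch, paragram_conut, max_len, 1] int64 array,
    filling unused positions with pad_idx."""
    max_len = max(len(p) for sample in batch_contents_ids for p in sample)
    batch = np.full((len(batch_contents_ids), paragram_conut, max_len, 1),
                    pad_idx, dtype="int64")
    for i, sample in enumerate(batch_contents_ids):
        for j, para in enumerate(sample[:paragram_conut]):
            batch[i, j, :len(para), 0] = para
    return batch

padded = pad_batch_content_data([[[1, 2, 3], [4, 5]],
                                 [[6], [7, 8]]], paragram_conut=2)
print(padded.shape)       # (2, 2, 3, 1)
print(str(padded.dtype))  # int64
print(int(padded.max()))  # 8
```

A padder like this produces a plain dense array, which matches lod_level=0 semantics; ids in it stay within vocabulary range, as the poster observed.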

陈明威:
where

padded_contents_ids = pad_batch_content_data(
    batch_contents_ids, paragram_conut=paragram_conut, pad_idx=self.pad_id)

陈明威:
is what I added. At this point the maximum value in padded_contents_ids is only 476, and the values are all int64.

陈明威:
Reading is done with train_pyreader.decorate_tensor_provider(train_data_generator).

陈明威:
Yet in the data finally read back, the ids have somehow become huge:

陈明威:
Exception: /paddle/paddle/fluid/operators/lookup_table_op.cu:36 Assertion id < N failed (received id: 4140473109978529412).
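For what it's worth, an id around 4.1e18 is characteristic of a dtype or memory-layout mismatch rather than of genuinely bad input values. A minimal numpy illustration (not the actual Paddle internals) of how small, valid ids turn astronomically large when a buffer is reinterpreted on a little-endian machine:

```python
import numpy as np

# Two ordinary int32 ids, both well inside a vocabulary...
ids32 = np.array([296, 404], dtype=np.int32)

# ...read back as a single int64: the low word contributes 296 and the
# high word contributes 404 * 2**32 (assuming little-endian layout).
ids64 = ids32.view(np.int64)
print(int(ids64[0]))  # 1735166787880
```

A corrupted stride, lod, or dtype anywhere between the feeder and lookup_table can produce exactly this effect, which would explain valid ids on the Python side and garbage inside the op.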

dly7yett1#

I ran into a similar problem. Did you ever solve it?

xu3bshqb2#

Tensor[read_file_0.tmp_7]
shape: [16,256,1,]
dtype: l
data: 296,404,139,337,1452,78,374,123,327,1653,856,77,432,195,4,1512,304,777,52,171,709,900,441,108,5,566,399,2831,1199,15,16,332,303,777,8,4,321,65,303,777,8,165,86,91,448,111,41,299,263,806,
fluid.layers.Print(data, message="The content of input layer:")

emb_out = fluid.layers.embedding(
    input=data,
    size=[self._voc_size, self._emb_size],
    dtype=self._emb_dtype,
    param_attr=fluid.ParamAttr(
        name=self._word_emb_name, initializer=self._param_initializer),
    is_sparse=False)

The printed ids all look fine now. This is Paddle 1.5.

b0zn9rqh3#

@chenmingwei00 OK, thanks. I'll take a careful look at this first.

btxsgosb4#

OK, please take a look. Let me paste the ernie code again. No error occurs there, even though the two pieces of code should be identical.

ernie code:

# padding id in vocabulary must be set to 0

fluid.layers.Print(src_ids, message="src_ids The content of input layer:")

    emb_out = fluid.layers.embedding(
        input=src_ids,
        size=[self._voc_size, self._emb_size],
        dtype=self._emb_dtype,
        param_attr=fluid.ParamAttr(
            name=self._word_emb_name, initializer=self._param_initializer),
        is_sparse=False)

The printed src_ids are:
src_ids The content of input layer: The place is:CUDAPlace(0)
Tensor[read_file_0.tmp_0]
shape: [16,140,1,]
dtype: l
data: 1,456,537,958,392,28,725,212,1114,2,29,246,11,456,537,958,392,1051,699,157,357,845,556,588,36,

My code:
fluid.layers.Print(data, message="The content of input layer:")

emb_out = fluid.layers.embedding(
    input=data,
    size=[self._voc_size, self._emb_size],
    dtype=self._emb_dtype,
    param_attr=fluid.ParamAttr(
        name=self._word_emb_name, initializer=self._param_initializer),
    is_sparse=False)

The printed output is:
Tensor[read_file_0.tmp_0]
shape: [16,140,1,]
dtype: l
data: 1,456,537,958,392,28,725,212,1114,2,29,246,11,456,537,958,392,1051,699,157,357,845,556,588,36,201,699,4,16,698,212,29,374,704,32,337,798,12,20,8,119,797,40,725,212,135,214,4,32,51,29,246,

34gzjxbg5#

I changed part of the code and added an LSTM; below is the feed for the LSTM code.

h79rfbju6#

Does this mean that for

emb_out = fluid.layers.embedding(
    input=data,
    size=[self._voc_size, self._emb_size],
    dtype=self._emb_dtype,
    param_attr=fluid.ParamAttr(
        name=self._word_emb_name, initializer=self._param_initializer),
    is_sparse=False)

the padding id at the end must be 0????

kqqjbcuj7#

How about I send you the code so you can take a look?

u2nhd7ah8#

rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c, max_len,
                                      hidden_size, num_layers,
                                      dropout_prob=dropout_prob)

I found where the error comes from: it appears as soon as this is fed.

wh6knrhe9#

layers.lstm(ernie.emb_out_paragraph,
            init_h,
            init_c,
            args.max_seq_len,
            ernie_config['hidden_size'],
            1,
            dropout_prob=0.2)

Do all sequences within a batch have to have the same seq_length?
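Regarding the seq_length question: a dense [batch, seq, ...] tensor of the kind layers.lstm consumes does require every sequence within one batch to be padded to a common length, with the true lengths kept separately for masking. A small numpy sketch of that invariant (toy data, not the poster's):

```python
import numpy as np

batch = [[1, 2, 3], [4, 5], [6]]          # raw sequences, ragged lengths
max_len = max(len(s) for s in batch)

# Pad every sequence in the batch to the same max_len with 0.
padded = np.zeros((len(batch), max_len), dtype="int64")
for i, seq in enumerate(batch):
    padded[i, :len(seq)] = seq

# Keep the true lengths so padded positions can be masked out later.
lengths = np.array([len(s) for s in batch])

print(padded.shape)      # (3, 3)
print(lengths.tolist())  # [3, 2, 1]
```

The input_mask that pad_batch_data already returns plays exactly this masking role for the token ids.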

2nbm6dog10#

@chenmingwei00 It looks like the shape/format of your input data is causing the problem in the downstream computation. Since you wrote this following the ernie model's logic, I suggest opening an issue in the ernie repo, where you will get a more targeted answer. Of course, I will also keep looking into this on my side.

Ernie repo link: https://github.com/PaddlePaddle/ERNIE/issues

deyfvvtc13#

@chenmingwei00 May I ask why you needed to modify this shape?

7d7tgy0s14#

@chenmingwei00 May I ask why you needed to modify this shape?

Which shape do you mean?

sqyvllje15#

@chenmingwei00 The shape you added yourself:
"This is the official code; the only thing I added is the third-from-last entry, [-1, args.paragram_conut, args.paragram_max_len, 1]."
