陈明威:
if is_classify:
    pyreader = fluid.layers.py_reader(
        capacity=50,
        shapes=[[-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
                [-1, args.max_seq_len, 1],
                [-1, args.paragram_conut, args.paragram_max_len, 1],
                [-1, 1], [-1, 1]],
        dtypes=['int64', 'int64', 'int64', 'int64', 'float32', 'int64',
                'int64', 'int64'],
        lod_levels=[0, 0, 0, 0, 0, 1, 0, 0],
        name=task_name + "_" + pyreader_name,
        use_double_buffer=True)
陈明威:
This is the officially provided code. The only thing I added is [-1,args.paragram_conut,args.paragram_max_len, 1], the third entry from the end.
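Since the new slot is the only change to the reader declaration, one way to narrow the problem down is to compare, slot by slot, what the generator yields against the declared `shapes`. The following is a minimal sketch in plain Python. `shape_matches` and `first_mismatch` are hypothetical helpers, not Paddle API, and the sizes used (max_seq_len=256, 3 paragraphs of length 64, batch size 16) are made up for illustration.

```python
def shape_matches(declared, actual):
    """Return True if `actual` fits `declared`, where -1 means any size."""
    if len(declared) != len(actual):
        return False
    return all(d == -1 or d == a for d, a in zip(declared, actual))


def first_mismatch(declared_shapes, actual_shapes):
    """Return the index of the first slot whose shape does not fit, else None."""
    if len(declared_shapes) != len(actual_shapes):
        # A missing or extra slot: report the first index that has no partner.
        return min(len(declared_shapes), len(actual_shapes))
    for i, (d, a) in enumerate(zip(declared_shapes, actual_shapes)):
        if not shape_matches(d, a):
            return i
    return None


# Illustrative values only: max_seq_len=256, paragram_conut=3, paragram_max_len=64.
declared = [[-1, 256, 1]] * 5 + [[-1, 3, 64, 1], [-1, 1], [-1, 1]]
good = [(16, 256, 1)] * 5 + [(16, 3, 64, 1), (16, 1), (16, 1)]
bad = [(16, 256, 1)] * 5 + [(16, 3, 64), (16, 1), (16, 1)]  # trailing 1 missing

print(first_mismatch(declared, good))  # None
print(first_mismatch(declared, bad))   # 5
```

In the real generator, the second argument could be built from the list that is about to be yielded, for example `[a.shape for a in return_list]`.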
陈明威:
(src_ids, sent_ids, pos_ids, task_ids, input_mask, paragraph, labels,
qids) = fluid.layers.read_file(pyreader)
陈明威:
The data processing part is:
def _pad_batch_records(self, batch_records, paragram_conut):
    """change data type to model"""
    batch_token_ids = [record.token_ids for record in batch_records]
    batch_text_type_ids = [record.text_type_ids for record in batch_records]
    batch_position_ids = [record.position_ids for record in batch_records]
    batch_labels = [record.label_id for record in batch_records]
    batch_contents_ids = [record.contens_ids for record in batch_records]
    if self.is_classify:
        batch_labels = np.array(batch_labels).astype("int64").reshape([-1, 1])
    elif self.is_regression:
        batch_labels = np.array(batch_labels).astype("float32").reshape([-1, 1])
    if batch_records[0].qid or batch_records[0].qid == 0:
        batch_qids = [record.qid for record in batch_records]
        batch_qids = np.array(batch_qids).astype("int64").reshape([-1, 1])
    else:
        batch_qids = np.array([]).astype("int64").reshape([-1, 1])
    # padding
    padded_token_ids, input_mask = pad_batch_data(
        batch_token_ids, pad_idx=self.pad_id, return_input_mask=True)
    padded_text_type_ids = pad_batch_data(
        batch_text_type_ids, pad_idx=self.pad_id)
    padded_position_ids = pad_batch_data(
        batch_position_ids, pad_idx=self.pad_id)
    padded_contents_ids = pad_batch_content_data(
        batch_contents_ids, paragram_conut=paragram_conut, pad_idx=self.pad_id)
    padded_task_ids = np.ones_like(padded_token_ids, dtype="int64") * self.task_id
    # print(np.max(padded_contents_ids), np.min(padded_contents_ids), '111111111111111111111111')
    if padded_contents_ids.shape[0] < 16:
        print(padded_contents_ids.shape, '111111111111111111111111111111111')
    # print(padded_token_ids.shape)
    # if np.max(padded_contents_ids) > 500 or np.min(padded_contents_ids) < 0:
    #     print('11111111111111111111111111111111111111111111111')
    label_tensor = fluid.LoDTensor()
    label_tensor.set(padded_contents_ids, fluid.CPUPlace())
    return_list = [ padded_token_ids, padded_text_type_ids, pa
陈明威:
Of these, padded_contents_ids = pad_batch_content_data(
    batch_contents_ids, paragram_conut=paragram_conut, pad_idx=self.pad_id)
陈明威:
is the part I added. At this point the largest value in padded_contents_ids is only 476, and every value is int64.
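The body of pad_batch_content_data is not shown in the thread. For reference, here is a minimal sketch of what such a padding step could look like, assuming each record holds a list of paragraphs, each paragraph is a list of token ids, and the target layout is [batch, paragram_conut, paragram_max_len, 1]. The function `pad_contents` and its parameters are hypothetical, and it returns nested lists, whereas the real function presumably returns an int64 NumPy array.

```python
def pad_contents(batch, paragraph_count, paragraph_max_len, pad_idx=0):
    """Pad or truncate nested id lists to a fixed rectangular shape.

    batch: list of records; each record is a list of paragraphs;
    each paragraph is a list of int token ids.
    Returns nested lists shaped
    [len(batch), paragraph_count, paragraph_max_len, 1].
    """
    padded = []
    for record in batch:
        paragraphs = list(record[:paragraph_count])
        # Add empty paragraphs so every record has the same paragraph count.
        paragraphs += [[] for _ in range(paragraph_count - len(paragraphs))]
        rows = []
        for tokens in paragraphs:
            tokens = list(tokens[:paragraph_max_len])
            tokens += [pad_idx] * (paragraph_max_len - len(tokens))
            rows.append([[t] for t in tokens])  # trailing dimension of size 1
        padded.append(rows)
    return padded


batch = [[[5, 6, 7], [8]], [[9, 10, 11, 12, 13]]]
out = pad_contents(batch, paragraph_count=2, paragraph_max_len=4)
print(out[0][1])  # [[8], [0], [0], [0]]
print(out[1][0])  # [[9], [10], [11], [12]]
```

Because every record ends up with the same paragraph count and length, the result is a dense rectangular array rather than a variable-length sequence.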
陈明威:
The data is read with train_pyreader.decorate_tensor_provider(train_data_generator)
陈明威:
Yet the ids that finally come out of the reader have somehow become huge.
陈明威:
Exception: /paddle/paddle/fluid/operators/lookup_table_op.cu:36 Assertion id < N
failed (received id: 4140473109978529412).
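An id of that size cannot have come from a padded array whose maximum is 476, so it may be worth checking whether the same 8 bytes look like something else, such as float32 data or uninitialised memory read as int64. The snippet below is only a diagnostic sketch using the standard `struct` module. It splits the reported value into its two 32-bit halves and reinterprets the bytes as two little-endian float32 values.

```python
import struct

bad_id = 4140473109978529412  # value reported by the assertion above

# Split the 64-bit value into its two 32-bit halves.
low = bad_id & 0xFFFFFFFF
high = bad_id >> 32

# Reinterpret the same 8 bytes (little-endian) as two float32 values.
as_floats = struct.unpack("<2f", struct.pack("<q", bad_id))

print(hex(bad_id))
print(hex(low), hex(high))
print(as_floats)
```

If both halves decode to ordinary-looking float32 numbers, that would hint at a dtype or slot-order mismatch between the generator output and the reader declaration. It is a hint to investigate, not a confirmed cause.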
17 answers
dly7yett1#
I ran into a similar problem. Did you manage to solve it in the end?
xu3bshqb2#
Tensor[read_file_0.tmp_7]
shape: [16,256,1,]
dtype: l
data: 296,404,139,337,1452,78,374,123,327,1653,856,77,432,195,4,1512,304,777,52,171,709,900,441,108,5,566,399,2831,1199,15,16,332,303,777,8,4,321,65,303,777,8,165,86,91,448,111,41,299,263,806,
fluid.layers.Print(data, message="The content of input layer:")
The printed ids all look fine now. Paddle is version 1.5.
b0zn9rqh3#
@chenmingwei00 OK, thanks. I will take a careful look on my side first.
btxsgosb4#
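OK, please take a look. I will also paste the ERNIE code again. It does not raise an error there, but the two pieces of code should be the same.
ERNIE code:
padding id in vocabulary must be set to 0
The printed result for src_ids is: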
src_ids The content of input layer: The place is:CUDAPlace(0)
Tensor[read_file_0.tmp_0]
shape: [16,140,1,]
dtype: l
data: 1,456,537,958,392,28,725,212,1114,2,29,246,11,456,537,958,392,1051,699,157,357,845,556,588,36,
My code:
fluid.layers.Print(data, message="The content of input layer:")
The printed result is:
Tensor[read_file_0.tmp_0]
shape: [16,140,1,]
dtype: l
data: 1,456,537,958,392,28,725,212,1114,2,29,246,11,456,537,958,392,1051,699,157,357,845,556,588,36,201,699,4,16,698,212,29,374,704,32,337,798,12,20,8,119,797,40,725,212,135,214,4,32,51,29,246,
34gzjxbg5#
I changed part of the code and added an LSTM. Below is the code that feeds the LSTM.
h79rfbju6#
Could it be that for
emb_out = fluid.layers.embedding(
    input=data,
    size=[self._voc_size, self._emb_size],
    dtype=self._emb_dtype,
    param_attr=fluid.ParamAttr(
        name=self._word_emb_name, initializer=self._param_initializer),
    is_sparse=False)
the ids must have a 0 added at the end?
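Whatever the answer on padding, the assertion `id < N` in the error means that every id passed to the lookup must be smaller than the number of rows in the embedding table. A small check like the one below can be run on a batch before it is fed. This is a sketch in plain Python. `flatten` and `out_of_range_ids` are hypothetical helpers, and the vocabulary size of 18000 is an assumed value for illustration.

```python
def flatten(nested):
    """Yield every scalar from arbitrarily nested lists or tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def out_of_range_ids(ids, vocab_size):
    """Return the ids that a table with `vocab_size` rows cannot look up."""
    return [i for i in flatten(ids) if not 0 <= i < vocab_size]


# Two short sequences shaped [batch, seq_len, 1]; one id is corrupted.
batch = [[[1], [456], [537]], [[2], [4140473109978529412], [0]]]
print(out_of_range_ids(batch, vocab_size=18000))  # [4140473109978529412]
```

An empty result on the generator side combined with a failing lookup on the device would suggest the values change somewhere between the generator and the embedding layer.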
kqqjbcuj7#
How about I send you the code so you can take a look?
u2nhd7ah8#
rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob)
I found that the error probably comes from here. It is fed directly with only feed_ding.
wh6knrhe9#
layers.lstm(ernie.emb_out_paragraph,
            init_h,
            init_c,
            args.max_seq_len,
            ernie_config['hidden_size'],
            1,
            dropout_prob=0.2)
Does the seq_length have to be the same for every batch?
2nbm6dog10#
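@chenmingwei00 It looks like the shape of your input data is what causes the problem in the later computation. Since you wrote this following the logic of the ERNIE model, I suggest opening an issue in the ERNIE repo, where you will get a more targeted answer. I will of course keep looking into this on my side as well.
ERNIE repo link: https://github.com/PaddlePaddle/ERNIE/issues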
osh3o9ms11#
It is adapted from the Baidu opinion reading comprehension code: https://aistudio.baidu.com/aistudio/projectdetail/247636
qcuzuvrc12#
Is nobody going to answer?
deyfvvtc13#
@chenmingwei00 May I ask why you modified this shape?
7d7tgy0s14#
@chenmingwei00 May I ask why you modified this shape?
Which shape do you mean?
sqyvllje15#
@chenmingwei00 I mean the shape that you added yourself.
This is the officially provided code. The only thing I added is [-1,args.paragram_conut,args.paragram_max_len, 1], the third entry from the end.