1. Problem:
Both approaches update tobj. Why does method 1 (gather_nd + scatter_nd_add) take about 8 ms, while method 2 (slice assignment) takes about 35 ms?
2. Example code:
import paddle
import datetime
tobj = paddle.load("test_tobj")
mask = paddle.load("test_mask")
score_iou = paddle.load("test_score_iou")
t_indices = paddle.load("test_t_indices")
gr = 1.0
print("tobj shape: {}, mask shape: {}".format(tobj.shape, mask.shape))
# Method 1: gather the current values, then scatter-add the difference
def fun1(tobj, mask, score_iou, gr):
    start_time = datetime.datetime.now()
    with paddle.no_grad():
        x = paddle.gather_nd(tobj, mask)
        tobj = paddle.scatter_nd_add(
            tobj, mask, (1.0 - gr) + gr * score_iou - x)
    end_time = datetime.datetime.now()
    elapsed_time = end_time - start_time
    # total_seconds() covers the whole duration; .microseconds is only the sub-second part
    print("gather_nd + scatter_nd Elapsed time: ", elapsed_time.total_seconds() * 1000, "ms")
    return tobj
# Method 2: indexed (slice) assignment
def fun2(tobj, score_iou, t_indices, gr):
    b, a, gj, gi = t_indices  # image, anchor, gridy, gridx
    start_time = datetime.datetime.now()
    with paddle.no_grad():
        tobj[b, a, gj, gi] = (1.0 - gr) + gr * score_iou  # iou ratio
    end_time = datetime.datetime.now()
    elapsed_time = end_time - start_time
    # total_seconds() covers the whole duration; .microseconds is only the sub-second part
    print("set_value Elapsed time: ", elapsed_time.total_seconds() * 1000, "ms")
    return tobj
tobj1 = fun1(tobj, mask, score_iou, gr)
tobj2 = fun2(tobj, score_iou, t_indices, gr)
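A note on measuring, offered as a sketch rather than as part of the original post: `timedelta.microseconds` holds only the sub-second part of a duration, while `total_seconds()` covers all of it, and a single call without warm-up can include one-time costs. The helper `time_ms` below is a hypothetical name, not a Paddle API, and it uses only the standard library. It does not synchronize a GPU, so on an asynchronous device the numbers would still need a device synchronization before each clock read.

```python
import datetime
import statistics
import time

# timedelta.microseconds is only the sub-second component (0..999999),
# so it drops whole seconds; total_seconds() covers the full duration.
delta = datetime.timedelta(seconds=2, microseconds=500000)
wrong_ms = delta.microseconds / 1000     # sub-second part only
right_ms = delta.total_seconds() * 1000  # full duration in milliseconds


def time_ms(fn, warmup=3, repeat=10):
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical helper: the warm-up calls are discarded so that
    one-time costs do not count toward the measurement.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)
```

With the functions from the post, a call such as `time_ms(lambda: fun1(tobj, mask, score_iou, gr))` would give a median over several runs instead of a single sample.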
Test results:
2 answers
wko9yo5t 1#
This may be a performance problem with slice. I suggest also reporting it here: https://github.com/PaddlePaddle/Paddle/issues
falq053o 2#
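Hello, this is currently a known slice performance problem. We switched back to the old form, i.e. method 2, to preserve accuracy: when training on a large dataset without pretrained weights, i.e. from random initialization, we found that method 1 ends up about 1 mAP point lower than method 2 late in training, and the loss curves diverge in the later stage. When training on a small dataset (for example, under 10,000 images), or when fine-tuning COCO weights on another vertical-domain or downstream dataset, the two forms give essentially the same accuracy, so the faster method 1 can be used. We will push to get this slice performance problem resolved as soon as possible. We also recommend trying PPYOLOE+ and RT-DETR.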
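One way the two forms can differ numerically is when the same index appears more than once. The thread does not say whether this is what caused the accuracy gap, so treat the following only as an illustration. It is a pure-Python sketch on 1-D lists, not Paddle code; `scatter_add`, `index_assign`, and `update_via_scatter` are hypothetical helpers that mimic an accumulating scatter-add and an indexed assignment.

```python
def scatter_add(base, indices, updates):
    """Return a copy of base with updates[k] added at indices[k]; repeats accumulate."""
    out = list(base)
    for i, u in zip(indices, updates):
        out[i] += u
    return out


def index_assign(base, indices, values):
    """Return a copy of base with values[k] written at indices[k]; here the last write wins."""
    out = list(base)
    for i, v in zip(indices, values):
        out[i] = v
    return out


def update_via_scatter(base, indices, values):
    """Method-1 style: gather the old values, then scatter-add (new - old)."""
    gathered = [base[i] for i in indices]
    deltas = [v - g for v, g in zip(values, gathered)]
    return scatter_add(base, indices, deltas)


base = [0.0, 0.0, 0.0, 0.0]

# Unique indices: both forms produce the same result.
unique_scatter = update_via_scatter(base, [0, 2], [0.5, 0.25])
unique_assign = index_assign(base, [0, 2], [0.5, 0.25])

# Index 1 appears twice: scatter-add sums both deltas, assignment keeps one value.
dup_scatter = update_via_scatter(base, [1, 1], [0.5, 0.25])
dup_assign = index_assign(base, [1, 1], [0.5, 0.25])
```

With unique indices the two results match. With the repeated index, the scatter-add form yields 0.75 at position 1, while assignment leaves 0.25. In a tensor library, which of the duplicate writes survives an indexed assignment may not be deterministic, so the sequential "last write wins" behavior here is an assumption of the sketch.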