Paddle: why is slice assignment so slow?

dz6r00yl  posted 4 months ago  in Other

1. Symptom:
Both approaches update tobj. Why does method 1 (gather_nd + scatter_nd_add) take about 8 ms, while method 2 (slice assignment) takes about 35 ms?

2. Example code:

import paddle
import datetime

tobj = paddle.load("test_tobj")
mask = paddle.load("test_mask")
score_iou = paddle.load("test_score_iou")
t_indices = paddle.load("test_t_indices")
gr = 1.0

print("tobj shape: {}, mask shape: {}".format(tobj.shape, mask.shape))

# Method 1: gather_nd + scatter_nd_add
def fun1(tobj, mask, score_iou,gr):
    start_time = datetime.datetime.now()
    with paddle.no_grad():
        x = paddle.gather_nd(tobj, mask)
        tobj = paddle.scatter_nd_add(
            tobj, mask, (1.0 - gr) + gr * score_iou - x)
    end_time = datetime.datetime.now()
    elapsed_time = end_time - start_time
    # total_seconds() covers the whole interval; .microseconds is only the sub-second part
    print("gather_nd + scatter_nd_add elapsed time: ", elapsed_time.total_seconds() * 1000, "ms")
    return tobj

# Method 2: slice (indexed) assignment
def fun2(tobj,score_iou, t_indices,gr):
    b, a, gj, gi = t_indices  # image, anchor, gridy, gridx
    start_time = datetime.datetime.now()
    with paddle.no_grad():
        tobj[b, a, gj, gi] = (1.0 - gr) + gr * score_iou  # iou ratio
    end_time = datetime.datetime.now()
    elapsed_time = end_time - start_time
    # total_seconds() covers the whole interval; .microseconds is only the sub-second part
    print("set_value elapsed time: ", elapsed_time.total_seconds() * 1000, "ms")
    return tobj

tobj1 = fun1(tobj, mask, score_iou,gr)
tobj2 = fun2(tobj,score_iou, t_indices,gr)
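
A side note on measurement: a single wall-clock reading like the ones above can be noisy, since the first call may include one-time setup cost and, on GPU, kernels may still be running when the clock is read. Below is a minimal sketch of a more careful timer using only the standard library. The names `benchmark` and `sync` are hypothetical and not part of Paddle. On GPU one would presumably pass a device synchronization function (for example `paddle.device.cuda.synchronize`) as `sync`; that is an assumption about the setup, not something the question states.

```python
import statistics
import time


def benchmark(fn, *, warmup=3, repeat=10, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` is an optional callable (hypothetical hook) that blocks until
    queued device work has finished, so the clock covers the real work.
    """
    def run_once():
        fn()
        if sync is not None:
            sync()  # wait for outstanding work before reading the clock

    for _ in range(warmup):
        run_once()  # warm-up runs are executed but not timed

    samples_ms = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples_ms)


if __name__ == "__main__":
    # Stand-in workload; replace with a closure around fun1 or fun2.
    ms = benchmark(lambda: sum(range(100_000)))
    print(f"median: {ms:.3f} ms")
```

Reporting the median over several runs makes the 8 ms vs 35 ms comparison less sensitive to a single slow call.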

Test results:

wko9yo5t  #1

The slice operation may have a performance problem. I suggest also reporting it here: https://github.com/PaddlePaddle/Paddle/issues

falq053o  #2

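Hello, for now this is a known slice performance issue. We switched back to the older form, i.e. method 2, to preserve accuracy: when training on a large dataset without loading pretrained weights (that is, entirely from random initialization), method 1 ends up about 1 mAP point lower than method 2 late in training, and the loss curves diverge in the later stages. On a small dataset (for example, fewer than 10,000 images), or when fine-tuning COCO weights on a vertical-domain or downstream-task dataset, the two forms give essentially the same accuracy, so you can use the faster method 1. We will push to get this slice performance issue resolved as soon as possible. We also recommend trying PPYOLOE+ and RT-DETR.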
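
To make the relationship between the two forms concrete, here is a small NumPy analogue (not Paddle code): indexed assignment stands in for method 2, and a gather followed by `np.add.at` stands in for gather_nd + scatter_nd_add. The helper names and the toy data are hypothetical. With unique indices the two updates agree up to floating-point rounding. With repeated indices they do not, because a scatter-add accumulates every delta. Whether repeated indices occur in the tensors from the question is not stated in this thread, so this is only an illustration of one way the two forms can differ, not an explanation of the mAP gap.

```python
import numpy as np


def update_by_assignment(t, rows, cols, values):
    """Method 2 analogue: write values directly at the given indices."""
    out = t.copy()
    out[rows, cols] = values  # with repeated indices, a single value survives
    return out


def update_by_scatter_add(t, rows, cols, values):
    """Method 1 analogue: gather old values, then scatter-add the difference."""
    out = t.copy()
    old = t[rows, cols]                         # gather step
    np.add.at(out, (rows, cols), values - old)  # repeated indices accumulate
    return out


tobj = np.zeros((2, 3), dtype=np.float32)
tobj[0, 1] = 0.5
values = np.array([0.9, 0.7], dtype=np.float32)

# Unique indices: both forms produce the same tensor (up to rounding).
rows, cols = np.array([0, 1]), np.array([1, 2])
print(np.allclose(update_by_assignment(tobj, rows, cols, values),
                  update_by_scatter_add(tobj, rows, cols, values)))

# Repeated index (0, 1): the scatter-add form sums both deltas.
rows, cols = np.array([0, 0]), np.array([1, 1])
print(update_by_assignment(tobj, rows, cols, values)[0, 1])
print(update_by_scatter_add(tobj, rows, cols, values)[0, 1])
```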
