SQL Server 高效地连接和聚合两个巨大的表

7vhp5slm  于 2022-11-21  发布在  其他
关注(0)|答案(1)|浏览(140)

我有一个包含超过100万条交易记录的巨大表,我需要将此表连接到自身,并为每个交易提取52周前的所有类似交易,然后将它们聚合起来,以便稍后在ML模型中使用。

select distinct a.transref, 
    a.transdate, a.transamount, 
    a.transtype,
    avg (b.transamount) 
        over (partition by a.transref,a.transdate, a.transamount,a.transtype) as avg_trans_amount
from trans_table a
inner join trans_table b
on a.transtype = b.transtype
and b.transdate <= dateadd(week, -52, a.transdate)
and b.transdate <= a.transdate
and a.transdate between '2022-11-16' and '2021-11-16'

事务表如下所示:

+--------+----------+-----------+---------+
|transref|trasndate |transamount|transtype|
+--------+----------+-----------+---------+
|xh123rdk|2022-11-16|112.48     |food & Re|
|g8jegf90|2022-11-04|23.79      |Misc     |
|ulpef32p|2022-10-23|83.15      |gasoline |
+--------+----------+-----------+---------+

预期输出应如下所示:

+--------+----------+-----------+---------+----------------+
|transref|trasndate |transamount|transtype|avg_trans_amount|
+--------+----------+-----------+---------+----------------+
|xh123rdk|2022-11-16|112.48     |food & Re|180.11          |
|g8jegf90|2022-11-04|23.79      |Misc     |43.03           |
|ulpef32p|2022-10-23|83.15      |gasoline |112.62          |
+--------+----------+-----------+---------+----------------+

由于每个事务可能会提取10,000多条类似类型的记录,因此查询运行起来非常缓慢且开销很大,因此SQL Server无法创建输出表。
如何优化此查询以在合理的时间内高效运行?
注意事项:在运行查询失败后,我最终创建了一个存储过程,将原始表a拆分为较小的块,将其连接到大表,聚合结果并将结果追加到输出表中,然后重复此操作,直到覆盖整个表a。它仍然很慢。我希望有更好的方法在SQL中完成它,而不需要所有这些手动工作。

qgelzfjb

qgelzfjb1#

好的,我想我已经知道是什么原因导致查询运行速度太慢了。诀窍是在执行连接之前先执行一些分组操作,以避免重复和不必要的计算。

with merch as (
    select transtype,
    dateadd(week, -52, transdate) as startdate,
    transdate as enddate),
    from trans_table 
    group by transtype, transdate),

summary as (
    select distinct transtype,
    stratdate, enddate, 
 avg(t.transamt) over (partition by 
 m.transtype, m.startdate, m.enddate) as avg_amt,

 percentile_cont(0.5) within group (order by t.transamt) over (partition by
 m.transtype, m.startdate, m.enddate) as median_amt

from merch as m
inner join trans_table as t
on m.transtype = t.transdate and
   t.transdate between m.starttype and
   m.enddate)


select t.*, s.avg_amt  s.median_amt
from trans_table t
inner join summary s
on t.transtype = s.transtype
and t.transdate = s.enddate

相关问题