SQL Server 高效地连接和聚合两个巨大的表

7vhp5slm 于 2022-11-21 发布在其他

关注(0)|答案(1)|浏览(140)

我有一个包含超过100万条交易记录的巨大表，我需要将此表连接到自身，并为每个交易提取52周前的所有类似交易，然后将它们聚合起来，以便稍后在ML模型中使用。

select distinct a.transref, 
    a.transdate, a.transamount, 
    a.transtype,
    avg (b.transamount) 
        over (partition by a.transref,a.transdate, a.transamount,a.transtype) as avg_trans_amount
from trans_table a
inner join trans_table b
on a.transtype = b.transtype
and b.transdate <= dateadd(week, -52, a.transdate)
and b.transdate <= a.transdate
and a.transdate between '2022-11-16' and '2021-11-16'

事务表如下所示：

+--------+----------+-----------+---------+
|transref|trasndate |transamount|transtype|
+--------+----------+-----------+---------+
|xh123rdk|2022-11-16|112.48     |food & Re|
|g8jegf90|2022-11-04|23.79      |Misc     |
|ulpef32p|2022-10-23|83.15      |gasoline |
+--------+----------+-----------+---------+

预期输出应如下所示：

+--------+----------+-----------+---------+----------------+
|transref|trasndate |transamount|transtype|avg_trans_amount|
+--------+----------+-----------+---------+----------------+
|xh123rdk|2022-11-16|112.48     |food & Re|180.11          |
|g8jegf90|2022-11-04|23.79      |Misc     |43.03           |
|ulpef32p|2022-10-23|83.15      |gasoline |112.62          |
+--------+----------+-----------+---------+----------------+

由于每个事务可能会提取10，000多条类似类型的记录，因此查询运行起来非常缓慢且开销很大，因此SQL Server无法创建输出表。
如何优化此查询以在合理的时间内高效运行？
注意事项：在运行查询失败后，我最终创建了一个存储过程，将原始表a拆分为较小的块，将其连接到大表，聚合结果并将结果追加到输出表中，然后重复此操作，直到覆盖整个表a。它仍然很慢。我希望有更好的方法在SQL中完成它，而不需要所有这些手动工作。

sql-server

来源：https://stackoverflow.com/questions/74454935/join-and-aggregate-two-huge-tables-efficiently

1条答案

按热度按时间

qgelzfjb1#

好的，我想我已经知道是什么原因导致查询运行速度太慢了。诀窍是在执行连接之前先执行一些分组操作，以避免重复和不必要的计算。

with merch as (
    select transtype,
    dateadd(week, -52, transdate) as startdate,
    transdate as enddate),
    from trans_table 
    group by transtype, transdate),

summary as (
    select distinct transtype,
    stratdate, enddate, 
 avg(t.transamt) over (partition by 
 m.transtype, m.startdate, m.enddate) as avg_amt,

 percentile_cont(0.5) within group (order by t.transamt) over (partition by
 m.transtype, m.startdate, m.enddate) as median_amt

from merch as m
inner join trans_table as t
on m.transtype = t.transdate and
   t.transdate between m.starttype and
   m.enddate)


select t.*, s.avg_amt  s.median_amt
from trans_table t
inner join summary s
on t.transtype = s.transtype
and t.transdate = s.enddate

赞(0）回复(0）举报 2022-11-21

我来回答

SQL Server 高效地连接和聚合两个巨大的表

1条答案

相关问题

热门标签

最新问答