Calculating a cumulative sum in PostgreSQL

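I want to compute the cumulative (running) amount of a field and insert it from a staging table into a target table. My staging structure looks like this: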

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014        1
April       92571    3000      2014        2
April       92572    2000      2014        3
March       92573    3000      2014        1
March       92574    2500      2014        2
March       92575    3750      2014        3
February    92576    2000      2014        1
February    92577    2500      2014        2
February    92578    1450      2014        3

I want my target table to look like this:

ea_month    id       amount    ea_year    circle_id    cum_amt
February    92576    1000      2014        1           1000 
March       92573    3000      2014        1           4000
April       92570    2000      2014        1           6000
February    92577    3000      2014        2           3000
March       92574    2500      2014        2           5500
April       92571    3750      2014        2           9250
February    92578    2000      2014        3           2000
March       92575    2500      2014        3           4500
April       92572    1450      2014        3           5950

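I am really confused about how to achieve this result. I want to do it in PostgreSQL.

Can anyone suggest how to produce this result set?

Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.

The special difficulty here is getting the partitions and the sort order right: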

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

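And no GROUP BY.

The sum for each row is calculated from the first row in the partition to the current row. Or, quoting the manual to be precise:

"The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer."

This is the cumulative (or "running") sum you are after.

In the default RANGE mode, rows with the same rank in the sort order are "peers": in this query, rows with the same (circle_id, ea_year, ea_month). All peers show the same running sum, with every peer added to the sum. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month), so the sort order is deterministic and no row has peers. (In that case you might as well use the cheaper ROWS mode.)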
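To make the difference between the two frame modes concrete, here is a minimal sketch on made-up inline data (the column names ord and amount and the values are assumptions for illustration only). Two rows share ord = 2, so they are peers:

```sql
-- Hypothetical sample data: the two rows with ord = 2 are peers.
SELECT ord, amount
     , sum(amount) OVER (ORDER BY ord) AS sum_range  -- default frame: RANGE UNBOUNDED PRECEDING
     , sum(amount) OVER (ORDER BY ord
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows
FROM  (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS t(ord, amount)
ORDER  BY ord;
```

In RANGE mode both peers report 60, because each peer's frame extends through the last peer. In ROWS mode each peer gets its own running value, but which of the two comes first is not guaranteed, since the sort key does not distinguish them.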

Postgres 11 added tools to include or exclude peers with the new frame_exclusion options. See:

Aggregating all values not in the same group

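Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort them alphabetically according to the locale setting, not in calendar order.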
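A quick way to see the problem, using only the three month names from the question as inline values (the alias m is an assumption for illustration):

```sql
-- Month names sort as text, not as calendar months.
SELECT m AS ea_month
FROM  (VALUES ('February'), ('March'), ('April')) AS t(m)
ORDER  BY m;
```

Under common collations this returns April, February, March, so a running sum ordered this way would start with April instead of February.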

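If you have actual date values stored in your table, you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column the_date of type date in your table.

Transform what you have with to_date():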

to_date(ea_year || ea_month , 'YYYYMonth') AS the_date

For display, you can get the original strings back with to_char():

to_char(the_date, 'Month') AS ea_month,
to_char(the_date, 'YYYY')  AS ea_year
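If you decide to switch to a single date column, the change could look roughly like the sketch below. It assumes the table is called tbl as in the queries here and that every row holds a full English month name; the final step is left commented out so the old columns stay until the new one has been checked.

```sql
BEGIN;

-- Add the new column and fill it from the existing year and month columns.
ALTER TABLE tbl ADD COLUMN the_date date;

UPDATE tbl
SET    the_date = to_date(ea_year || ea_month, 'YYYYMonth');

-- Optional, once the_date has been verified:
-- ALTER TABLE tbl DROP COLUMN ea_year, DROP COLUMN ea_month;

COMMIT;
```

Each the_date value is the first day of the given month, which is enough for sorting and for rebuilding the display strings with to_char().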

While stuck with the unfortunate design, this will work:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM   (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub
ORDER  BY circle_id, the_date;
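Since the question asks for the result to end up in a target table, the same query can feed an INSERT. This is only a sketch: target_tbl is a hypothetical name, and the table is assumed to exist already with the six columns shown in the desired output.

```sql
-- target_tbl is a hypothetical, pre-existing table with a matching column list.
INSERT INTO target_tbl (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub;
```

No ORDER BY is needed on the INSERT itself: the window's own ORDER BY determines the running sum, and the row order of a table is not guaranteed anyway.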
