SQL: aggregating two tables into a time series

Posted by w3nuxt5m on 2021-07-26 in Java

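Suppose I have two tables that hold time-oriented data as (date, data) pairs. One holds my daily expenses, the other my daily income. The date is the day from which that expense/income rate applies.
For example, if the income table has two records, (2020-01-01, 50) and (2020-02-14, 100), then between 2020-01-01 and 2020-02-14 my income is 50 units per day, and from 2020-02-14 onward it is 100 units per day. The same goes for expenses.
I want to generate a time series for a given interval from these Postgres tables (probably using the generate_series(date, date, interval) function), where each row looks like this: (day, income, expense)
For example, if my income table looks like this: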

    +------------+--------+
    | date       | income |
    +------------+--------+
    | 2020-02-12 | 50     |
    | 2020-02-14 | 100    |
    +------------+--------+

and the expenses table looks like this:

    +------------+--------------+
    | date       | expenses     |
    +------------+--------------+
    | 2020-02-12 | 70           |
    | 2020-02-13 | 50           |
    +------------+--------------+

then for the interval from 2020-02-12 to 2020-02-15 I would like the result to look like this:

    +------------+--------+--------------+
    | day        | income | expenses     |
    +------------+--------+--------------+
    | 2020-02-12 | 50     | 70           |
    | 2020-02-13 | 50     | 50           |
    | 2020-02-14 | 100    | 50           |
    | 2020-02-15 | 100    | 50           |
    +------------+--------+--------------+

That way I can later calculate my profit, loss, and other statistics. How can I achieve this?


Answer 1, by ivqmmu1c

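This is a bit tricky. You can full join the two tables, but then you need to fill the gaps. Postgres does not support ignore nulls in window functions, so one option is to build groups with a conditional window count and then take first_value() within each group: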

    select
        date,
        -- each group starts at a non-null row, so first_value() carries that value forward
        first_value(income)  over (partition by grp_i order by date) as income,
        first_value(expense) over (partition by grp_e order by date) as expense
    from (
        select
            date,
            i.income,
            e.expense,
            -- running count of non-null values: it increments whenever a new rate starts
            count(*) filter (where i.income  is not null) over (order by date) as grp_i,
            count(*) filter (where e.expense is not null) over (order by date) as grp_e
        from incomes i
        full join expenses e using (date)
    ) t
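
To see how the groups form, the inner query can be run on its own against the sample rows from the question. This is only a sketch: the inline values lists stand in for the real incomes and expenses tables, and the expense column name follows the query above rather than the expenses header shown in the question.

```sql
-- Sample rows from the question, inlined as CTEs (stand-ins for the real tables).
with incomes(date, income) as (
    values (date '2020-02-12', 50), (date '2020-02-14', 100)
),
expenses(date, expense) as (
    values (date '2020-02-12', 70), (date '2020-02-13', 50)
)
select
    date,
    i.income,
    e.expense,
    count(*) filter (where i.income  is not null) over (order by date) as grp_i,
    count(*) filter (where e.expense is not null) over (order by date) as grp_e
from incomes i
full join expenses e using (date)
order by date;

-- date       | income | expense | grp_i | grp_e
-- 2020-02-12 |     50 |      70 |     1 |     1
-- 2020-02-13 | (null) |      50 |     1 |     2
-- 2020-02-14 |    100 | (null)  |     2 |     2
```

Rows that share a grp_i value belong to the same income rate, and the first row of each group is the one that holds the non-null value, which is what first_value() then picks up.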

Alternatively, you can start from the chosen date range (using generate_series()) and bring in the tables with left joins. The rest of the logic stays the same:

    select
        date,
        first_value(income)  over (partition by grp_i order by date) as income,
        first_value(expense) over (partition by grp_e order by date) as expense
    from (
        select
            -- generate_series() with an interval step returns timestamps; cast back to date
            d.date::date as date,
            i.income,
            e.expense,
            count(*) filter (where i.income  is not null) over (order by d.date) as grp_i,
            count(*) filter (where e.expense is not null) over (order by d.date) as grp_e
        from generate_series(date '2020-02-12', date '2020-02-15', interval '1' day) d(date)
        left join incomes i on i.date = d.date
        left join expenses e on e.date = d.date
    ) t
    order by date
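
As a quick check against the expected result in the question, the same query can be run with the sample rows inlined. The sketch below makes a few assumptions of its own: the values lists stand in for the real tables, and the filled CTE, the profit column, and the running_profit column are illustrative names that show the kind of follow-up statistic the question mentions.

```sql
with incomes(date, income) as (
    values (date '2020-02-12', 50), (date '2020-02-14', 100)
),
expenses(date, expense) as (
    values (date '2020-02-12', 70), (date '2020-02-13', 50)
),
filled as (  -- hypothetical name for the gap-filled daily series
    select
        date,
        first_value(income)  over (partition by grp_i order by date) as income,
        first_value(expense) over (partition by grp_e order by date) as expense
    from (
        select
            d.date::date as date,
            i.income,
            e.expense,
            count(*) filter (where i.income  is not null) over (order by d.date) as grp_i,
            count(*) filter (where e.expense is not null) over (order by d.date) as grp_e
        from generate_series(date '2020-02-12', date '2020-02-15', interval '1 day') d(date)
        left join incomes  i on i.date = d.date::date
        left join expenses e on e.date = d.date::date
    ) t
)
select
    date as day,
    income,
    expense,
    income - expense as profit,                               -- daily profit
    sum(income - expense) over (order by date) as running_profit
from filled
order by date;

-- day        | income | expense | profit | running_profit
-- 2020-02-12 |     50 |      70 |    -20 |            -20
-- 2020-02-13 |     50 |      50 |      0 |            -20
-- 2020-02-14 |    100 |      50 |     50 |             30
-- 2020-02-15 |    100 |      50 |     50 |             80
```

Note that because the series starts at the beginning of the requested range, a rate that took effect before that date is not picked up; days before the first record inside the range come back as null.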

Answer 2, by vddsk6oq

One approach, if the data is not too large, is a lateral join:

    select gs.dte, i.income, e.expense
    from generate_series('2020-02-12'::date, '2020-02-15'::date, interval '1 day') gs(dte)
    left join lateral (
        -- most recent income rate in effect on gs.dte
        select i.*
        from income i
        where i.date <= gs.dte
        order by i.date desc
        limit 1
    ) i on true
    left join lateral (
        -- most recent expense rate in effect on gs.dte
        select e.*
        from expense e
        where e.date <= gs.dte
        order by e.date desc
        limit 1
    ) e on true;
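
Each generated day triggers one lookup per table here, so an index on the date column may help the "order by ... desc limit 1" subqueries as the tables grow. A minimal sketch, assuming the tables are named income and expense as in the query above (the index names are hypothetical):

```sql
-- Hypothetical index names; table and column names follow the query above.
create index income_date_idx  on income  (date);
create index expense_date_idx on expense (date);
```

Whether the planner actually uses these indexes depends on table size and statistics, so it is worth checking with explain.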

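Another option is to use generate_series() on each table independently to expand its values into one row per day. In the sample data both series start on the same earliest date, so a plain join works: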

    select i.date, i.income, e.expense
    from (
        -- expand each income row into one row per day, up to the day before the next rate change
        select gs.date, i.income
        from (
            select i.*, lead(date) over (order by date) as next_date
            from income i
        ) i
        cross join lateral
            generate_series(i.date, coalesce(i.next_date - interval '1 day', '2020-02-15'::date), interval '1 day') gs(date)
    ) i
    join (
        -- same expansion for expenses
        select gs.date, e.expense
        from (
            select e.*, lead(date) over (order by date) as next_date
            from expense e
        ) e
        cross join lateral
            generate_series(e.date, coalesce(e.next_date - interval '1 day', '2020-02-15'::date), interval '1 day') gs(date)
    ) e on i.date = e.date;

This is a hybrid of the two solutions.
It is easy to modify to handle the case where the two series do not start on the same date.
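
One possible way to make that modification (an assumption, not code from the answer) is to replace the inner join with a full join, so that days covered by only one series are kept and the other side is left null. The sketch below inlines sample rows in which expenses start a day later than income; the values lists stand in for the real income and expense tables.

```sql
with income(date, income) as (
    values (date '2020-02-12', 50), (date '2020-02-14', 100)
),
expense(date, expense) as (
    values (date '2020-02-13', 50)   -- starts one day after income
)
select coalesce(i.date, e.date) as day, i.income, e.expense
from (
    select gs.date::date as date, i.income
    from (
        select i.*, lead(date) over (order by date) as next_date
        from income i
    ) i
    cross join lateral
        generate_series(i.date, coalesce(i.next_date - 1, date '2020-02-15'), interval '1 day') gs(date)
) i
full join (
    select gs.date::date as date, e.expense
    from (
        select e.*, lead(date) over (order by date) as next_date
        from expense e
    ) e
    cross join lateral
        generate_series(e.date, coalesce(e.next_date - 1, date '2020-02-15'), interval '1 day') gs(date)
) e on i.date = e.date
order by day;

-- day        | income | expense
-- 2020-02-12 |     50 | (null)
-- 2020-02-13 |     50 |      50
-- 2020-02-14 |    100 |      50
-- 2020-02-15 |    100 |      50
```

Wrapping the missing side in coalesce(..., 0) would treat "no rate yet" as zero, if that suits the later calculations.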
