sql—如何对一列中的多行值求和?

vsikbqxv  于 2021-07-15  发布在  ClickHouse
关注(0)|答案(5)|浏览(476)

我有这个表,我想为几行添加'change'列的值(或者,更确切地说,从'ne'值为零的行到下一行,其中包括'ne'的零(不是第二行本身))。任何答复都将不胜感激。

  1. ┌─rn─┬───────date─┬─ne─┬───────change─┐
  2. 0 2008-12-07 0 -10330848398
  3. 1 2009-04-14 1 -61290
  4. 2 2009-04-26 1 9605743360
  5. 3 2013-07-06 0 -32028871920
  6. 4 2014-01-12 1 -42296164902
  7. 5 2015-06-08 1 59100383646
  8. └────┴────────────┴────┴──────────────┘

我们期望的结果是这样的。

  1. row start end sum(change)
  2. --------------------------------------------------
  3. 0 | 2008-12-07 | 2009-04-26 | -725,166,328
  4. --------------------------------------------------
  5. 1 | 2013-07-06 | 2015-06-08 | -15,224,653,176
  6. --------------------------------------------------
oipij1gg

oipij1gg1#

在bigdata(>1亿行)中是无法解决的

  1. SELECT
  2. d[1] AS s,
  3. d[-1] AS e,
  4. arraySum(c) AS sm
  5. FROM
  6. (
  7. SELECT
  8. arraySplit((x, y) -> (NOT y), d, n) AS dd,
  9. arraySplit((x, y) -> (NOT y), c, n) AS cc
  10. FROM
  11. (
  12. SELECT
  13. groupArray(date) AS d,
  14. groupArray(ne) AS n,
  15. groupArray(change) AS c
  16. FROM
  17. (
  18. SELECT *
  19. FROM mytable
  20. ORDER BY rn ASC
  21. )
  22. )
  23. )
  24. ARRAY JOIN
  25. dd AS d,
  26. cc AS c
  27. ┌─s──────────┬─e──────────┬───────────sm─┐
  28. 2008-12-07 2009-04-26 -725166328
  29. 2013-07-06 2015-06-08 -15224653176
  30. └────────────┴────────────┴──────────────┘
展开查看全部
ffx8fchx

ffx8fchx2#

这是一个缺口和孤岛问题。规范的解决方案确实使用了窗口函数,而clickhouse不支持这些函数。
下面是一种使用子查询模拟条件窗口和的方法:

  1. select
  2. min(date) start_date,
  3. max(date) end_date,
  4. sum(change) sum_change
  5. from (
  6. select
  7. t.*,
  8. (select count(*) from mytable t1 where t1.date <= t.date and t1.ne = 0) grp
  9. from mytable t
  10. ) t
  11. group by grp

子查询统计有多少行 ne = 0 从表的第一行到当前行。这定义了记录组。那么剩下要做的就是聚合。
如果您可以使用窗口函数,您可以将其表述为:

  1. select
  2. min(date) start_date,
  3. max(date) end_date,
  4. sum(change) sum_change
  5. from (
  6. select
  7. t.*,
  8. sum(case when ne = 0 then 1 else 0 end) over(order by date) grp
  9. from mytable t
  10. ) t
  11. group by grp
展开查看全部
xpcnnkqh

xpcnnkqh3#

只是解决此任务的另一种方法:

  1. WITH (SELECT arraySort(groupArray(rn))
  2. FROM test_table
  3. WHERE ne = 0) as group_start_id
  4. SELECT argMin(date, rn) start, argMax(date, rn) end, sum(change)
  5. FROM (
  6. SELECT rn, date, change
  7. FROM test_table
  8. ORDER BY rn)
  9. GROUP BY arrayFirstIndex(x -> rn < x, group_start_id)
  10. ORDER BY start

样本数据测试:

  1. WITH (SELECT arraySort(groupArray(rn))
  2. FROM (
  3. SELECT data.1 rn, data.2 date, data.3 ne, data.4 change
  4. FROM (
  5. SELECT arrayJoin([
  6. (0, toDate('2008-12-07'), 0, toInt64(-10330848398)),
  7. (1, toDate('2009-04-14'), 1, toInt64(-61290)),
  8. (2, toDate('2009-04-26'), 1, toInt64(9605743360)),
  9. (3, toDate('2013-07-06'), 0, toInt64(-32028871920)),
  10. (4, toDate('2014-01-12'), 1, toInt64(-42296164902)),
  11. (5, toDate('2015-06-08'), 1, toInt64(59100383646)),
  12. (6, toDate('2015-06-08'), 0, toInt64(101)),
  13. (7, toDate('2015-06-09'), 0, toInt64(102)),
  14. (8, toDate('2015-06-10'), 0, toInt64(103)),
  15. (9, toDate('2015-06-11'), 1, toInt64(104))
  16. ]) data))
  17. WHERE ne = 0) as group_start_id
  18. SELECT argMin(date, rn) start, argMax(date, rn) end, sum(change)
  19. FROM (
  20. SELECT data.1 rn, data.2 date, data.4 change
  21. FROM (
  22. SELECT arrayJoin([
  23. (0, toDate('2008-12-07'), 0, toInt64(-10330848398)),
  24. (1, toDate('2009-04-14'), 1, toInt64(-61290)),
  25. (2, toDate('2009-04-26'), 1, toInt64(9605743360)),
  26. (3, toDate('2013-07-06'), 0, toInt64(-32028871920)),
  27. (4, toDate('2014-01-12'), 1, toInt64(-42296164902)),
  28. (5, toDate('2015-06-08'), 1, toInt64(59100383646)),
  29. (6, toDate('2015-06-08'), 0, toInt64(101)),
  30. (7, toDate('2015-06-09'), 0, toInt64(102)),
  31. (8, toDate('2015-06-10'), 0, toInt64(103)),
  32. (9, toDate('2015-06-11'), 1, toInt64(104))
  33. ]) data)
  34. ORDER BY rn)
  35. GROUP BY arrayFirstIndex(x -> rn < x, group_start_id)
  36. ORDER BY start
  37. /* result
  38. ┌──────start─┬────────end─┬──sum(change)─┐
  39. │ 2008-12-07 │ 2009-04-26 │ -725166328 │
  40. │ 2013-07-06 │ 2015-06-08 │ -15224653176 │
  41. │ 2015-06-08 │ 2015-06-08 │ 101 │
  42. │ 2015-06-09 │ 2015-06-09 │ 102 │
  43. │ 2015-06-10 │ 2015-06-11 │ 207 │
  44. └────────────┴────────────┴──────────────┘
  45. * /
展开查看全部
snz8szmq

snz8szmq4#

选择ne,min(日期)作为开始,max(日期)作为结束,sum(更改)作为更改组by ne

ipakzgxi

ipakzgxi5#

假设clickhouse支持变量:

  1. set @block := -1;
  2. select
  3. block as row,
  4. min(date) as start,
  5. max(date) as end,
  6. sum(change)
  7. from
  8. (select
  9. case when ne = 0 then @block:=@block+1 end as dummy,
  10. @block as block,
  11. t.*
  12. from t) tt
  13. group by block;

相关问题