Using Spark for column calculations

kknvjkwl  posted on 2021-05-17 in Spark

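I have a DataFrame as shown below. It is essentially derived from a time series. My problem is that the formula for column c at row n is:
column c (row n) = (column a (row n) - column a (row n-1)) + column c (row n-1).
So the calculation of column c refers back to the previous value of column c itself. I am using Spark SQL; can someone tell me how to do this? For the calculation on column a, I am using the LAG function.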

vlf7wbxs1#

Your formula is a cumulative sum. Here is a complete example:

-- Outer query: the running sum of the row-to-row differences gives c.
SELECT rowid, a, SUM(c0) OVER (ORDER BY rowid) AS c
FROM
(
  -- c0 is a(n) - a(n-1); LAG returns NULL on the first row.
  SELECT rowid, a, a - LAG(a, 1) OVER (ORDER BY rowid) AS c0
  FROM
  (
    -- Inline sample data.
    SELECT 1 AS rowid, 5 AS a UNION ALL
    SELECT 2 AS rowid, 6 AS a UNION ALL
    SELECT 3 AS rowid, 5 AS a UNION ALL
    SELECT 4 AS rowid, 7 AS a UNION ALL
    SELECT 5 AS rowid, 8 AS a UNION ALL
    SELECT 6 AS rowid, 3 AS a
  ) t
) t
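
As a quick sanity check of the claim that the recurrence is a cumulative sum, here is a minimal sketch in plain Python (not Spark) using the same sample values. It assumes c is 0 on the first row, which the question does not state; in the SQL above the first row comes out as NULL instead, because LAG has no previous row to read.

```python
from itertools import accumulate

# Column a from the sample rows in the query above.
a = [5, 6, 5, 7, 8, 3]

# Direct recurrence: c[n] = (a[n] - a[n-1]) + c[n-1].
# Assumption: c is 0 on the first row (not specified in the question).
c_recurrence = [0]
for n in range(1, len(a)):
    c_recurrence.append((a[n] - a[n - 1]) + c_recurrence[n - 1])

# Running sum of the lagged differences, mirroring SUM(c0) OVER (ORDER BY rowid).
# The first difference is taken as 0 here; in SQL it would be NULL.
diffs = [0] + [a[n] - a[n - 1] for n in range(1, len(a))]
c_cumsum = list(accumulate(diffs))

print(c_recurrence)
print(c_cumsum)
```

Both lists come out identical, which is what makes the window-function rewrite possible: the self-reference disappears once c is expressed as a running sum over column a only.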

0lvr5msh2#

It looks like colC is just colA minus the colA of the first row.
For example:

1 = 6-5,
0 = 5-5,
2 = 7-5,
3 = 8-5,
-2 = 3-5

So this query should work:

SELECT colA, colA - FIRST(colA) OVER (ORDER BY id) AS colC
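
To see why this agrees with the cumulative-sum answer, note that the recurrence telescopes. A short derivation, writing a_n and c_n for the values of the two columns at row n, and assuming (the question does not say) that c is 0 on the first row:

```latex
\begin{aligned}
c_n &= c_{n-1} + (a_n - a_{n-1}) \\
    &= c_1 + \sum_{k=2}^{n} (a_k - a_{k-1}) \\
    &= c_1 + a_n - a_1 \\
    &= a_n - a_1 \quad \text{(taking } c_1 = 0\text{)}
\end{aligned}
```

If the first row of c holds some other starting value, that constant c_1 would have to be added back to every row.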
