Scala - calculate using the last element of the array

9njqaruj posted on 2021-07-12 in Spark

I have a DataFrame with the following structure:

ID:string
Amt:long
Col:array
    element:struct
        Seq:int
        Pct:double
        Sh:double

DataFrame output:

+----+-------+-------------------------------------------+
|ID  |Amt    |col                                        |
+----+-------+-------------------------------------------+
|ABC |23077  |[[1, 1.5, 1, 10000], [2, 1.2, 2.5, 40000]] |
+----+-------+-------------------------------------------+

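I need the following calculation: for the first array element, the last value stays the same (10000). For the next element, I need to subtract the previous element's last value from its own (40000 - 10000) to get 30000.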

Expected output:
+----+-------+-------------------------------------------+
|ID  |Amt    |col1                                       |
+----+-------+-------------------------------------------+
|ABC |23077  |[[1, 1.5, 1, 10000], [2, 1.2, 2.5, 30000]] |
+----+-------+-------------------------------------------+
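To make the required arithmetic concrete, here is a minimal plain-Scala sketch (no Spark involved) of the per-row rule: the first entry keeps its amount, and every later entry becomes its own amount minus the previous entry's amount. It assumes the fourth struct field is an amount of type Long (the schema above omits it, but the sample data shows it); the names Entry, DiffAmounts and diffAmounts, and the lowercase field names, are hypothetical.

```scala
// Hypothetical model of one array element; amt is assumed to be a Long
// (the question's schema omits it, the sample data shows it).
final case class Entry(seq: Int, pct: Double, sh: Double, amt: Long)

object DiffAmounts {
  // Hypothetical helper: keep the first entry unchanged and replace every
  // later amount with (current amount - previous entry's amount).
  def diffAmounts(entries: Vector[Entry]): Vector[Entry] =
    entries.headOption match {
      case None => Vector.empty
      case Some(first) =>
        // zip pairs each entry with its successor: (e0, e1), (e1, e2), ...
        first +: entries.zip(entries.tail).map { case (prev, cur) =>
          cur.copy(amt = cur.amt - prev.amt)
        }
    }

  def main(args: Array[String]): Unit = {
    val row = Vector(Entry(1, 1.5, 1.0, 10000L), Entry(2, 1.2, 2.5, 40000L))
    println(diffAmounts(row).map(_.amt)) // Vector(10000, 30000)
  }
}
```

Pairing each element with its successor via zip avoids manual index arithmetic, and the headOption match covers the empty-array case.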

How can I achieve this?


Answer #1 by hwazgwia

You can use transform and subtract the previous entry's amount from each entry's amount:

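import org.apache.spark.sql.functions.expr

// Rebuild every struct in the array column "col": keep Seq, Pct and Sh, and
// replace Amt with the difference from the previous element's Amt.
// The first element (i = 0) keeps its Amt unchanged.
// transform(array, (x, i) -> ...) is a Spark SQL higher-order function (Spark 2.4+).
val df2 = df.withColumn(
    "col",
    expr("""
        transform(
            col,
            (x, i) -> struct(
                x.Seq as Seq, x.Pct as Pct, x.Sh as Sh,
                case when i = 0 then x.Amt else x.Amt - col[i - 1].Amt end as Amt
            )
        )
    """)
)

// show(false) prints the array column without truncation
df2.show(false)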
+-----+---+--------------------------------------------+
|Amt  |ID |col                                         |
+-----+---+--------------------------------------------+
|23077|ABC|[[1, 1.5, 1.0, 10000], [2, 1.2, 2.5, 30000]]|
+-----+---+--------------------------------------------+
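Note that the lambda reads col[i-1].Amt from the original input array, not from the already-transformed one, so each entry becomes the difference from the previous original amount. A minimal plain-Scala mirror of the same index-based logic makes this easy to check on more than two entries. The names Entry, TransformMirror and transformLikeSql are hypothetical, the amount is assumed to be a Long, and the third entry below is made-up data for illustration only.

```scala
// Hypothetical model of one array element (amt assumed to be a Long).
final case class Entry(seq: Int, pct: Double, sh: Double, amt: Long)

object TransformMirror {
  // Mirrors: transform(col, (x, i) -> struct(...,
  //   case when i = 0 then x.Amt else x.Amt - col[i - 1].Amt end as Amt))
  // The index i looks back into the ORIGINAL input, so the results are
  // pairwise differences between consecutive original amounts.
  def transformLikeSql(col: Vector[Entry]): Vector[Entry] =
    col.zipWithIndex.map { case (x, i) =>
      if (i == 0) x else x.copy(amt = x.amt - col(i - 1).amt)
    }

  def main(args: Array[String]): Unit = {
    val col = Vector(
      Entry(1, 1.5, 1.0, 10000L),
      Entry(2, 1.2, 2.5, 40000L),
      Entry(3, 1.0, 3.0, 70000L) // hypothetical extra entry
    )
    println(transformLikeSql(col).map(_.amt)) // Vector(10000, 30000, 30000)
  }
}
```

With the hypothetical third entry, 70000 - 40000 gives 30000, confirming that each element is compared with the previous original amount rather than with an already-adjusted value.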
