Spark/Scala: how to subtract the previous column's value from the current column?

rjee0c15 asked on 2021-05-27 in Spark

I have a DataFrame like this:

+--------------+-------+-------+-------+-------+-------+-------+-------+
|Country/Region| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
|       Senegal|      0|      4|     10|     18|     27|     31|     35|
|       Tunisia|      1|      8|     15|     21|     37|     42|     59|
+--------------+-------+-------+-------+-------+-------+-------+-------+

There is exactly one row per country, but many columns, one per day. I want to go through the day columns and replace each one with its value minus the value of the previous day's column (for example, Tunisia's 3/11/20 becomes 37 - 21 = 16), so the resulting df should look like this:

+--------------+-------+-------+-------+-------+-------+-------+-------+
|Country/Region| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
|       Senegal|      0|      4|      6|      8|      9|      4|      4|
|       Tunisia|      1|      7|      7|      6|     16|      5|     17|
+--------------+-------+-------+-------+-------+-------+-------+-------+

2skhul33 (answer 1)

Perhaps this helps:
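
For reference, a minimal sketch of how the df2 used below might be constructed (the session setup is my assumption; only the data comes from the question):

    import org.apache.spark.sql.SparkSession

    // assumed local session, just for running the sketch
    val spark = SparkSession.builder().appName("day-diff").master("local[*]").getOrCreate()
    import spark.implicits._

    // one row per country, one integer column per day (values from the question)
    val df2 = Seq(
      ("Senegal", 0, 4, 10, 18, 27, 31, 35),
      ("Tunisia", 1, 8, 15, 21, 37, 42, 59)
    ).toDF("Country/Region", "3/7/20", "3/8/20", "3/9/20",
           "3/10/20", "3/11/20", "3/12/20", "3/13/20")

With df2 in place, the answer's approach is: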

    import java.text.SimpleDateFormat

    import org.apache.spark.sql.functions.{col, lit}
    import org.apache.spark.sql.types.NumericType

    df2.show(false)
    df2.printSchema()
    /**
      * +--------------+------+------+------+-------+-------+-------+-------+
      * |Country/Region|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
      * +--------------+------+------+------+-------+-------+-------+-------+
      * |Senegal       |0     |4     |10    |18     |27     |31     |35     |
      * |Tunisia       |1     |8     |15    |21     |37     |42     |59     |
      * +--------------+------+------+------+-------+-------+-------+-------+
      *
      * root
      * |-- Country/Region: string (nullable = true)
      * |-- 3/7/20: integer (nullable = true)
      * |-- 3/8/20: integer (nullable = true)
      * |-- 3/9/20: integer (nullable = true)
      * |-- 3/10/20: integer (nullable = true)
      * |-- 3/11/20: integer (nullable = true)
      * |-- 3/12/20: integer (nullable = true)
      * |-- 3/13/20: integer (nullable = true)
      */

    // Add a zero-valued dummy column whose date ("01/01/70") sorts before every real day,
    // so the earliest real day keeps its original value after the pairwise subtraction.
    val new_df = df2.withColumn("01/01/70", lit(0))

    // Take the numeric (day) columns, sort their names chronologically, then slide a
    // window of size 2 over them and turn each day into (current day - previous day).
    val diffCols = new_df.schema.filter(_.dataType.isInstanceOf[NumericType])
      .map(_.name)
      .map { c =>
        val sdf = new SimpleDateFormat("MM/dd/yy")
        (sdf.parse(c), c)
      }
      .sortBy(_._1)
      .map(_._2)
      .sliding(2, 1)
      .map(seq => (col(seq.last) - col(seq.head)).as(seq.last))

    new_df.select(col("Country/Region") +: diffCols.toSeq: _*)
      .show(false)

    /**
      * +--------------+------+------+------+-------+-------+-------+-------+
      * |Country/Region|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
      * +--------------+------+------+------+-------+-------+-------+-------+
      * |Senegal       |0     |4     |6     |8      |9      |4      |4      |
      * |Tunisia       |1     |7     |7     |6      |16     |5      |17     |
      * +--------------+------+------+------+-------+-------+-------+-------+
      */
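
Note on the trick above: the dummy "01/01/70" column exists only so that, after sorting by date, the earliest real day (3/7/20) is paired with a zero column and therefore keeps its original value; the dummy name never appears in the output because each difference is aliased with the later column of the pair. If you would rather not add a dummy column, a sketch of an equivalent variant (my own, not part of the answer above) could look like this:

    import java.text.SimpleDateFormat

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.NumericType

    val fmt = new SimpleDateFormat("MM/dd/yy")

    // day columns, ordered chronologically by parsing each header as a date
    val dayCols = df2.schema
      .filter(_.dataType.isInstanceOf[NumericType])
      .map(_.name)
      .sortBy(c => fmt.parse(c))

    // keep the first day unchanged; every later day becomes (current - previous)
    val diffs = col(dayCols.head) +:
      dayCols.sliding(2, 1)
        .map(pair => (col(pair.last) - col(pair.head)).as(pair.last))
        .toSeq

    df2.select(col("Country/Region") +: diffs: _*).show(false)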
