How to pivot/transpose the rows of a column into individual columns in Spark Scala without using the pivot method

Posted by pdtvr36n on 2021-05-17 in Spark

Please see the image below for my use case.

Source: https://stackoverflow.com/questions/64928506/how-to-pivot-transpose-rows-of-a-column-in-to-individual-columns-in-spark-scala

2 answers

4uqofj5v1#

If you know the names of all the new columns in advance, you can get the same result without pivot by adding the columns manually:

import org.apache.spark.sql.functions.{col, when}

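dataframe
  // Each new column holds tamt only on rows whose ttype matches; other rows get null.
  .withColumn("cheque", when(col("ttype") === "cheque", col("tamt")))
  .withColumn("draft", when(col("ttype") === "draft", col("tamt")))
  // The source columns are no longer needed once their values are spread out.
  .drop("tamt", "ttype")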

Because this solution does not trigger a shuffle, it should be processed faster than pivot.
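One way to check the shuffle claim yourself is to print the physical plan. The following is only a minimal sketch: it assumes a local SparkSession and a small sample DataFrame with the tdate, ttype and tamt columns from the question (the sample rows and the name `manual` are made up for illustration). Keep in mind that this approach keeps one output row per input row; it does not merge rows that share the same tdate.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Assumption: a local session, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("manual-columns-sketch").getOrCreate()
import spark.implicits._

// Sample data shaped like the table in the question (values are illustrative).
val dataframe = Seq(
  ("2020-10-15", "draft", 5000),
  ("2020-10-18", "cheque", 7000)
).toDF("tdate", "ttype", "tamt")

val manual = dataframe
  .withColumn("cheque", when(col("ttype") === "cheque", col("tamt")))
  .withColumn("draft", when(col("ttype") === "draft", col("tamt")))
  .drop("tamt", "ttype")

// Prints the physical plan; a shuffle would show up as an Exchange node.
manual.explain()
manual.show(false)
```

If the plan contains no Exchange node, the transformation runs without a shuffle.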
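If you do not know the column names in advance, you can generalize the approach. In that case, however, you should benchmark it to check whether pivot performs better: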

import org.apache.spark.sql.functions.{col, when}

// Collects the distinct ttype values to the driver; this runs a separate Spark job.
val newColumnNames = dataframe.select("ttype").distinct().collect().map(_.getString(0))

newColumnNames
  .foldLeft(dataframe)((df, columnName) => {
    df.withColumn(columnName, when(col("ttype") === columnName, col("tamt")))
  })
  .drop("tamt", "ttype")
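If several rows can share the same tdate and you want a single output row per tdate, the generalized version above is not enough, since it never merges rows. A possible sketch, still without calling pivot, combines groupBy with one conditional aggregate per value. The helper name `spreadToColumns`, the local session and the sample rows are assumptions made for illustration, not part of the original answer. Note that groupBy does trigger a shuffle, so the performance argument above no longer applies here.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first, when}

// Assumption: a local session, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("pivot-free-sketch").getOrCreate()
import spark.implicits._

// Illustrative data: two rows share the same tdate.
val dataframe = Seq(
  ("2020-10-15", "draft", 5000),
  ("2020-10-15", "cheque", 1200),
  ("2020-10-18", "cheque", 7000)
).toDF("tdate", "ttype", "tamt")

// Hypothetical helper (not a Spark API): one output row per keyCol value,
// one output column per distinct non-null value of nameCol.
def spreadToColumns(df: DataFrame, keyCol: String, nameCol: String, valueCol: String): DataFrame = {
  // Runs a separate job to collect the distinct names to the driver.
  val names = df.select(nameCol).where(col(nameCol).isNotNull)
    .distinct().collect().map(_.getString(0)).sorted
  // first(..., true) ignores nulls, so it picks a value from a matching row if one exists.
  val aggs: Array[Column] = names.map { n =>
    first(when(col(nameCol) === n, col(valueCol)), true).as(n)
  }
  if (aggs.isEmpty) df.select(keyCol).distinct()
  else df.groupBy(col(keyCol)).agg(aggs.head, aggs.tail.toSeq: _*)
}

val wide = spreadToColumns(dataframe, "tdate", "ttype", "tamt")
wide.orderBy("tdate").show(false)
```

When a group holds more than one value for the same ttype, `first` picks one of them without any ordering guarantee; an aggregate such as `sum` or `max` may fit better in that case.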

2021-05-17

esyap4oy2#

Use the groupBy, pivot and agg functions. See the code below; inline comments are included.

scala> df.show(false)
+----------+------+----+
|tdate     |ttype |tamt|
+----------+------+----+
|2020-10-15|draft |5000|
|2020-10-18|cheque|7000|
+----------+------+----+

scala> df
.groupBy($"tdate") // Group rows by the tdate column.
.pivot("ttype",Seq("cheque","draft")) // Pivot on ttype; "cheque" and "draft" become the new column names.
.agg(first("tamt")) // Fill each new column with the first tamt value in the group.
.show(false)

+----------+------+-----+
|tdate     |cheque|draft|
+----------+------+-----+
|2020-10-18|7000  |null |
|2020-10-15|null  |5000 |
+----------+------+-----+
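The transcript above relies on what spark-shell imports automatically. Outside the shell, a self-contained sketch could look like the following; the local session and the name `pivoted` are assumptions for illustration. It also uses the pivot overload without an explicit value list, which is shorter but, according to the Spark API documentation, less efficient because Spark first has to compute the distinct values of the pivot column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Assumption: a local session, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._

// Same rows as in the transcript above.
val df = Seq(
  ("2020-10-15", "draft", 5000),
  ("2020-10-18", "cheque", 7000)
).toDF("tdate", "ttype", "tamt")

// No value list given: Spark works out the distinct ttype values itself.
val pivoted = df
  .groupBy($"tdate")
  .pivot("ttype")
  .agg(first("tamt"))

pivoted.orderBy("tdate").show(false)
```

Passing the values explicitly, as in the answer above, avoids that extra pass and fixes the column order.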

2021-05-17
