I am new to Spark, and I would like to use Scala to pivot a row of a DataFrame, like this:
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Country| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Japan| 0| 4| 10| 18| 27| 31| 35|
+--------------+-------+-------+-------+-------+-------+-------+-------+
My pivoted DataFrame should look like this:
+--------------+-------+
| Country| Japan |
+--------------+-------+
| 3/7/20| 0|
+--------------+-------+
| 3/8/20| 4|
+--------------+-------+
| 3/9/20| 10|
+--------------+-------+
| 3/10/20| 18|
+--------------+-------+
| ...| ...|
+--------------+-------+
I have tried the following, but I am not sure I am getting the aggregation expression right:
```
val pivoted = df.groupBy("Country").pivot("Country", Seq("Japan")).agg(col("Country"))
```
1 Answer
Try this, using `stack`.
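A minimal sketch of how the input `df2` used below can be built, assuming a local SparkSession (the column names and values are taken from the question):

```
import org.apache.spark.sql.SparkSession

// Hypothetical setup: a local SparkSession and the single-row DataFrame
// from the question, with the dates as column names.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df2 = Seq(("Japan", 0, 4, 10, 18, 27, 31, 35))
  .toDF("Country", "3/7/20", "3/8/20", "3/9/20", "3/10/20", "3/11/20", "3/12/20", "3/13/20")
```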
```
df2.show(false)
df2.printSchema()
/**
 * +-------+------+------+------+-------+-------+-------+-------+
 * |Country|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
 * +-------+------+------+------+-------+-------+-------+-------+
 * |Japan  |0     |4     |10    |18     |27     |31     |35     |
 * +-------+------+------+------+-------+-------+-------+-------+
 *
 * root
 * |-- Country: string (nullable = true)
 * |-- 3/7/20: integer (nullable = true)
 * |-- 3/8/20: integer (nullable = true)
 * |-- 3/9/20: integer (nullable = true)
 * |-- 3/10/20: integer (nullable = true)
 * |-- 3/11/20: integer (nullable = true)
 * |-- 3/12/20: integer (nullable = true)
 * |-- 3/13/20: integer (nullable = true)
 */

// Build the stack() arguments: one ('column name', value) pair per column,
// casting every value to string so the unpivoted value column has a uniform type.
// Backticks around the column names are required because they contain slashes.
val stringCol = df2.columns.map(c => s"'$c', cast(`$c` as string)").mkString(", ")
val processedDF = df2.selectExpr(s"stack(${df2.columns.length}, $stringCol) as (col_1, col_2)")
processedDF.show(false)
/*
* +-------+-----+
* |col_1 |col_2|
* +-------+-----+
* |Country|Japan|
* |3/7/20 |0 |
* |3/8/20 |4 |
* |3/9/20 |10 |
* |3/10/20|18 |
* |3/11/20|27 |
* |3/12/20|31 |
* |3/13/20|35 |
* +-------+-----+
 */
```
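If you want the output shaped exactly like the table in the question, with `Country`/`Japan` as the header rather than as the first data row, one option (a sketch building on `processedDF` above) is to filter that row out and rename the generic columns:

```
import org.apache.spark.sql.functions.col

// Drop the ("Country", "Japan") row that stack() produced from the header
// column, then rename col_1/col_2 to the desired names.
val result = processedDF
  .filter(col("col_1") =!= "Country")
  .withColumnRenamed("col_1", "Country")
  .withColumnRenamed("col_2", "Japan")

result.show(false)
```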