I have this dataframe, and I am trying to convert this dataframe operation into SQL using the pivot functionality:
val df = Seq(
(1, "a,b,c"),
(2, "b,c")
).toDF("id", "page_path")
df.createOrReplaceTempView("df")
df.show()
df
.withColumn("splitted", split($"page_path", ","))
.withColumn("exploded", explode($"splitted"))
.groupBy("id")
.pivot("exploded")
.count().show
Output:
+---+----+---+---+
| id| a| b| c|
+---+----+---+---+
| 1| 1| 1| 1|
| 2|null| 1| 1|
+---+----+---+---+
I found the Databricks link on running pivot the SQL way and tried to apply it, but it failed.
Has anyone applied the pivot function the SQL way?
I just tried this:
spark.sql(
"""
(select * from (Select id, exploded from ( select id, explode(split( page_path ,',')) as exploded from df )
group by id, exploded )
Pivot
(id, exploded) )
""".stripMargin
).show
which produces:
+---+--------+
| id|exploded|
+---+--------+
| 2| c|
| 1| c|
| 1| b|
| 2| b|
| 1| a|
+---+--------+
which is different from the output shown above using the DataFrame operations.
I also tried this:
spark.sql(
"""
select * from ( select * from df lateral view explode(split( page_path ,',')) as exploded )
pivot (exploded)
""".stripMargin
).show
Exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Number of column aliases does not match number of columns. Number of column aliases: 1; number of columns: 3.; line 2 pos 21
I'm racking my brain wondering whether this is even possible.
1 Answer
We don't have to mention a GROUP BY clause when using SQL PIVOT, because Spark performs the group by implicitly. From the documentation:

An important idea about pivot is that it performs a grouped aggregation based on a list of implicit group-by columns together with the pivot column. The implicit group-by columns are columns from the FROM clause that do not appear in any aggregate function or as the pivot column.

In the PIVOT clause we need to add the aggregation on the pivot column and, in the FOR ... IN list, mention the column values needed in the final dataset. In the query below the subquery exposes only id and exploded, so id becomes the implicit group-by column. Using Spark SQL PIVOT (available from Spark 2.4):
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val df1: DataFrame = Seq((1, "a,b,c"), (2, "b,c")).toDF("id", "page_path")
df1.createOrReplaceTempView("df1")

spark.sql(
  """
    |Select * from
    |( select id, explode(split( page_path ,',')) as exploded from df1 )
    |pivot(count(exploded) for exploded in ('a','b','c')
    |)
  """.stripMargin)
  .show
+---+----+---+---+
| id| a| b| c|
+---+----+---+---+
| 1| 1| 1| 1|
| 2|null| 1| 1|
+---+----+---+---+
Using the DataFrame API:
df1.withColumn("splitted", split($"page_path", ",")).
withColumn("exploded", explode($"splitted")).
groupBy("id").
pivot("exploded").
count().
show
+---+----+---+---+
| id| a| b| c|
+---+----+---+---+
| 1| 1| 1| 1|
| 2|null| 1| 1|
+---+----+---+---+