Scala: aggregating the columns of a Spark DataFrame into JSON

af7jpaap  posted on 2021-05-27  in Spark

I have the following Spark DataFrame, and I want to aggregate all of its columns into a single JSON column, as shown below. If the input DataFrame is:

key,name,title
123,hsd,jds
148,sdf,qsz
589,qsz,aze

the expected result would be:

key,name,title,aggregation
123,hsd,jds,{"key":"123","name":"hsd", "title":"jds"}
148,sdf,qsz,{"key":"148","name":"sdf", "title":"qsz"}
589,qsz,aze,{"key":"589","name":"qsz", "title":"aze"}

The solution should not hard-code the field names. Any ideas on how to do this?

oxosxuxt1#

Use to_json, but build its input more flexibly, as a map generated from the DataFrame's column names:

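import org.apache.spark.sql.functions._
import spark.implicits._ // enables toDF; assumes a SparkSession named `spark`

val dfA = Seq(
    (123, "hsd", "jds"),
    (148, "sdf", "qsz"),
    (589, "qsz", "aze")
).toDF("key", "name", "title")

// Build a map of column name -> column value over all columns, then serialize it as JSON.
dfA.withColumn("aggregation", to_json(
  map(dfA.columns.flatMap(columnName => Seq(lit(columnName), col(columnName))):_*))
).show(truncate = false)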

+---+----+-----+----------------------------------------+
|key|name|title|aggregation                             |
+---+----+-----+----------------------------------------+
|123|hsd |jds  |{"key":"123","name":"hsd","title":"jds"}|
|148|sdf |qsz  |{"key":"148","name":"sdf","title":"qsz"}|
|589|qsz |aze  |{"key":"589","name":"qsz","title":"aze"}|
+---+----+-----+----------------------------------------+

ygya80vv2#

You can use the to_json function:

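import org.apache.spark.sql.functions._
import spark.implicits._ // enables toDF and $"..."; assumes a SparkSession named `spark`

val df = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

// Pack the three named columns into a struct and serialize it as JSON.
df.withColumn("aggregation", to_json(struct($"key", $"name", $"title")))
  .show(false)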

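If you have many columns, you can use the following instead, which takes the column names from the DataFrame rather than hard-coding them: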

df.withColumn("aggregation", to_json(struct(df.columns.map(col): _*)))

Output:

+---+----+-----+--------------------------------------+
|key|name|title|aggregation                           |
+---+----+-----+--------------------------------------+
|123|hsd |jds  |{"key":123,"name":"hsd","title":"jds"}|
|148|sdf |qsz  |{"key":148,"name":"sdf","title":"qsz"}|
|589|qsz |aze  |{"key":589,"name":"qsz","title":"aze"}|
+---+----+-----+--------------------------------------+
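
Note that the struct-based output keeps key as a JSON number, whereas the expected result in the question shows it quoted. If string values are required, one possible approach is to cast every column to string before building the struct. The following is a minimal sketch under that assumption, not part of the original answer; the local SparkSession setup and the names stringCols and result are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("to-json-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

// Cast each column to string and keep its original name,
// so that every value in the resulting JSON is quoted.
val stringCols = df.columns.map(c => col(c).cast("string").as(c))

val result = df.withColumn("aggregation", to_json(struct(stringCols: _*)))
result.show(false)
```

The original columns of df are left untouched (key stays an integer); only the copies placed inside the struct are cast, so the typed columns remain usable alongside the all-string JSON.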
