How to use spark.read.json from a multiple-column schema

Posted by 1qczuiv0 on 2021-05-27 in Spark

This is what I am trying to do: first pivot the DataFrame on the "type" column, then use the data in the pivoted "type" columns to generate the schema.

val df = spark.sql("""select id,type,key,value from student_details""")

val pivot_df = df.groupBy("id","key").pivot("type").agg(first("value"))

val dist_type_df = spark.sql("""select distinct type from student_details""")

val needed_col_names : List[String] =  dist_type_df.select("type").map(_.getString(0)).collect.toList

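How can I use needed_col_names to generate a schema as a StructType, or as an array of StructType? I need to use the schema generated for each value of the "type" column.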

val schema = spark.read.json(pivot_df.select(needed_col_names.head , needed_col_names.tail : _*).as[String]).schema
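One possible direction, sketched under assumptions rather than offered as a confirmed answer: `.as[String]` expects a single string column, so selecting all pivoted columns at once and converting to `Dataset[String]` is likely to fail. Inferring one schema per pivoted column avoids that. The sample rows, the type names `profile` and `scores`, and the name `schemaByType` are hypothetical stand-ins for `student_details`; the snippet is written so it can be pasted into spark-shell or run as a Scala script.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, first}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("infer-schema-per-type")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the student_details table
val df = Seq(
  (1, "profile", "k1", """{"name":"Ann","age":20}"""),
  (1, "scores",  "k1", """{"math":90,"physics":85}"""),
  (2, "profile", "k2", """{"name":"Bob","age":21}""")
).toDF("id", "type", "key", "value")

// Same pivot as in the question: one column of JSON strings per type
val pivotDf = df.groupBy("id", "key").pivot("type").agg(first("value"))
val neededColNames = df.select("type").distinct().as[String].collect().toList

// Infer a separate schema from each pivoted column
val schemaByType: Map[String, StructType] = neededColNames.map { name =>
  // Drop nulls so only real JSON strings reach the reader
  val jsonStrings = pivotDf.select(col(name)).na.drop().as[String]
  name -> spark.read.json(jsonStrings).schema
}.toMap

schemaByType.foreach { case (name, schema) =>
  println(s"$name -> ${schema.simpleString}")
}
```

The map keeps each type's schema addressable by name, which matters if different types carry different JSON shapes. Note that `col(name)` assumes the type values contain no dots or backticks; such names would need escaping.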

How should I modify the code below to get a DataFrame with all of the columns?

val res_df = df.select($"id",$"type",$"key",from_json($"value",schema).as("s")).select("id","type","key","s.*")
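If the goal is simply one flat DataFrame containing every JSON field, a simpler route may be to skip the pivot and infer a single merged schema straight from the `value` column, since `spark.read.json` merges the fields it sees across all input strings. This is a minimal sketch, not a confirmed answer: the sample rows and the names `mergedSchema` and `resDf` are hypothetical, and it assumes no JSON field is itself called `id`, `type`, or `key`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("from-json-all-columns")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the student_details table
val df = Seq(
  (1, "profile", "k1", """{"name":"Ann","age":20}"""),
  (1, "scores",  "k1", """{"math":90,"physics":85}"""),
  (2, "profile", "k2", """{"name":"Bob","age":21}""")
).toDF("id", "type", "key", "value")

// Infer one schema over every JSON string; fields from all types are merged
val mergedSchema = spark.read.json(df.select($"value").as[String]).schema

// Parse each value with the merged schema, then flatten the struct.
// Fields that a given row's JSON does not contain come back as null.
val resDf = df
  .select($"id", $"type", $"key", from_json($"value", mergedSchema).as("s"))
  .select("id", "type", "key", "s.*")

resDf.show(false)
```

The trade-off is that rows of one type carry null columns for the fields of every other type. If two types use the same field name with different meanings or types, the per-type schemas are the safer choice.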

scala apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/63282009/how-to-use-spark-read-json-from-multiple-column-schema
