PySpark: How to create nested JSON by adding a dynamic prefix to each column based on a row value

velaa5lx posted on 2023-01-13 in Apache

I have a DataFrame in the format below.
Input:
| id | Name_type | Name | Car |
| --- | --- | --- | --- |
| 1 | First | rob | Nissan |
| 2 | First | joe | Hyundai |
| 1 | Last | dent | Infiniti |
| 2 | Last | Kent | Genesis |
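For each key (id), I need to convert this into a JSON column in the format below, prefixing each column name with the row's Name_type value.
Expected result: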
| id | json_column |
| --- | --- |
| 1 | {"First_Name":"rob","First_Car":"Nissan","Last_Name":"dent","Last_Car":"Infiniti"} |
| 2 | {"First_Name":"joe","First_Car":"Hyundai","Last_Name":"Kent","Last_Car":"Genesis"} |
Using the following snippet:
from pyspark.sql.functions import to_json, struct
column_set = ['Name', 'Car']
df = df.withColumn("json_data", to_json(struct([df[x] for x in column_set])))
I was able to produce this data:
| id | Name_type | json_data |
| --- | --- | --- |
| 1 | First | {"Name":"rob","Car":"Nissan"} |
| 2 | First | {"Name":"joe","Car":"Hyundai"} |
| 1 | Last | {"Name":"dent","Car":"Infiniti"} |
| 2 | Last | {"Name":"Kent","Car":"Genesis"} |
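So I can use to_json to create a JSON column for a given row.
But I can't figure out how to prepend the row value to the column names, or how to combine the rows for a given key into a single nested JSON.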

w8ntj3qf #1

To do this, you first need to reshape the input DataFrame: group by the id column, then pivot on the Name_type column, like this:

from pyspark.sql.functions import first

df = spark.createDataFrame(
    [
        ("1", "First", "rob", "Nissan"),
        ("2", "First", "joe", "Hyundai"),
        ("1", "Last", "dent", "Infiniti"),
        ("2", "Last", "Kent", "Genesis")
    ],
    ["id", "Name_type", "Name", "Car"]
)
output = df.groupBy("id").pivot("Name_type").agg(first("Name").alias('Name'), first("Car").alias('Car'))

output.show()
+---+----------+---------+---------+--------+
| id|First_Name|First_Car|Last_Name|Last_Car|
+---+----------+---------+---------+--------+
|  1|       rob|   Nissan|     dent|Infiniti|
|  2|       joe|  Hyundai|     Kent| Genesis|
+---+----------+---------+---------+--------+
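To make the naming rule visible, here is a rough plain-Python sketch (no Spark involved) of what the group-and-pivot step computes. With more than one aliased aggregation, each pivoted column is named `<Name_type value>_<alias>`. The names `value_names` and `pivoted` are only illustrative, and `setdefault` is used to mimic `first` keeping the first value it sees:

```python
from collections import defaultdict

rows = [
    ("1", "First", "rob", "Nissan"),
    ("2", "First", "joe", "Hyundai"),
    ("1", "Last", "dent", "Infiniti"),
    ("2", "Last", "Kent", "Genesis"),
]
value_names = ["Name", "Car"]  # the columns being aggregated

pivoted = defaultdict(dict)
for row_id, name_type, *values in rows:
    for col, value in zip(value_names, values):
        # Column name = pivot value + "_" + aggregation alias.
        # setdefault keeps the first value seen for that id and column.
        pivoted[row_id].setdefault(f"{name_type}_{col}", value)
pivoted = dict(pivoted)

for row_id, cols in pivoted.items():
    print(row_id, cols)
```

Each id ends up with one dictionary holding the four prefixed keys, which is the same shape as one row of the pivoted DataFrame.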

Then you can use exactly the same code you already used to get the desired result, but with 4 columns instead of 2:

from pyspark.sql.functions import to_json, struct

column_set = ['First_Name','First_Car', 'Last_Name', 'Last_Car']
output = output.withColumn("json_data", to_json(struct([output[x] for x in column_set])))

output.show(truncate=False)
+---+----------+---------+---------+--------+----------------------------------------------------------------------------------+
|id |First_Name|First_Car|Last_Name|Last_Car|json_data                                                                         |
+---+----------+---------+---------+--------+----------------------------------------------------------------------------------+
|1  |rob       |Nissan   |dent     |Infiniti|{"First_Name":"rob","First_Car":"Nissan","Last_Name":"dent","Last_Car":"Infiniti"}|
|2  |joe       |Hyundai  |Kent     |Genesis |{"First_Name":"joe","First_Car":"Hyundai","Last_Name":"Kent","Last_Car":"Genesis"}|
+---+----------+---------+---------+--------+----------------------------------------------------------------------------------+
