PySpark: convert to JSON and group by

0pizxfdo · asked 2022-12-17 · Spark

I have a DataFrame in PySpark:

+----------------------+------------------+------------+------------+
|      transaction_date|                id|latest_quote|policy_price|
+----------------------+------------------+------------+------------+
|            16/12/2022|0063t000013qmPFAAY|2022-11-25  |      899.18|
|            16/12/2022|0063t000013qQNTAA2|2022-11-30  |     3147.78|
+----------------------+------------------+------------+------------+

I need to turn it into this:

+--------------------------------------------------------------------------------------+
|unique_column                                                                         |
+--------------------------------------------------------------------------------------+
|{"transaction_date":"16/12/2022", "trans_opp":[                                       |
| {"id":"0063t000013qmPFAAY", "latest_quote":"2022-11-25", "policy_price":"899.18"},   |
| {"id":"0063t000013qQNTAA2", "latest_quote":"2022-11-30", "policy_price":"3147.78"}]} |
+--------------------------------------------------------------------------------------+

The idea is that all of the rows end up in a single column called unique_column.

goqiplq2 1#

I'm not using exactly the same DataFrame as in your question, but the column names in my solution are the same:

from pyspark.sql import functions as f

df = spark.createDataFrame([
    (1, 2, 3, 4),
    (1, 6, 7, 8)
], ['transaction_date', 'id', 'latest_quote', 'policy_price'])

df = (
    df
    # group the rows that share the same transaction_date
    .groupBy('transaction_date')
    # collect the remaining columns of each row into an array of structs
    .agg(f.collect_list(f.struct(f.col('id'), f.col('latest_quote'), f.col('policy_price'))).alias('trans_opp'))
    # wrap the date and the array into a single struct...
    .withColumn('unique_column', f.struct(f.col('transaction_date'), f.col('trans_opp')))
    # ...and serialize that struct to a JSON string
    .withColumn('unique_column', f.to_json(f.col('unique_column')))
    .select('unique_column')
)

df.show(truncate=False)

Output:

+--------------------------------------------------------------------------------------------------------------------------+
|unique_column                                                                                                             |
+--------------------------------------------------------------------------------------------------------------------------+
|{"transaction_date":1,"trans_opp":[{"id":2,"latest_quote":3,"policy_price":4},{"id":6,"latest_quote":7,"policy_price":8}]}|
+--------------------------------------------------------------------------------------------------------------------------+
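For completeness, here is the same approach applied to the sample data from the question (string columns rather than the integers above). This is a sketch: the DataFrame is reconstructed from the question's table, and the SparkSession setup and the `result` name are illustrative assumptions.

from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()

# Sample rows reconstructed from the question's table (all values kept as strings)
data = [
    ('16/12/2022', '0063t000013qmPFAAY', '2022-11-25', '899.18'),
    ('16/12/2022', '0063t000013qQNTAA2', '2022-11-30', '3147.78'),
]
df = spark.createDataFrame(data, ['transaction_date', 'id', 'latest_quote', 'policy_price'])

result = (
    df
    # one output row per transaction_date
    .groupBy('transaction_date')
    # array of {id, latest_quote, policy_price} structs per date
    .agg(f.collect_list(f.struct('id', 'latest_quote', 'policy_price')).alias('trans_opp'))
    # date + array -> one struct -> one JSON string
    .withColumn('unique_column', f.to_json(f.struct('transaction_date', 'trans_opp')))
    .select('unique_column')
)

result.show(truncate=False)
# Expected value of unique_column (wrapped here for readability):
# {"transaction_date":"16/12/2022","trans_opp":[{"id":"0063t000013qmPFAAY","latest_quote":"2022-11-25","policy_price":"899.18"},
#  {"id":"0063t000013qQNTAA2","latest_quote":"2022-11-30","policy_price":"3147.78"}]}

Because the columns are strings, the JSON values come out quoted, which matches the requested output.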
