Creating an ArrayType column in a DataFrame using existing data in the DataFrame in Scala

Posted by dohp0rv5 on 2021-05-27 in Spark


This question already has answers here:

How to collect rows into a Map using groupBy (3 answers)
Closed 8 months ago.
I have a DataFrame of owners:

accountMasterId | OwnerMasterId | Owner name |
123             | ABC           | Jack       |
456             | DEF           | Amy        |
789             | ABC           | Rach       |

I want a new DataFrame that contains the following data:

accoutMasterIdArray | OwnerMasterId
{123,789}           | ABC
{456}               | DEF

The AccountMasterIDArray field should be of ArrayType. Any suggestions?

scala apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/61735364/creating-arraytype-column-in-a-dataframe-using-existing-data-in-dataframe-in-sca

1 Answer


qxsslcnc1#

Use .groupBy together with the collect_list function to build the array.

// Functions used in the snippets below
import org.apache.spark.sql.functions.{col, collect_list}

// Sample DataFrame
ownerMaster.show()
//+---------------+-------------+---------+
//|accountMasterId|OwnerMasterId|Ownername|
//+---------------+-------------+---------+
//|            123|          ABC|     Jack|
//|            456|          DEF|      Amy|
//|            789|          ABC|     Rach|
//+---------------+-------------+---------+

ownerMaster.groupBy("OwnerMasterId").
agg(collect_list(col("accountMasterId")).alias("accoutMasterIdArray")).
show()

// Cast the array to a string, e.g. before writing to CSV (the CSV data source cannot store array columns)
ownerMaster.groupBy("OwnerMasterId").
agg(collect_list(col("accountMasterId")).cast("string").alias("accoutMasterIdArray")).
show()
//+-------------+-------------------+
//|OwnerMasterId|accoutMasterIdArray|
//+-------------+-------------------+
//|          DEF|              [456]|
//|          ABC|         [123, 789]|
//+-------------+-------------------+

//schema
ownerMaster.groupBy("OwnerMasterId").agg(collect_list(col("accountMasterId")).alias("accoutMasterIdArray")).printSchema
//root
// |-- OwnerMasterId: string (nullable = true)
// |-- accoutMasterIdArray: array (nullable = true)
// |    |-- element: integer (containsNull = true)
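
For anyone who wants to try this outside an existing session, here is a minimal self-contained sketch of the same groupBy and collect_list approach. The object name CollectListSketch, the helper groupAccounts, and the local SparkSession settings are assumptions made for illustration and are not part of the original answer. It also wraps the result in sort_array, because collect_list does not guarantee element order after a shuffle.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, sort_array}

// Hypothetical wrapper object, used only for this illustration
object CollectListSketch {

  // Hypothetical helper: one row per OwnerMasterId with its account ids as an array.
  // sort_array makes the element order deterministic.
  def groupAccounts(df: DataFrame): DataFrame =
    df.groupBy("OwnerMasterId")
      .agg(sort_array(collect_list(col("accountMasterId"))).alias("accoutMasterIdArray"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-list-sketch")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the sample rows from the question
    val ownerMaster = Seq(
      (123, "ABC", "Jack"),
      (456, "DEF", "Amy"),
      (789, "ABC", "Rach")
    ).toDF("accountMasterId", "OwnerMasterId", "Ownername")

    val grouped = groupAccounts(ownerMaster)
    grouped.show()
    grouped.printSchema()

    spark.stop()
  }
}
```

If duplicate account ids should appear only once per owner, collect_set can be swapped in for collect_list.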

Answered on 2021-05-27
