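I am new to Spark and want to merge two DataFrames into a single table using some grouping logic. I tried collect_list and collect_set, but did not get the correct output.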
Parent table
+-------------------+------+--------------------+
|      individual_id|  name|                 age|
+-------------------+------+--------------------+
|1.00000000000000000|vishal|30.00000000000000000|
|1.00000000000000000|vishal|30.00000000000000000|
+-------------------+------+--------------------+
The child table is:
+-------------------+--------------+-----------------------+
|           order_id|sum_item_price|sales_order_product_dlm|
+-------------------+--------------+-----------------------+
|1.00000000000000000|       1500.00|   [{2.0000000000000...|
+-------------------+--------------+-----------------------+
If I convert the child DataFrame into a single-column DataFrame of JSON values:
childataframe.toJSON().show();
+--------------------+
|               value|
+--------------------+
|{"order_id":1.000...|
+--------------------+
Expected output: I want to merge the child JSON value column into the parent DataFrame and group by the individual_id column, so that the output looks like this:
+-------------------+------+--------------------+--------------------+
|      individual_id|  name|                 age|               value|
+-------------------+------+--------------------+--------------------+
|1.00000000000000000|vishal|30.00000000000000000|{"order_id":1.000...|
+-------------------+------+--------------------+--------------------+
Both the child and parent DataFrames belong to the same schema.
1 Answer
Import the necessary packages
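One possible way to get the shape asked for in the question, as a minimal sketch rather than a definitive solution. It assumes the Spark Java API, simplified sample data (plain integers instead of the high-precision decimals, and no `sales_order_product_dlm` column), and a hypothetical class name `MergeParentChildSketch` with a hypothetical helper `merge`. The join key is also an assumption: the question never says which child column links to `individual_id`, so the sketch joins on `order_id` only because both columns hold 1 in the sample data.

```java
import static org.apache.spark.sql.functions.collect_list;
import static org.apache.spark.sql.functions.struct;
import static org.apache.spark.sql.functions.to_json;

import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

// Hypothetical class name, used only for this illustration.
public class MergeParentChildSketch {

    // Hypothetical helper: packs each child row into a JSON string,
    // joins it to the parent, and groups the result by the parent columns.
    static Dataset<Row> merge(Dataset<Row> parent, Dataset<Row> child) {
        // Wrap every child column in a struct so to_json serializes the whole row.
        Column[] childCols = Arrays.stream(child.columns())
                .map(functions::col)
                .toArray(Column[]::new);

        // Keep the key column next to the JSON string; toJSON() alone would drop it.
        Dataset<Row> childJson = child.select(
                child.col("order_id"),
                to_json(struct(childCols)).alias("value"));

        // The sample parent table has duplicate rows; drop them before joining
        // so the same child JSON is not collected more than once.
        Dataset<Row> distinctParent = parent.dropDuplicates();

        // ASSUMPTION: individual_id matches order_id. Replace with the real key.
        return distinctParent
                .join(childJson,
                      distinctParent.col("individual_id").equalTo(childJson.col("order_id")),
                      "left")
                .groupBy("individual_id", "name", "age")
                .agg(collect_list("value").alias("value"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("merge-parent-child-sketch")
                .master("local[*]")
                .getOrCreate();

        // Simplified stand-ins for the tables shown in the question.
        Dataset<Row> parent = spark.sql(
                "SELECT * FROM VALUES (1, 'vishal', 30), (1, 'vishal', 30) "
                + "AS t(individual_id, name, age)");
        Dataset<Row> child = spark.sql(
                "SELECT * FROM VALUES (1, 1500.00) AS t(order_id, sum_item_price)");

        merge(parent, child).show(false);
        spark.stop();
    }
}
```

In this sketch `value` is an array of JSON strings, one entry per matching child row, so a parent with several orders still produces a single output row. If a single string is preferred, the array could be joined afterwards, for example with `concat_ws`.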
Output