Transforming a DataFrame schema in PySpark

sqxo8psd posted on 2021-05-29 in Spark

This question already has an answer here:

Spark Scala DataFrame: merge multiple columns into a single column (1 answer)
Closed 7 months ago.
I have a DataFrame:

+------------------+-------------------+--------------------+
|              name|                sku|         description|
+------------------+-------------------+--------------------+
|    Mary Rodriguez| hand-couple-manage|Senior word socia...|
|    Jose Henderson| together-table-oil|Apply girl treatm...|
|    Karen Villegas|     child-somebody|Every tell serve....|
|      Olivia Lynch|forget-matter-avoid|Perhaps environme...|
|     Whitney Wiley|    side-blue-dream|Quickly short soc...|
|  Brittany Johnson|        east-pretty|Indicate view sim...|
|       Paul Morris|    radio-window-us|Society month sho...|
|   Jason Patterson|   night-art-be-act|Entire around pla...|
|      Kiara Gentry|   compare-politics|Air my kind staff...|
+------------------+-------------------+--------------------+

Target schema:

root
 |-- sku: string (nullable = true)
 |-- name_description: array (nullable = true)
 |    |-- element: string (containsNull = true)

How can I group by the column sku and build a column name_description from name and description, so that for each sku the value is a JSON-formatted list [{"name":..., "description":...}, {"name":..., "description":...}, ...], in PySpark?

Answer #1 by bbuxkriu

Try the following code:

from pyspark.sql import functions as F
df.show(truncate=False)
+---------------+-------------------+-------------------+
|name           |sku                |description        |
+---------------+-------------------+-------------------+
|MaryRodriguez  |hand-couple-manage |Seniorwordsocia... |
|JoseHenderson  |together-table-oil |Applygirltreatm... |
|KarenVillegas  |child-somebody     |Everytellserve.... |
|OliviaLynch    |forget-matter-avoid|Perhapsenvironme...|
|WhitneyWiley   |side-blue-dream    |Quicklyshortsoc... |
|BrittanyJohnson|east-pretty        |Indicateviewsim... |
|PaulMorris     |radio-window-us    |Societymonthsho... |
|JasonPatterson |night-art-be-act   |Entirearoundpla... |
|KiaraGentry    |compare-politics   |Airmykindstaff...  |
+---------------+-------------------+-------------------+
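# PySpark's DataFrame.toJSON() returns an RDD, which has no show();
# serializing each row with to_json gives the same "value" column instead.
(df.groupBy(F.col("sku"))
   .agg(F.collect_list(F.struct(F.col("name"), F.col("description"))).alias("name_description"))
   .select(F.to_json(F.struct("*")).alias("value"))
   .show(truncate=False))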
+-------------------------------------------------------------------------------------------------------------+
|value                                                                                                        |
+-------------------------------------------------------------------------------------------------------------+
|{"sku":"hand-couple-manage","name_description":[{"name":"MaryRodriguez","description":"Seniorwordsocia..."}]}|
|{"sku":"night-art-be-act","name_description":[{"name":"JasonPatterson","description":"Entirearoundpla..."}]} |
|{"sku":"forget-matter-avoid","name_description":[{"name":"OliviaLynch","description":"Perhapsenvironme..."}]}|
|{"sku":"compare-politics","name_description":[{"name":"KiaraGentry","description":"Airmykindstaff..."}]}     |
|{"sku":"child-somebody","name_description":[{"name":"KarenVillegas","description":"Everytellserve...."}]}    |
|{"sku":"side-blue-dream","name_description":[{"name":"WhitneyWiley","description":"Quicklyshortsoc..."}]}    |
|{"sku":"radio-window-us","name_description":[{"name":"PaulMorris","description":"Societymonthsho..."}]}      |
|{"sku":"east-pretty","name_description":[{"name":"BrittanyJohnson","description":"Indicateviewsim..."}]}     |
|{"sku":"together-table-oil","name_description":[{"name":"JoseHenderson","description":"Applygirltreatm..."}]}|
+-------------------------------------------------------------------------------------------------------------+
