PySpark: replace all other values in a group with the latest value

vsikbqxv  posted on 2021-07-14 in Spark

We have the following PySpark DataFrame:

    +----+----------+----------+----------+---------+
    |year|language_1| summary_1|language_2|summary_2|
    +----+----------+----------+----------+---------+
    |2013|      Java|     Great|    Python| Briliant|
    |2014|    Python|   Awesome|     Scala| Horrible|
    |2015|    Python|   Amazing|      Java|      Wow|
    |2016|    Python|Incredible|       C++|     Nice|
    |2017|     Scala|      Good|       C++|    Noway|
    |2018|     Scala| Fantastic|       C++|     Cool|
    +----+----------+----------+----------+---------+

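This question is a bit hard to explain, so please bear with me. For every language that appears in language_1 or language_2, I want to use the year column as the deciding factor for adjusting the values in summary_1 and summary_2: find the row with the largest year for that language, then set every summary for that language (in both summary_1 and summary_2) to the summary from that row. For example, for Python I want to replace all summaries with "Incredible", because the "Incredible" row is the most recent year for Python. The same applies to the other languages. So the result would be: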

    +----+----------+----------+----------+----------+
    |year|language_1| summary_1|language_2| summary_2|
    +----+----------+----------+----------+----------+
    |2013|      Java|       Wow|    Python|Incredible|
    |2014|    Python|Incredible|     Scala| Fantastic|
    |2015|    Python|Incredible|      Java|       Wow|
    |2016|    Python|Incredible|       C++|      Cool|
    |2017|     Scala| Fantastic|       C++|      Cool|
    |2018|     Scala| Fantastic|       C++|      Cool|
    +----+----------+----------+----------+----------+

gdx19jrr  #1

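I'm not sure this is the best approach, but you can first melt the DataFrame so that it has only three columns (year, language, summary), apply the answer from your previous question to pick the latest summary per language, and then pivot the DataFrame to restore the original structure: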

    from pyspark.sql import functions as F, Window

    df2 = df.selectExpr(
        'year',
        # Melt: one (language, summary) struct per column pair; pos is 0 or 1
        'posexplode(array(struct(language_1 as language, summary_1 as summary), struct(language_2 as language, summary_2 as summary)))'
    ).select(
        'year', 'pos', 'col.*'
    ).withColumn(
        'summary',
        # max over struct(year, summary) picks the summary of the latest year per language
        F.max(F.struct('year', 'summary')).over(Window.partitionBy('language'))['summary']
    ).groupBy('year').pivot('pos').agg(
        # Pivot back: columns '0' and '1' each hold a (language, summary) struct
        F.first(F.struct('language', 'summary'))
    ).select(
        'year', '0.*', '1.*'
    ).toDF(*df.columns).orderBy('year')

    df2.show()

    +----+----------+----------+----------+----------+
    |year|language_1| summary_1|language_2| summary_2|
    +----+----------+----------+----------+----------+
    |2013|      Java|       Wow|    Python|Incredible|
    |2014|    Python|Incredible|     Scala| Fantastic|
    |2015|    Python|Incredible|      Java|       Wow|
    |2016|    Python|Incredible|       C++|      Cool|
    |2017|     Scala| Fantastic|       C++|      Cool|
    |2018|     Scala| Fantastic|       C++|      Cool|
    +----+----------+----------+----------+----------+
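
If the melt / latest-per-language / rebuild logic is hard to follow in Spark syntax, the same idea can be sketched in plain Python without Spark. This is only an illustration of the logic, not how Spark executes it: the function names `latest_summaries` and `replace_with_latest` are made up for this example, and the rows are copied from the sample data in the question.

```python
# Plain-Python sketch of the same three steps (no Spark involved).
# The rows mirror the sample DataFrame from the question.
rows = [
    (2013, "Java", "Great", "Python", "Briliant"),
    (2014, "Python", "Awesome", "Scala", "Horrible"),
    (2015, "Python", "Amazing", "Java", "Wow"),
    (2016, "Python", "Incredible", "C++", "Nice"),
    (2017, "Scala", "Good", "C++", "Noway"),
    (2018, "Scala", "Fantastic", "C++", "Cool"),
]


def latest_summaries(rows):
    """Map each language to the summary from its most recent year."""
    latest = {}  # language -> (year, summary)
    for year, lang_1, summ_1, lang_2, summ_2 in rows:
        # "Melt": treat both (language, summary) pairs of a row the same way.
        for lang, summ in ((lang_1, summ_1), (lang_2, summ_2)):
            # Tuple comparison is analogous to max(struct(year, summary)):
            # the year decides first, the summary only breaks ties.
            if lang not in latest or (year, summ) > latest[lang]:
                latest[lang] = (year, summ)
    return {lang: summ for lang, (_, summ) in latest.items()}


def replace_with_latest(rows):
    """Rebuild the wide rows with every summary replaced by the latest one."""
    latest = latest_summaries(rows)
    return [
        (year, lang_1, latest[lang_1], lang_2, latest[lang_2])
        for year, lang_1, _, lang_2, _ in rows
    ]


result = replace_with_latest(rows)
for row in result:
    print(row)
```

The dictionary lookup plays the role of the window function partitioned by language, and rebuilding each tuple plays the role of the pivot back to the wide layout.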