PySpark: update a DataFrame based on a condition by comparing values in a different DataFrame

Posted by ht4b089n on 2021-05-19 in Spark

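I have two DataFrames:
old_df is a fixed DataFrame; new_df changes every day.
For each id, the age in old_df should be updated from the value of the age column in new_df, and the counter should be incremented by 1 whenever there is an update. If the age has not changed, the counter and age columns keep their values (no increment).
old_df (edited to include a 5th id):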

    id  age  counter
    1   12   0
    2   15   0
    3   22   0
    4   19   0
    5   11   0

new_df:

    id  age
    1   20
    2   21
    3   22
    4   19

After the update, old_df should look like this:

    id  age  counter
    1   20   1
    2   21   1
    3   22   0
    4   19   0
    5   11   0
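
The update rule described above can be written out in plain Python to make the expected behaviour explicit. This is only an illustrative sketch, not Spark code: the function name `apply_daily_update` and the list/dict representation are assumptions made for the example.

```python
def apply_daily_update(old_rows, new_ages):
    """Apply the rule row by row.

    old_rows: list of (id, age, counter) tuples (stand-in for old_df).
    new_ages: dict mapping id -> age (stand-in for new_df).
    """
    updated = []
    for row_id, age, counter in old_rows:
        # An id that is missing from the new data keeps its current age.
        new_age = new_ages.get(row_id, age)
        if new_age != age:
            # Age changed: take the new age and bump the counter.
            updated.append((row_id, new_age, counter + 1))
        else:
            # No change: age and counter stay as they are.
            updated.append((row_id, age, counter))
    return updated


old_rows = [(1, 12, 0), (2, 15, 0), (3, 22, 0), (4, 19, 0), (5, 11, 0)]
new_ages = {1: 20, 2: 21, 3: 22, 4: 19}
updated_rows = apply_daily_update(old_rows, new_ages)
```

With the sample data, ids 1 and 2 get a new age and a counter of 1, while ids 3, 4 and 5 are left untouched.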

So far I have tried the following:

    df_old = df_old.withColumnRenamed('id','id_old')\
                   .withColumnRenamed('age','age_old')
    joinedDF = df_old.join(df_new, df_new["id"] == df_old["id_old"], "outer")
    if(joinedDF.select(joinedDF.age_old != joinedDF.age)):
        joinedDF = joinedDF.withColumn("age_old",joinedDF['age'])
        joinedDF = joinedDF.withColumn("counter",joinedDF['counter']+1)
    joinedDF[['id_old', 'age_old', 'counter']].toPandas()

    id_old  age_old  counter
    1       20       1
    2       21       1
    3       22       1
    4       19       1

As you can see in the output, for id_old = 3 and 4 I should get a counter value of 0, but I get 1. Any help is appreciated.

Answer #1, by uyhoqukh

You can get the desired result like this:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    row = Row('id', 'age', 'counter')
    old_df = spark.createDataFrame([row(1, 12, 0), row(2, 15, 0), row(3, 22, 0), row(4, 19, 0)])
    old_df.show()
    row2 = Row('id', 'age')
    new_df = spark.createDataFrame([row2(1, 20), row2(2, 21), row2(3, 22), row2(4, 19)])
    new_df.show()
    # Join on id, take the new age, and increment the counter only where the age differs.
    old_df = old_df.alias("old_df").join(new_df.alias("new_df"), old_df.id == new_df.id, "inner").selectExpr(
        "old_df.id as id", "new_df.age as age",
        "if(old_df.age != new_df.age, old_df.counter + 1, old_df.counter) as counter").sort("id")
    old_df.show()

Output:

    +---+---+-------+
    | id|age|counter|
    +---+---+-------+
    |  1| 12|      0|
    |  2| 15|      0|
    |  3| 22|      0|
    |  4| 19|      0|
    +---+---+-------+

    +---+---+
    | id|age|
    +---+---+
    |  1| 20|
    |  2| 21|
    |  3| 22|
    |  4| 19|
    +---+---+

    +---+---+-------+
    | id|age|counter|
    +---+---+-------+
    |  1| 20|      1|
    |  2| 21|      1|
    |  3| 22|      0|
    |  4| 19|      0|
    +---+---+-------+