How to extract the year from a date column of a DataFrame using PySpark

0kjbasz6 asked on 2021-07-14 in Spark

I recently started working with PySpark, and I am trying different ways to extract the year from the date_added column of a DataFrame and create a new column named year in the same DataFrame. The sample data looks like this:

  +-------+-------+-----+---------+-----------------+
  |show_id|   type|title|  country|       date_added|
  +-------+-------+-----+---------+-----------------+
  |     s1|TV Show|   3%|   Brazil|  August 14, 2020|
  |     s2|  Movie| 7:19|   Mexico|December 23, 2016|
  |     s3|  Movie|23:59|Singapore|December 20, 2018|
  +-------+-------+-----+---------+-----------------+
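
For reference, a minimal sketch of how this sample DataFrame could be reproduced locally (the SparkSession setup and the literal rows are assumptions for illustration; the real data presumably comes from an existing source):

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Rebuild the sample rows shown above as a small test DataFrame.
  df = spark.createDataFrame(
      [
          ("s1", "TV Show", "3%", "Brazil", "August 14, 2020"),
          ("s2", "Movie", "7:19", "Mexico", "December 23, 2016"),
          ("s3", "Movie", "23:59", "Singapore", "December 20, 2018"),
      ],
      ["show_id", "type", "title", "country", "date_added"],
  )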

nwsw7zdq (answer #1)

You can use substring, since the year is always the last four characters of the string:

  import pyspark.sql.functions as F

  # substring with a negative start position keeps the last 4 characters,
  # i.e. the year at the end of the date string.
  df2 = df.withColumn('year', F.expr('substring(date_added, -4)'))
  df2.show()
  +-------+-------+-----+---------+-----------------+----+
  |show_id|   type|title|  country|       date_added|year|
  +-------+-------+-----+---------+-----------------+----+
  |     s1|TV Show|   3%|   Brazil|  August 14, 2020|2020|
  |     s2|  Movie| 7:19|   Mexico|December 23, 2016|2016|
  |     s3|  Movie|23:59|Singapore|December 20, 2018|2018|
  +-------+-------+-----+---------+-----------------+----+
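
As an alternative sketch, if you prefer to treat the column as a real date rather than a string, you could parse it with to_date and then call year (assuming Spark 3.x date parsing and the "MMMM d, yyyy" format shown in the sample data):

  import pyspark.sql.functions as F

  # Parse the string into a DateType column, then extract the year.
  # 'MMMM d, yyyy' matches values such as "August 14, 2020"; trim() guards
  # against stray whitespace around the value.
  df3 = df.withColumn(
      'year',
      F.year(F.to_date(F.trim('date_added'), 'MMMM d, yyyy'))
  )
  df3.show()

Note that this version produces an integer year (e.g. 2020) rather than the string '2020' returned by the substring approach, which can matter if you later sort or do arithmetic on the column.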
