基于Pyspark中日期的上一个月和下一个月、一年

ht4b089n  于 2022-11-28  发布在  Spark
关注(0)|答案(2)|浏览(311)

我有这样的数据

我需要这样的输出

如何在Pyspark中实现这一点?

zrfyljdw

zrfyljdw1#

使用日期函数。首先使用to_date将日期格式化为字符串
接下来,使用add_months创建previous和next。使用date_format对其进行格式化,并将每个数组存储在array中。将两个数组合并为struct列。分解struct列。下面是代码和逻辑
数据类型

df=spark.createDataFrame([('1'    , 'A' ,'14-02-22' ) ,
 ('1'    , 'B'    , '11-03-22' )],
  ('Id' , 'Status' , 'Date' ))
  
df.show()

+---+------+--------+
| Id|Status|    Date|
+---+------+--------+
|  1|     A|14-02-22|
|  1|     B|11-03-22|


(df.withColumn('Date',to_date('Date', "dd-MM-yy"))#Coerce string to date
 .withColumn('Date1', F.struct(array(date_format(add_months('Date',-1),"MMM yy"), date_format('Date',"MMM yy")).alias('Previous')#Create an array of date and previous day, store in struct as Previous
  ,array(date_format('Date',"MMM yy"),date_format(add_months('Date',1),"MMM yy")).alias('Next')#Create an array of date and next day, store in struct as Previous
  )).select('Id','Status','Date','Date1.*')#Select all required columns exploding Date1 with each struct element as column
).show(truncate=False)

+---+------+----------+----------------+----------------+
|Id |Status|Date      |Previous        |Next            |
+---+------+----------+----------------+----------------+
|1  |A     |2022-02-14|[Jan 22, Feb 22]|[Feb 22, Mar 22]|
|1  |B     |2022-03-11|[Feb 22, Mar 22]|[Mar 22, Apr 22]|
+---+------+----------+----------------+----------------+

也可以使用concat_ws,如下所示

(df.withColumn('Date',to_date('Date', "dd-MM-yy"))#Coerce string to date
 .withColumn('Date1', F.struct(concat_ws('-',lit((date_format(add_months('Date',-1),"MMM yy").astype('string'))), lit((date_format('Date',"MMM yy").astype('string')))).alias('Previous')#create string by concat of date with string format of the previous months date
  ,concat_ws('-',lit((date_format(add_months('Date',1),"MMM yy").astype('string'))), lit((date_format('Date',"MMM yy").astype('string')))).alias('Next')#Create a string by concat date with string format of the next months date
  )).select('Id','Status','Date','Date1.*')#Select all required columns exploding Date1 with each struct element as column
).show(truncate=False)

+---+------+----------+-------------+-------------+
|Id |Status|Date      |Previous     |Next         |
+---+------+----------+-------------+-------------+
|1  |A     |2022-02-14|Jan 22-Feb 22|Mar 22-Feb 22|
|1  |B     |2022-03-11|Feb 22-Mar 22|Apr 22-Mar 22|
+---+------+----------+-------------+-------------+
3b6akqbq

3b6akqbq2#

执行此任务所需的基本函数包括:使用“date_format”将日期设置为所需格式,使用“add_months”将日期加减。

from pyspark.sql import functions as F

date_df = spark.createDataFrame(
    [
        ('A', '02/09/2022'),
        ('B', '02/07/2022'),],
    ['name', 'date'])

(
    date_df
    .withColumn('date', F.to_date('date', 'dd/MM/yyyy'))
    .withColumn(
        'current_month', 
        F.date_format(F.col('date'), 'MMM yyyy'))
    .withColumn(
        'prev_month',
        F.date_format(
            F.add_months(F.col('date'),1), 
            'MMM yyyy'))
    .withColumn(
        'next_month', 
        F.date_format(
            F.add_months(F.col('date'),-1), 
            'MMM yyyy'))
    .withColumn(
        'Previous',
        F.concat(F.col('prev_month'), F.lit('-'), F.col('current_month')))
    .withColumn(
        'Next',
        F.concat(F.col('current_month'), F.lit('-'), F.col('next_month')))

).show()

+----+----------+-------------+----------+----------+-----------------+-----------------+
|name|      date|current_month|prev_month|next_month|         Previous|             Next|
+----+----------+-------------+----------+----------+-----------------+-----------------+
|   A|2022-09-02|     Sep 2022|  Oct 2022|  Aug 2022|Oct 2022-Sep 2022|Sep 2022-Aug 2022|
|   B|2022-07-02|     Jul 2022|  Aug 2022|  Jun 2022|Aug 2022-Jul 2022|Jul 2022-Jun 2022|
+----+----------+-------------+----------+----------+-----------------+-----------------+

相关问题