pyspark:将具有特定时间戳的行插入到Dataframe中

vshtjzan  于 2021-05-27  发布在  Spark
关注(0)|答案(1)|浏览(409)

我有以下sparkDataframe:

id  |  time                   |Value|    
   id1 | 2020-02-22 04:57:36.843 | 1.4 |
   id2 | 2020-02-22 04:57:50.850 | 1.7 |
   id3 | 2020-02-22 04:58:02.133 | 1.2 |

我希望在现有行之间插入具有一定时间距离(例如5s)的行。输出应如下所示:

id  |  time                   |Value|    
   id1 | 2020-02-22 04:57:36.843 | 1.4 |
   id1 | 2020-02-22 04:57:41.843 |     |
   id1 | 2020-02-22 04:57:46.843 |     |
   id1 | 2020-02-22 04:57:51.843 |     |
   id2 | 2020-02-22 04:57:50.850 | 1.7 |
   id2 | 2020-02-22 04:57:55.850 |     |
   id2 | 2020-02-22 04:58:00.850 |     |
   id2 | 2020-02-22 04:58:05.850 |     |
   id3 | 2020-02-22 04:58:02.133 | 1.2 |
   id3 | 2020-02-22 04:58:07.133 |     |
   id3 | 2020-02-22 04:58:12.133 |     |
   id3 | 2020-02-22 04:58:17.133 |     |

我试着通过for循环来实现这一点,创建新的Dataframe(每一行),并通过“union”将其合并到现有的Dataframe中,但没有成功。我尤其不能用这种方法得到身份证。
你知道我怎样才能达到我想要的产量吗?

lhcgjxsq

lhcgjxsq1#

这里是我的尝试与一些修改,例如,我不能理解如何存在62秒。

from pyspark.sql.functions import *
from pyspark.sql import Window

w = Window.orderBy('time')

df.select('id', 'time') \
  .withColumn('time', to_timestamp('time', 'yyyy-MM-dd HH:mm:ss.SSS')) \
  .withColumn('time2', coalesce(lead('time', 1).over(w), expr('time + interval 10 seconds'))) \
  .withColumn('seq', expr("sequence(time, time2 + interval 5 seconds, interval 5 seconds)")) \
  .withColumn('time', explode('seq')) \
  .select('id', 'time') \
  .join(df, ['id', 'time'], 'left') \
  .fillna(0).show(20, False)

+---+-----------------------+-----+
|id |time                   |Value|
+---+-----------------------+-----+
|id1|2020-02-22 04:57:36.843|1.4  |
|id1|2020-02-22 04:57:41.843|0.0  |
|id1|2020-02-22 04:57:46.843|0.0  |
|id1|2020-02-22 04:57:51.843|0.0  |
|id2|2020-02-22 04:57:50.85 |1.7  |
|id2|2020-02-22 04:57:55.85 |0.0  |
|id2|2020-02-22 04:58:00.85 |0.0  |
|id3|2020-02-22 04:57:59.133|1.2  |
|id3|2020-02-22 04:58:04.133|0.0  |
|id3|2020-02-22 04:58:09.133|0.0  |
|id3|2020-02-22 04:58:14.133|0.0  |
+---+-----------------------+-----+

相关问题