Create a time series (range) between two given timestamps

Posted by 0lvr5msh on 2021-05-27 in Spark


I have a PySpark DataFrame df:

---------------------------------------------------------
primaryKey |   start_timestamp   |   end_timestamp
---------------------------------------------------------
 key1      | 2020-08-13 15:40:00 | 2020-08-13 15:44:47
 key2      | 2020-08-14 12:00:00 | 2020-08-14 12:01:13

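I want to create a DataFrame that contains, for every key, a time series between start_timestamp and end_timestamp in steps of x seconds. For example, with a gap of x = 120 seconds, the output should look like this: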

-----------------------------------------------------------
primaryKey |  start_timestamp_new  | end_timestamp_new
   key1    |  2020-08-13 15:40:00  | 2020-08-13 15:41:59
   key1    |  2020-08-13 15:42:00  | 2020-08-13 15:43:59
   key1    |  2020-08-13 15:44:00  | 2020-08-13 15:45:59
   key2    |  2020-08-14 12:00:00  | 2020-08-14 12:01:59

I tried the approach mentioned here, but couldn't apply it to a Spark DataFrame.
Any pointers on how to build this would be a great help.
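To pin down the bucketing the example table describes independently of Spark, here is a minimal plain-Python sketch using only the standard datetime module. The helper name make_buckets is hypothetical. It assumes bucket starts are start, start + x, start + 2x, ... for as long as they do not pass end_timestamp, and that each bucket ends x - 1 seconds after it starts; with x = 120 it reproduces the four rows shown above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def make_buckets(rows, x):
    """Expand (key, start, end) rows into fixed x-second buckets (hypothetical helper).

    Bucket starts are start, start + x, start + 2x, ... while they are <= end;
    each bucket ends x - 1 seconds after it starts.
    """
    step = timedelta(seconds=x)
    out = []
    for key, start_s, end_s in rows:
        start = datetime.strptime(start_s, FMT)
        end = datetime.strptime(end_s, FMT)
        t = start
        while t <= end:
            bucket_end = t + timedelta(seconds=x - 1)
            out.append((key, t.strftime(FMT), bucket_end.strftime(FMT)))
            t += step
    return out

rows = [
    ("key1", "2020-08-13 15:40:00", "2020-08-13 15:44:47"),
    ("key2", "2020-08-14 12:00:00", "2020-08-14 12:01:13"),
]

for r in make_buckets(rows, 120):
    print(r)
```

Note that the last bucket for key1 (15:44:00 to 15:45:59) extends past its end_timestamp, which matches the desired output in the question.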

python apache-spark pyspark apache-spark-sql user-defined-functions

Source: https://stackoverflow.com/questions/63633305/create-timeseries-range-between-two-given-timestamp

1 answer


Answer #1 by kuhbmx9i

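You can use the sequence function (available since Spark 2.4) to generate the bucket start times for each row, then explode them into separate rows.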

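from pyspark.sql.functions import col, explode, expr, sequence, to_timestamp

x = 120  # bucket width in seconds

(df.withColumn('start_timestamp', to_timestamp('start_timestamp'))
   .withColumn('end_timestamp', to_timestamp('end_timestamp'))
   # one row per x-second step from start_timestamp, keeping steps that do not pass end_timestamp
   .withColumn('start_timestamp',
               explode(sequence('start_timestamp', 'end_timestamp', expr(f'interval {x} seconds'))))
   # each bucket ends x - 1 seconds after it starts
   .withColumn('end_timestamp', col('start_timestamp') + expr(f'interval {x - 1} seconds'))
   .show())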

+----------+-------------------+-------------------+
|primaryKey|    start_timestamp|      end_timestamp|
+----------+-------------------+-------------------+
|      key1|2020-08-13 15:40:00|2020-08-13 15:41:59|
|      key1|2020-08-13 15:42:00|2020-08-13 15:43:59|
|      key1|2020-08-13 15:44:00|2020-08-13 15:45:59|
|      key2|2020-08-14 12:00:00|2020-08-14 12:01:59|
+----------+-------------------+-------------------+
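As a sanity check on the output above: sequence steps from start_timestamp by x seconds and keeps every value that does not pass end_timestamp, so a key spanning d seconds explodes into floor(d / x) + 1 rows. The plain-Python sketch below only evaluates that formula for the two sample keys; bucket_count is a hypothetical helper, not a Spark API, and it assumes start_timestamp <= end_timestamp.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def bucket_count(start_s, end_s, x):
    """Rows produced for one key: floor((end - start) / x) + 1, assuming start <= end."""
    start = datetime.strptime(start_s, FMT)
    end = datetime.strptime(end_s, FMT)
    span = int((end - start).total_seconds())  # whole seconds between the timestamps
    return span // x + 1

print(bucket_count("2020-08-13 15:40:00", "2020-08-13 15:44:47", 120))  # key1: 287 s -> 3
print(bucket_count("2020-08-14 12:00:00", "2020-08-14 12:01:13", 120))  # key2: 73 s -> 1
```

This matches the three key1 rows and the single key2 row in the table.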

Answered 2021-05-27
