pyspark: How to get the duration in seconds from an interval day to second object in Databricks

x33g5p2x  posted 2024-01-06 in Spark

In a Databricks SQL query, I created a column holding the difference between the timestamps of consecutive rows. I used something like this:

    select
      (Timestamp - (LAG(Timestamp, 1) OVER (partition by colA order by Timestamp))) tdiff,
      *
    from
      the_table

But this gives me values of type interval day to second, such as {"seconds": 180, "nano": 0, "negative": false, "zero": false, "units": ["SECONDS", "NANOS"]}.
How can I convert that to seconds? For this example I should basically see 180.
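As a plain-Python sanity check (no Spark involved): the rendered value appears to be a java.time.Duration-style object, in which the seconds field already holds the signed total and nano holds only the sub-second adjustment. Under that assumption, a hypothetical helper recovering the total looks like:

```python
# Assumption: the dict mirrors java.time.Duration, where "seconds" is the
# signed total number of seconds and "nano" is the sub-second remainder.
def duration_to_seconds(d: dict) -> float:
    return d["seconds"] + d["nano"] / 1_000_000_000

print(duration_to_seconds({"seconds": 180, "nano": 0}))  # 180.0
```

So for the example interval in the question, the answer really is just the seconds field: 180.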

9w11ddsr (answer 1):

It looks like datediff supports a second unit in earlier runtimes as well (tested at least on 13.2):
https://docs.databricks.com/en/sql/language-manual/functions/datediff3.html
The query below just generates some random ids with multiple values and fuzzed times to demonstrate the syntax.
It also shows the same interval extraction as in the original post.

    %sql
    with some_ids_and_random_ints as (
      select
        id,
        floor(rand(5) * 1000) random_int
      from
        range(1000) AS t
    ),
    force_dups_on_id as (
      select
        mod(id, 10) as id_with_dups,
        timestampadd(second, random_int, current_timestamp()) as some_time
      from
        some_ids_and_random_ints
    ),
    result_compare as (
      select
        *,
        datediff(
          second,
          lag(some_time) over (
            partition by id_with_dups
            order by
              some_time
          ),
          some_time
        ) as diff_inbuilt_function,
        (some_time - (LAG(some_time, 1) OVER (partition by id_with_dups order by some_time))) tdiff_as_interval
      from
        force_dups_on_id
      order by
        id_with_dups,
        some_time
    )
    select id_with_dups,
      some_time,
      diff_inbuilt_function,
      cast(extract(second from tdiff_as_interval) as integer) as seconds_from_interval
    from result_compare
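The windowed lag-and-subtract pattern that both columns rely on can be sketched in plain Python (no Spark needed): sort each partition by timestamp, subtract the previous row's timestamp, and take timedelta.total_seconds(). The names below are illustrative, not from the post.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def lag_diff_seconds(rows):
    """For each (key, ts) row, emit (key, ts, seconds since the previous ts in
    the same key-partition, or None for the first row) -- the plain-Python
    analogue of datediff(second, lag(ts) over (partition by key order by ts), ts)."""
    by_key = defaultdict(list)
    for key, ts in rows:
        by_key[key].append(ts)
    out = []
    for key, stamps in by_key.items():
        stamps.sort()
        prev = None
        for ts in stamps:
            diff = None if prev is None else int((ts - prev).total_seconds())
            out.append((key, ts, diff))
            prev = ts
    return out

t0 = datetime(2024, 1, 6, 12, 0, 0)
rows = [("a", t0), ("a", t0 + timedelta(minutes=3)), ("b", t0)]
for row in lag_diff_seconds(rows):
    print(row)
# diffs per row: None, 180, None
```

The first row of each partition has no predecessor, which is why the SQL lag-based columns are NULL there as well.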
