spark scala

jdg4fx2g asked on 2021-05-19 in Spark

I have a timestamp column in a DataFrame (Scala) and want to get milliseconds from it. unix_timestamp truncates to seconds, and I can't just multiply unix_timestamp by 1000, because I'm looking for an exact millisecond conversion.

Input DataFrame:

    +---------+-----------------------+-----+-----------------------+
    |OrderName|DateTime               |Count|timestamp              |
    +---------+-----------------------+-----+-----------------------+
    |a        |2020-07-11 23:58:45.538|1    |2020-07-11 23:58:45.538|
    |a        |2020-07-12 00:00:07.307|2    |2020-07-12 00:00:07.307|
    |a        |2020-07-12 00:01:08.817|3    |2020-07-12 00:01:08.817|
    |a        |2020-07-12 00:02:15.675|1    |2020-07-12 00:02:15.675|
    |a        |2020-07-12 00:05:48.277|1    |2020-07-12 00:05:48.277|
    +---------+-----------------------+-----+-----------------------+

The second column is a string, and I used to_timestamp($"DateTime") to get the 4th column.

Example: 2020-07-11 23:58:45.538 -> 1594537125538
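For reference, the epoch value obtained from a wall-clock string like this depends on the zone offset assumed when parsing it. A standalone java.time sketch (no Spark needed; the -07:00 offset is only an illustration, though it happens to reproduce the example value above):

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object EpochMillisDemo {
  def main(args: Array[String]): Unit = {
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
    val ldt = LocalDateTime.parse("2020-07-11 23:58:45.538", fmt)

    // Interpreting the same wall-clock time at different offsets
    // shifts the resulting epoch value by exactly that offset.
    val utcMillis    = ldt.toInstant(ZoneOffset.UTC).toEpochMilli
    val minus7Millis = ldt.toInstant(ZoneOffset.ofHours(-7)).toEpochMilli

    println(utcMillis)    // prints 1594511925538
    println(minus7Millis) // prints 1594537125538
  }
}
```

So the expected output 1594537125538 corresponds to parsing the string at a -07:00 offset rather than at UTC.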
Answer 1 (wh6knrhe):


You can get this value with a UDF that parses the string into an Instant and then converts it to epoch milliseconds:

    import org.apache.spark.sql.functions._
    import java.time._
    import java.time.format.DateTimeFormatter
    // ...
    spark.udf.register("to_epoch_millis",
      (s: String) => LocalDateTime.parse(s, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS"))
        .toInstant(ZoneOffset.UTC).toEpochMilli())

Then:

    df.selectExpr("to_epoch_millis(DateTime) as ts").show()

    +-------------+
    |           ts|
    +-------------+
    |1594511925538|
    |1594512007307|
    +-------------+

The above assumes DateTime is a UTC timestamp.
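If what's wanted is only the millisecond fraction of the second (the part unix_timestamp drops) rather than the full epoch-millis value, the parsed value exposes it directly via getNano. A small standalone sketch with plain java.time (the object name is mine, not from the answer); note this part is independent of the zone offset:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object MillisOfSecondDemo {
  def main(args: Array[String]): Unit = {
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
    val ldt = LocalDateTime.parse("2020-07-11 23:58:45.538", fmt)

    // getNano returns nanosecond-of-second; divide down to milliseconds.
    val millisOfSecond = ldt.getNano / 1000000
    println(millisOfSecond) // prints 538
  }
}
```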

