计算ms spark sql中的差异

c86crjj0  于 2021-05-27  发布在  Spark
关注(0)|答案(1)|浏览(451)

这个问题在这里已经有答案了

使用scala的spark 2.0时间戳差异(毫秒)(2个答案)
六个月前关门了。
目前基于scala中的sparksql来计算两列之间的差异

writingTime,time
 2020-06-25T13:29:33.415Z,2020-06-25T13:29:33.190Z

我不确定 415Z 对应于我使用的微秒,代码如下:

import org.apache.spark.sql.functions._

 val resultDf = df.withColumn("date_diff_seconds",
      $"writingTime".cast("timestamp").cast("long") -  $"time".cast("timestamp").cast("long"))

但是这段代码不会超过秒来计算时间戳的差异
你知道怎么修吗?

bt1cpqcv

bt1cpqcv1#

以毫秒为单位获取时间戳差异的替代方法-

加载测试数据

val data =
      """
        |writingTime,time
        |  2020-06-25T13:29:33.415Z,2020-06-25T13:29:33.190Z
      """.stripMargin
    val stringDS1 = data.split(System.lineSeparator())
      .map(_.split("\\,").map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString(","))
      .toSeq.toDS()
    val df1 = spark.read
      .option("sep", ",")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("nullValue", "null")
      .csv(stringDS1)
    df1.show(false)
    df1.printSchema()
    /**
      * +-----------------------+----------------------+
      * |writingTime            |time                  |
      * +-----------------------+----------------------+
      * |2020-06-25 18:59:33.415|2020-06-25 18:59:33.19|
      * +-----------------------+----------------------+
      *
      * root
      * |-- writingTime: timestamp (nullable = true)
      * |-- time: timestamp (nullable = true)
      */

以毫秒为单位查找差异

请注意,列已经是timestamp类型,因此不需要强制转换为 timestamp ```
val millis = udf((start: java.sql.Timestamp, end: java.sql.Timestamp) => end.getTime - start.getTime)
df1.withColumn("date_diff_millis", millis($"time", $"writingTime"))
.show(false)

/**
  * +-----------------------+----------------------+----------------+
  * |writingTime            |time                  |date_diff_millis|
  * +-----------------------+----------------------+----------------+
  * |2020-06-25 18:59:33.415|2020-06-25 18:59:33.19|225             |
  * +-----------------------+----------------------+----------------+
  */

### 将时间戳强制转换为long时,它会将其转换为秒(unix epoch)

注意差值为零

df1.withColumn("date_diff_millis", $"time".cast("long") - $"writingTime".cast("long"))
.show(false)

/**
  * +-----------------------+----------------------+----------------+
  * |writingTime            |time                  |date_diff_millis|
  * +-----------------------+----------------------+----------------+
  * |2020-06-25 18:59:33.415|2020-06-25 18:59:33.19|0               |
  * +-----------------------+----------------------+----------------+
  */

相关问题