使用配置单元/scala中的相对日期选择记录

hrirmatl  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(250)

有没有解决这个问题的好方法:假设我想为给定的分组选择至少比先前选择的记录早6个月的记录。
例如,我有:

Col A   Col B   Date
1       A       2015-01-01 00:00:00
1       A       2014-10-01 00:00:00
1       A       2014-05-01 00:00:00
1       A       2014-01-01 00:00:00
1       B       2014-01-01 00:00:00
2       A       2015-01-01 00:00:00
2       A       2014-10-01 00:00:00
2       A       2014-01-01 00:00:00
2       A       2013-10-01 00:00:00

我只想选择与之前选择的日期相隔至少6个月的日期。即它将返回:

Col A   Col B   Date
1       A       2015-01-01 00:00:00
1       A       2014-05-01 00:00:00
1       B       2014-01-01 00:00:00
2       A       2015-01-01 00:00:00
2       A       2014-01-01 00:00:00

如果您想选择与最新订单相关的订单,我很清楚如何使用订单来完成这项工作
(即:

SELECT b.date, b..., a.latest_date 
FROM(
SELECT *, row_number OVER PARTITION BY Col A, Col B ORDER BY Date as row_number
FROM table1) temp
WHERE row_number = 1) a
INNER JOIN TABLE 1 b
ON KEY)
WHERE datediff(date, latestdate)/365 > 0.5

差不多吧
,但我有点不清楚你们是怎么做到的。有没有一种方法可以在hive/scala中递归地实现这一点?

qzlgjiam

qzlgjiam1#

嗨,有一个窗口滞后和超前的概念,概念是Hive和Spark,你可以实现这两个任务。这是spark中的代码。

val data = sc.parallelize(List(("1",       "A",       "2015-01-01 00:00:00"),
     |      | ("1",       "A",       "2014-10-01 00:00:00"),
     |      | ("1",       "A",       "2014-01-01 00:00:00"),
     |      | ("1",       "B",       "2014-01-01 00:00:00"),
     |      | ("2",       "A",       "2015-01-01 00:00:00"),
     |      | ("2",       "A",       "2014-10-01 00:00:00"),
     |      | ("2",       "A",       "2014-01-01 00:00:00"),
     |      | ("2",       "A",       "2013-10-01 00:00:00")
     |      | )).toDF("id","status","Date");

    val data2 =data.select($"id",$"status",to_date($"Date").alias(date));

import org.apache.spark.sql.expressions.Window

import org.apache.spark.sql.functions._

val wSpec3 = Window.partitionBy("id","status").orderBy(desc("date"));

val data3 = data2.withColumn("diff",datediff(lag(data2("date"), 1).over(wSpec3),$"date")).filter($"diff">182.5 || $"diff".isNull);

data3.show
+---+------+----------+----+
| id|status|      date|diff|
+---+------+----------+----+
|  1|     A|2015-01-01|null|
|  1|     A|2014-01-01| 273|
|  1|     B|2014-01-01|null|
|  2|     A|2015-01-01|null|
|  2|     A|2014-01-01| 273|
+---+------+----------+----+

相关问题