Querying previous rows that match a condition

ubof19bj  posted on 2021-07-29  in  Java

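I have a table of data about users' flight booking patterns on a website. Assume the data below is the complete history for my users.
session_date is the day the user came to the website and searched for a particular route, and flight_date is the departure date of the flight. I have ordered the table by session_date. Whether the search ended in a booking is recorded in booked.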

+---------+--------------+----------------+--------------+-------------+--------+
| user_id | session_date | departure_code | arrival_code | flight_date | booked |
+---------+--------------+----------------+--------------+-------------+--------+
| user1   | 7 Jan        | CA             | MY           | 8 Mar       |      1 |
| user1   | 8 Jan        | US             | MY           | 18 May      |      0 |
| user1   | 8 Jan        | US             | MY           | 18 May      |      1 |
| user1   | 8 Jan        | CA             | MY           | 19 Mar      |      0 |
| user1   | 9 Jan        | US             | MY           | 18 May      |      1 |
+---------+--------------+----------------+--------------+-------------+--------+

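I want to output a new column in my table called previous_flight_date. For each search, the new column gives the previous booked flight_date for that particular route. If the user has searched the same route many times but never booked it, the value in this column stays null.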

+-------+--------------+----------------+--------------+-------------+--------+----------------------+
|  _id  | session_date | departure_code | arrival_code | flight_date | booked | previous_flight_date |
+-------+--------------+----------------+--------------+-------------+--------+----------------------+
| user1 | 7 Jan        | CA             | SG           | 8 Mar       |      1 | null                 |
| user1 | 8 Jan        | US             | MY           | 18 May      |      0 | null                 |
| user1 | 8 Jan        | US             | MY           | 18 May      |      1 | null                 |
| user1 | 8 Jan        | CA             | SG           | 19 Mar      |      0 | 8 Mar                |
| user1 | 2 Feb        | US             | MY           | 2 Jul       |      1 | 18 May               |
+-------+--------------+----------------+--------------+-------------+--------+----------------------+

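So, for example, the column is null until row 4, which shows "8 Mar" because the user had already booked a flight from CA --> SG departing on that day.
I tried using last_value, but it did not work. I am also not sure how to use lag() when I have several different routes and want to find the preceding row under a condition. It would be great if someone could suggest a solution! Thank you.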

jgovgodb1#

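I think you can use first_value(). The trick is to put a conditional expression inside the window function, turn on the ignore nulls option, and use a window frame specification that covers the earlier flights with the same departure/arrival, excluding the current row: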

select
    t.*,
    -- most recent booked flight_date among the earlier flights on the same route
    first_value(case when booked = 1 then flight_date end ignore nulls) over(
        partition by departure_code, arrival_code
        order by flight_date desc
        rows between 1 following and unbounded following
    ) previous_flight_date
from mytable t

Actually, a window max() would also work (and then there is no need for ignore nulls):

select
    t.*,
    max(case when booked = 1 then flight_date end) over(
        partition by departure_code, arrival_code
        order by flight_date
        rows between unbounded preceding and 1 preceding
    ) previous_flight_date
from mytable t
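
For anyone who wants to try the window max() idea locally, here is a minimal sketch that runs it against the sample rows from the question using Python's built-in sqlite3 module (this needs SQLite 3.25 or newer for window functions). The ISO date strings, the ascending order on flight_date, and the extra booked tie-breaker for rows sharing a flight_date are my assumptions and not part of the answer above; the function name previous_flight_dates is hypothetical.

```python
import sqlite3

# Sample rows from the question, with ISO dates so text order is chronological.
ROWS = [
    ("user1", "2020-01-07", "CA", "SG", "2020-03-08", 1),
    ("user1", "2020-01-08", "US", "MY", "2020-05-18", 0),
    ("user1", "2020-01-08", "US", "MY", "2020-05-18", 1),
    ("user1", "2020-01-08", "CA", "SG", "2020-03-19", 0),
    ("user1", "2020-02-09", "US", "MY", "2020-07-02", 1),
]

QUERY = """
select
    t.*,
    max(case when booked = 1 then flight_date end) over(
        partition by departure_code, arrival_code
        order by flight_date, booked  -- booked is an assumed tie-breaker
        rows between unbounded preceding and 1 preceding
    ) as previous_flight_date
from mytable t
order by session_date, flight_date, booked
"""


def previous_flight_dates():
    """Load the sample rows into an in-memory table and run the window query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (user_id text, session_date text, "
        "departure_code text, arrival_code text, flight_date text, booked integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in previous_flight_dates():
        print(row)
```

Without a tie-breaker, the two US/MY searches for the same flight_date can come back in either order, so the unbooked row might pick up the booked row's date.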

xoshrz7s2#

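At first I agreed with your suggestion of LAG, but then found the query rather difficult to phrase that way. For an approach that does not use analytic functions, we can try a correlated subquery that finds the most recent booked flight date on the same route.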

SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
       (SELECT t2.flight_date FROM yourTable t2
        WHERE t2.departure_code = t1.departure_code AND
              t2.arrival_code = t1.arrival_code AND
              t2.booked = 1 AND
              t2.flight_date < t1.flight_date
        ORDER BY t2.flight_date DESC LIMIT 1) AS previous_flight_date
FROM yourTable t1
ORDER BY flight_date;

Demo

The demo uses MariaDB, but the same query should run on BigQuery without any problems.
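
As a quick local check of the correlated-subquery approach, here is a small sketch that runs the same query shape in an in-memory SQLite database through Python's sqlite3 module. The ISO date strings, the extra booked key in the final ORDER BY (only there to make the row order deterministic), and the function name run_correlated_subquery are my assumptions, not part of the answer.

```python
import sqlite3

# Sample rows from the question, with ISO dates so text comparison is chronological.
ROWS = [
    ("user1", "2020-01-07", "CA", "SG", "2020-03-08", 1),
    ("user1", "2020-01-08", "US", "MY", "2020-05-18", 0),
    ("user1", "2020-01-08", "US", "MY", "2020-05-18", 1),
    ("user1", "2020-01-08", "CA", "SG", "2020-03-19", 0),
    ("user1", "2020-02-09", "US", "MY", "2020-07-02", 1),
]

QUERY = """
SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
       (SELECT t2.flight_date FROM yourTable t2
        WHERE t2.departure_code = t1.departure_code AND
              t2.arrival_code = t1.arrival_code AND
              t2.booked = 1 AND
              t2.flight_date < t1.flight_date
        ORDER BY t2.flight_date DESC LIMIT 1) AS previous_flight_date
FROM yourTable t1
ORDER BY flight_date, booked;
"""


def run_correlated_subquery():
    """Load the sample rows and return them with previous_flight_date attached."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (user_id TEXT, session_date TEXT, "
        "departure_code TEXT, arrival_code TEXT, flight_date TEXT, booked INTEGER)"
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_correlated_subquery():
        print(row)
```

Because the subquery compares with a strict `<`, two searches for the same flight_date never see each other, so no tie-breaker is needed inside the subquery itself.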

oyxsuwqo3#

Here is a SQL Server-based solution using window functions. The BigQuery solution should be similar, since it relies on standard window functions.

SELECT
    *
    , Previous_Flight_Date = MAX(CASE WHEN booked = 1 THEN flight_date ELSE NULL END)
                             OVER (
                                    PARTITION BY user_id, departure_code, arrival_code
                                    ORDER BY flight_date
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                             )
FROM historicTable t

yqyhoc1h4#

Below is for BigQuery Standard SQL


#standardSQL

SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
  MAX(IF(booked = 1, flight_date, NULL)) OVER(previous_flights) AS previous_flight_date
FROM `project.dataset.table` 
WINDOW previous_flights AS (
  PARTITION BY user_id, departure_code, arrival_code 
  ORDER BY flight_date 
  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)

If you apply it to the sample data from your question, as in the example below


#standardSQL

WITH `project.dataset.table` AS (
  SELECT 'user1' AS user_id, DATE '2020-01-07' AS session_date, 'CA' AS departure_code, 'SG' AS arrival_code, DATE '2020-03-08' AS flight_date, 1 AS booked UNION ALL
  SELECT 'user1', '2020-01-08', 'US', 'MY', '2020-05-18', 0 UNION ALL
  SELECT 'user1', '2020-01-08', 'US', 'MY', '2020-05-18', 1 UNION ALL
  SELECT 'user1', '2020-01-08', 'CA', 'SG', '2020-03-19', 0 UNION ALL
  SELECT 'user1', '2020-02-09', 'US', 'MY', '2020-07-02', 1
)
SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
  MAX(IF(booked = 1, flight_date, NULL)) OVER(previous_flights) AS previous_flight_date
FROM `project.dataset.table` 
WINDOW previous_flights AS (
  PARTITION BY user_id, departure_code, arrival_code 
  ORDER BY flight_date 
  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
-- ORDER BY flight_date

the output is

Row user_id session_date    departure_code  arrival_code    flight_date booked  previous_flight_date     
1   user1   2020-01-07      CA              SG              2020-03-08  1       null     
2   user1   2020-01-08      CA              SG              2020-03-19  0       2020-03-08   
3   user1   2020-01-08      US              MY              2020-05-18  0       null     
4   user1   2020-01-08      US              MY              2020-05-18  1       null     
5   user1   2020-02-09      US              MY              2020-07-02  1       2020-05-18
