Converting a MySQL query with variables to a Postgres query

Posted by s5a0g9ez on 2022-11-28 in MySQL

I have a MySQL query that works fine:

SELECT MAX(count) as `max_streak`
FROM (
     SELECT x.*, 
     IF(DATEDIFF(@prev, login_at) = 1, @count:= @count + 1, @count:=1) as `count`,@prev := login_at
     FROM
        (SELECT DISTINCT(DATE(login_at)) as login_at FROM login_histories WHERE user_id = ?) x,
        (SELECT @count := 1, @prev := null) a
     ORDER BY login_at desc) a

But I need to convert it to work with our new Postgres database, and I'm having trouble doing so.
This is what I was able to come up with (it does not work):

SELECT MAX(count) as max_streak
    FROM (
         SELECT x.*, 
         CASE WHEN DATEDIFF(@prev, login_at) = 1 THEN SET @count = @count + 1 ELSE SET @count = 1 END as count, @prev = login_at
         FROM
            (SELECT DISTINCT(DATE(login_at)) as login_at FROM login_histories WHERE user_id = ?) x,
            (SELECT @count = @count + 1, @prev = null) a
         ORDER BY login_at desc) a

I tried using SET to see whether that would fix it, but the error always ends up pointing at SELECT @count = @count + 1, @prev = null):

Query 1 ERROR: ERROR:  column "count" does not exist
LINE 7:                     (SELECT @count = @count + 1, @prev = nul...
                                     ^

Is there something I'm missing? Can someone point me in the right direction?

gjmwrych1#

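Here is a demo of a gaps-and-islands style query. I'm using MySQL 8.0, but the solution relies on standard window functions, so the approach should carry over to any modern SQL database. Only the date arithmetic (DATE_SUB with an INTERVAL here) has to be rewritten in the target dialect's syntax.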

mysql> select * from login_histories;
+---------------------+---------+
| login_at            | user_id |
+---------------------+---------+
| 2022-08-01 00:00:00 |       1 |
| 2022-08-02 00:00:00 |       1 |
| 2022-08-03 00:00:00 |       1 |
| 2022-08-04 00:00:00 |       1 |
| 2022-08-06 00:00:00 |       1 |
| 2022-08-07 00:00:00 |       1 |
| 2022-08-08 00:00:00 |       1 |
| 2022-08-10 00:00:00 |       1 |
| 2022-08-11 00:00:00 |       1 |
+---------------------+---------+

Note that there are three runs of consecutive days.

select max(count) as maxcount
from (
  select day_online, count(*) as count
  from (
    select
      date_sub(login_at, interval dense_rank() over (partition by user_id order by login_at)-1 day) as day_online
    from login_histories
  ) as t
  group by day_online
) as t2;

+----------+
| maxcount |
+----------+
|        4 |
+----------+
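
Since the question targets Postgres, here is a hedged sketch of the same query in that dialect. It is not taken from the original answer. Postgres has no MySQL-style @user variables (there, @ is the absolute-value operator, which is why the attempt in the question is read as a reference to a column named count), and it has no DATE_SUB; subtracting an integer from a date yields a date instead. The temporary table only mirrors the demo data so the snippet can be run on its own, and the literal user_id = 1 stands in for the bound parameter from the question.

```sql
-- Demo data mirroring the table above (temporary, so nothing persists).
CREATE TEMP TABLE login_histories (
  login_at timestamp NOT NULL,
  user_id  integer   NOT NULL
);

INSERT INTO login_histories (login_at, user_id) VALUES
  ('2022-08-01', 1), ('2022-08-02', 1), ('2022-08-03', 1),
  ('2022-08-04', 1), ('2022-08-06', 1), ('2022-08-07', 1),
  ('2022-08-08', 1), ('2022-08-10', 1), ('2022-08-11', 1);

SELECT MAX(streak) AS max_streak
FROM (
  SELECT day_online, COUNT(*) AS streak
  FROM (
    SELECT
      -- date - integer gives a date in Postgres. DENSE_RANK() returns bigint,
      -- so the offset is cast to integer before the subtraction.
      login_at::date
        - (DENSE_RANK() OVER (PARTITION BY user_id ORDER BY login_at::date) - 1)::int
        AS day_online
    FROM login_histories
    WHERE user_id = 1   -- stand-in for the bound parameter in the question
  ) AS t
  GROUP BY day_online
) AS t2;
-- max_streak is 4 for the demo data
```

This version assumes at most one row per user per day, as in the demo data; repeated logins on the same day would inflate the count.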

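I'll explain it step by step.
The window function DENSE_RANK() numbers the rows consecutively. If we subtract that rank (minus one) from each date in the first run of dates, every row yields the first date of that run. When there is a gap in the consecutive dates, the date jumps by more than one day while the rank still advances by only one, so the subtraction yields a different date than the original start date. Each further gap widens the difference again, and so on.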

select
  login_at,
  dense_rank() over (partition by user_id order by login_at)-1 as denserank,
  date_sub(login_at, interval dense_rank() over (partition by user_id order by login_at)-1 day) as day_online
from login_histories;

+---------------------+-----------+---------------------+
| login_at            | denserank | day_online          |
+---------------------+-----------+---------------------+
| 2022-08-01 00:00:00 |         0 | 2022-08-01 00:00:00 |
| 2022-08-02 00:00:00 |         1 | 2022-08-01 00:00:00 |
| 2022-08-03 00:00:00 |         2 | 2022-08-01 00:00:00 |
| 2022-08-04 00:00:00 |         3 | 2022-08-01 00:00:00 |
| 2022-08-06 00:00:00 |         4 | 2022-08-02 00:00:00 |
| 2022-08-07 00:00:00 |         5 | 2022-08-02 00:00:00 |
| 2022-08-08 00:00:00 |         6 | 2022-08-02 00:00:00 |
| 2022-08-10 00:00:00 |         7 | 2022-08-03 00:00:00 |
| 2022-08-11 00:00:00 |         8 | 2022-08-03 00:00:00 |
+---------------------+-----------+---------------------+

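This assigns the rows to groups of consecutive dates. The dates in the third column have no meaning in themselves; they only serve as a value that differs for each group.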
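
For Postgres, the same intermediate step can be sketched as follows. This is an illustration under assumptions rather than part of the original answer: the inline VALUES list is a shortened, made-up sample so that the snippet runs without any table, and the named window w is only there to avoid repeating the OVER clause.

```sql
WITH login_histories (user_id, login_at) AS (
  VALUES (1, TIMESTAMP '2022-08-01 00:00:00'),
         (1, TIMESTAMP '2022-08-02 00:00:00'),
         (1, TIMESTAMP '2022-08-04 00:00:00'),   -- gap: 2022-08-03 is missing
         (1, TIMESTAMP '2022-08-05 00:00:00')
)
SELECT
  login_at::date                                   AS login_date,
  DENSE_RANK() OVER w - 1                          AS denserank,
  -- Same idea as DATE_SUB(...) in MySQL: shift each date back by its rank.
  login_at::date - (DENSE_RANK() OVER w - 1)::int  AS day_online
FROM login_histories
WINDOW w AS (PARTITION BY user_id ORDER BY login_at::date)
ORDER BY login_date;
-- The first two rows share day_online 2022-08-01,
-- the last two share day_online 2022-08-02.
```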
By grouping on these distinct values, we can count the number of rows in each group.

select day_online, count(*) as count
from ( 
  select
    date_sub(login_at, interval dense_rank() over (partition by user_id order by login_at)-1 day) as day_online
  from login_histories
) as t 
group by day_online;

+---------------------+-------+
| day_online          | count |
+---------------------+-------+
| 2022-08-01 00:00:00 |     4 |
| 2022-08-02 00:00:00 |     3 |
| 2022-08-03 00:00:00 |     2 |
+---------------------+-------+

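Then take the MAX() of that count.
In my demo I only stored dates truncated to midnight, and there is only one user_id value. I'll leave it to you to adapt this to your own situation, where you probably have varying times of day and several user_id values.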
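
One possible way to make that adaptation in Postgres is sketched below, assuming login_at is a timestamp and a user may log in several times a day. The sample rows and the CTE names (login_days, islands, streaks) are invented for illustration. The key change is to reduce the logins to one row per user per calendar day before ranking, so that COUNT(*) counts days rather than logins.

```sql
WITH login_histories (user_id, login_at) AS (
  VALUES (1, TIMESTAMP '2022-08-01 09:15:00'),
         (1, TIMESTAMP '2022-08-01 18:40:00'),   -- second login on the same day
         (1, TIMESTAMP '2022-08-02 07:05:00'),
         (1, TIMESTAMP '2022-08-04 12:00:00'),
         (2, TIMESTAMP '2022-08-10 23:59:00'),
         (2, TIMESTAMP '2022-08-11 00:01:00'),
         (2, TIMESTAMP '2022-08-12 08:30:00')
),
login_days AS (
  -- One row per user per calendar day, like DISTINCT(DATE(login_at)) in the question.
  SELECT DISTINCT user_id, login_at::date AS login_date
  FROM login_histories
),
islands AS (
  -- Consecutive days of one user end up with the same island_key.
  SELECT user_id,
         login_date
           - (DENSE_RANK() OVER (PARTITION BY user_id ORDER BY login_date))::int
           AS island_key
  FROM login_days
),
streaks AS (
  SELECT user_id, island_key, COUNT(*) AS streak_length
  FROM islands
  GROUP BY user_id, island_key
)
SELECT user_id, MAX(streak_length) AS max_streak
FROM streaks
GROUP BY user_id
ORDER BY user_id;
-- For the sample rows: user 1 has max_streak 2, user 2 has max_streak 3.
```

To get the streak for a single user, as in the question, add a WHERE user_id = ? filter inside login_days.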
