mysql 使用子查询和连接时的不同结果:SQL-1454.活动用户

tp5buhyn  于 2022-12-03  发布在  Mysql
关注(0)|答案(1)|浏览(139)
Input: 
Accounts table:
+----+----------+
| id | name     |
+----+----------+
| 1  | Winston  |
| 7  | Jonathan |
+----+----------+
Logins table:
+----+------------+
| id | login_date |
+----+------------+
| 7  | 2020-05-30 |
| 1  | 2020-05-30 |
| 7  | 2020-05-31 |
| 7  | 2020-06-01 |
| 7  | 2020-06-02 |
| 7  | 2020-06-02 |
| 7  | 2020-06-03 |
| 1  | 2020-06-07 |
| 7  | 2020-06-10 |
+----+------------+
Output: 
+----+----------+
| id | name     |
+----+----------+
| 7  | Jonathan |
+----+----------+

说明:ID = 1的用户Winston仅在2个不同的日子登录了2次,因此Winston不是活跃用户。ID = 7的用户Jonathan在6个不同的日子登录了7次,其中5天是连续的,因此Jonathan是活跃用户。
活跃用户是指连续五天或更长时间登录其帐户的用户。
编写SQL查询以查找活动用户的ID和名称。
返回按id排序的结果表
我遇到了一些问题与我的代码做这个问题。
我的代码:

SELECT DISTINCT a.id, a.name
FROM Accounts a
LEFT JOIN Logins l
ON a.id = l.id
JOIN Logins l1
ON l.id=l1.id AND DATEDIFF(l.login_date, l1.login_date) BETWEEN 1 AND 4
GROUP BY l.login_date
HAVING COUNT(DISTINCT l1.login_date) = 4

输入

{"headers":{"Accounts":["id","name"],"Logins":["id","login_date"]},"rows":{"Accounts":[[182,"Gavriel"],[119,"Naftali"],[31,"Yaakov"],[136,"Menachem"],[142,"Sarah"],[204,"Daniel"],[49,"Ezra"],[27,"David"]],"Logins":[[142,"2020-6-27"],[119,"2020-6-29"],[31,"2020-6-26"],[27,"2020-6-27"],[182,"2020-7-2"],[136,"2020-6-28"],[142,"2020-7-5"],[27,"2020-6-29"],[136,"2020-6-27"],[49,"2020-7-1"],[204,"2020-7-1"],[49,"2020-7-5"],[204,"2020-7-3"],[49,"2020-7-3"],[31,"2020-7-3"],[204,"2020-7-3"],[142,"2020-6-30"],[119,"2020-6-26"],[142,"2020-6-29"],[136,"2020-7-2"],[49,"2020-7-2"],[182,"2020-7-4"],[119,"2020-6-29"],[49,"2020-6-30"],[136,"2020-7-5"],[27,"2020-7-2"],[136,"2020-6-28"],[31,"2020-6-29"],[204,"2020-7-3"],[142,"2020-6-29"],[31,"2020-6-30"],[204,"2020-6-27"],[204,"2020-7-2"],[182,"2020-6-27"],[31,"2020-7-3"],[119,"2020-7-4"],[142,"2020-6-27"],[119,"2020-6-27"],[27,"2020-6-26"],[142,"2020-7-2"],[27,"2020-6-28"],[136,"2020-6-26"],[119,"2020-6-27"],[142,"2020-7-1"],[27,"2020-7-1"],[31,"2020-6-29"],[204,"2020-6-28"],[136,"2020-6-28"],[204,"2020-7-3"],[31,"2020-6-28"],[182,"2020-6-29"],[49,"2020-7-4"],[204,"2020-6-27"],[136,"2020-7-5"],[142,"2020-7-4"],[31,"2020-7-2"],[182,"2020-7-1"],[204,"2020-6-28"],[31,"2020-7-4"],[136,"2020-7-1"],[136,"2020-6-26"],[27,"2020-7-4"],[27,"2020-6-29"],[31,"2020-7-2"]]}}

我的输出:

{"headers": ["id", "name"], "values": [[49, "Ezra"], [136, "Menachem"], [142, "Sarah"], [182, "Gavriel"]]}

预期输出:

{"headers":["id","name"],"values":[[49,"Ezra"]]}

正确答案:

SELECT DISTINCT l1.id,
(SELECT name FROM Accounts WHERE id = l1.id) AS name
FROM Logins l1
JOIN Logins l2 ON l1.id = l2.id AND DATEDIFF(l2.login_date, l1.login_date) BETWEEN 1 AND 4
GROUP BY l1.id, l1.login_date
HAVING COUNT(DISTINCT l2.login_date) = 4

在我的答案中,我连接了Accounts表,而不是像答案中那样使用子查询。但是,我不明白为什么我的代码会导致如此不同的输出。

ivqmmu1c

ivqmmu1c1#

请尝试以下查询:

with data as (
select 
id,l_date,
--lag(l_date) over(partition by id order by l_date) lag_date,
l_date - lag(l_date) over(partition by id order by l_date),
case when l_date - lag(l_date) over(partition by id order by l_date) > 1 then 1 else 0 end as session_change

from login
),
staging_data as (
select 
id,l_date,sum(session_change)
over(partition by id order by l_date rows between unbounded preceding and current row) as session
from data )
select id
from staging_data
group by id,session having count(*) >=5

相关问题