mysql 时间戳上滚动窗口的SQL计数非重复

plicqrtu  于 2023-04-28  发布在  Mysql
关注(0)|答案(2)|浏览(170)

我有这个SQL查询:

SELECT COUNT(DISTINCT CustomerName) over(
            ORDER BY OrderTimestamp 
            RANGE BETWEEN INTERVAL 2 hour PRECEDING AND CURRENT ROW 
        ) AS count_per_time
FROM Orders
WHERE CustomerName IS NOT null
AND CustomerName != ''

但它不起作用,因为不允许在OVER子句中使用DISTINCT。有人能帮我解决这个问题吗?非常感谢。

rks48beu

rks48beu1#

我不认为这可以用窗口函数来解决。
有一种模拟技术使用dense_rank s的减法,但后者不支持窗口框架规范(即:原始代码中的range/rows语法)。好吧,准确地说,它实际上允许语法,但默默地忽略它:所以它仍然在整个分区上运行。这在the documentation中有解释:
标准SQL指定在整个分区上操作的窗口函数不应该有frame子句。MySQL允许这样的函数使用框架子句,但忽略它。即使指定了一个帧,这些函数也会使用整个分区

CUME_DIST()
DENSE_RANK()
LAG()
LEAD()
NTILE()
PERCENT_RANK()
RANK()
ROW_NUMBER()

似乎剩下的唯一选项是子查询:

SELECT o.*,
    (
        SELECT COUNT(DISTINCT o1.CustomerName) 
        FROM Orders o1
        WHERE o1.CustomerName != ''
        WHERE o1.OrderTimestamp BETWEEN o.OrderTimestamp - INTERVAL 2 HOUR AND o.OrderTimestamp
    ) AS count_per_time
FROM Orders o
WHERE o.CustomerName IS NOT null and o.CustomerName != ''
mw3dktmi

mw3dktmi2#

您可以尝试使用DENSE_RANK(),其中PARTITION BY是分组列,ORDER BYASCDESC要计数的列:

SELECT DENSE_RANK() over(
            PARTITION BY CustomerName 
            ORDER BY OrderTimestamp ASC
            RANGE BETWEEN INTERVAL 2 hour PRECEDING AND CURRENT ROW 
        ) +
        DENSE_RANK() over(
            PARTITION BY CustomerName 
            ORDER BY OrderTimestamp DESC
            RANGE BETWEEN INTERVAL 2 hour PRECEDING AND CURRENT ROW 
        ) - 1
        AS count_per_time
FROM Orders
WHERE CustomerName IS NOT null
AND CustomerName != ''

相关问题