PostgreSQL: SQL to count the values for each day based on the maximum value from the previous day

Posted by uxhixvfz on 2023-08-04 in PostgreSQL


I have a table in a PostgreSQL database that stores the words of a book and the pages on which those words appear. It looks roughly like this:

```
 headword | dict_pages       |   timestamp_updated
----------+------------------+------------------------
 abcdefg  | {229}            | 2023-07-28 14:49:13+00
 ccdsd    | {213}            | 2023-07-28 18:48:11+00
 zdx      | {228}            | 2023-07-27 18:37:42+00
 xcdferc  | {227, 228}       | 2023-07-27 14:47:55+00
```
I want to know how many pages were processed on any given day. This is how I do it now:

```sql
SELECT
    to_char(timestamp_updated, 'YYYY-MM-DD') as dt,
    count(distinct(page_num))
FROM dictionary_word dtw, unnest(dict_pages) page_num
WHERE dict_pages IS NOT NULL
GROUP BY dt
ORDER BY dt ASC
```
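My goal is to find out how many *new* pages were processed. Old pages that were processed earlier are irrelevant to this statistic.
For the example above, my current query returns the following: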

```
2023-07-27 - 2
2023-07-28 - 2
```
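Here is a small Python model of what the current query computes, so the numbers above are easy to check. It is only a sketch:

- the rows are copied by hand from the sample table;
- `pages_per_day` is a hypothetical name;
- the day is the first ten characters of the timestamp. This assumes the session time zone is UTC, because `to_char` on a time-zone-aware timestamp uses the session time zone.

```python
from collections import defaultdict

# Sample rows copied from the question: (headword, dict_pages, timestamp_updated)
ROWS = [
    ("abcdefg", [229], "2023-07-28 14:49:13+00"),
    ("ccdsd", [213], "2023-07-28 18:48:11+00"),
    ("zdx", [228], "2023-07-27 18:37:42+00"),
    ("xcdferc", [227, 228], "2023-07-27 14:47:55+00"),
]


def pages_per_day(rows):
    """Count distinct pages per day, mirroring the question's current query."""
    pages = defaultdict(set)
    for _headword, dict_pages, ts in rows:
        if dict_pages is None:  # WHERE dict_pages IS NOT NULL
            continue
        day = ts[:10]  # to_char(timestamp_updated, 'YYYY-MM-DD'), assuming UTC
        pages[day].update(dict_pages)  # unnest(dict_pages) + count(distinct ...)
    # GROUP BY dt ORDER BY dt
    return {day: len(day_pages) for day, day_pages in sorted(pages.items())}


print(pages_per_day(ROWS))  # {'2023-07-27': 2, '2023-07-28': 2}
```

Page 213 is still counted on 2023-07-28. That is exactly the problem described next.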
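But the word ccdsd was added days, weeks or months ago, so page 213 should be excluded from the count of processed pages. I think one solution could be to take the maximum page number from the previous day (2023-07-27), which is 228, and then count only the page numbers greater than 228 for the current day.
Is there an elegant way to do this in SQL?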
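A few lines of Python can model the proposed rule and pin down the expected output before any SQL is written. Assumptions:

- "previous day" means the previous date that has any rows, which is what `lag()` over daily rows would see;
- the first day has no previous maximum, so all of its pages count;
- `new_pages_per_day` is a hypothetical helper, not part of PostgreSQL.

```python
from collections import defaultdict

# Sample rows copied from the question: (headword, dict_pages, timestamp_updated)
ROWS = [
    ("abcdefg", [229], "2023-07-28 14:49:13+00"),
    ("ccdsd", [213], "2023-07-28 18:48:11+00"),
    ("zdx", [228], "2023-07-27 18:37:42+00"),
    ("xcdferc", [227, 228], "2023-07-27 14:47:55+00"),
]


def new_pages_per_day(rows):
    """Per day, count distinct pages greater than the previous day's maximum page."""
    pages = defaultdict(set)
    for _headword, dict_pages, ts in rows:
        if dict_pages:  # skip NULL or empty arrays
            pages[ts[:10]].update(dict_pages)

    result = {}
    prev_max = None  # no previous day yet
    for day in sorted(pages):
        day_pages = pages[day]
        if prev_max is None:
            result[day] = len(day_pages)  # first day: every page is new
        else:
            result[day] = sum(1 for p in day_pages if p > prev_max)
        prev_max = max(day_pages)  # this day's maximum is the next day's threshold
    return result


print(new_pages_per_day(ROWS))  # {'2023-07-27': 2, '2023-07-28': 1}
```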

postgresql

Source: https://stackoverflow.com/questions/76796601/sql-to-calculate-number-of-values-for-each-day-based-on-a-max-value-from-previou

2 answers


khbbv19g1#

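The problem can be restated as follows:

*Count the unique pages processed each day, filtering out pages smaller than the maximum page of the previous day.*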

DB fiddle with step-by-step queries
1. If you use PostgreSQL 11+:

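```sql
select day_date, count(distinct page) as page_count
from (
    select
        to_char(timestamp_updated, 'YYYY-MM-DD') as day_date,
        page::integer as page,
        -- highest page of the previous date group only; the frame is
        -- empty (so the result is NULL) for rows of the first day
        max(page::integer) over (
            order by to_char(timestamp_updated, 'YYYY-MM-DD')
            groups between 1 preceding and 1 preceding
        ) as prev_day_max_page
    from dictionary_word, unnest(dict_pages) page
) prev_day_page_data
where prev_day_max_page is null  -- first day: nothing to compare against
   or page >= prev_day_max_page
group by day_date
order by day_date;
```

A `GROUPS` frame counts its offsets in peer groups (all rows with the same `day_date`) rather than in individual rows. So `groups between 1 preceding and 1 preceding` covers exactly the rows of the previous date, and `max()` over that frame yields the previous day's highest page.
Documentation: **link_1 and link_2**.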
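The frame logic can be emulated in Python to show which rows survive the filter. This sketch uses the question's sample data (already unnested) and hypothetical helper names. It gives each row the maximum page of the previous date group, or `None` on the first day, and then applies the outer `where` and `count(distinct page)`.

```python
from itertools import groupby

# (day_date, page) rows as produced by unnest(dict_pages) on the question's sample data
PAGE_ROWS = [
    ("2023-07-28", 229),
    ("2023-07-28", 213),
    ("2023-07-27", 228),
    ("2023-07-27", 227),
    ("2023-07-27", 228),
]


def prev_group_max(rows):
    """Emulate max(page) over (order by day_date groups between 1 preceding and 1 preceding).

    Every row receives the maximum page of the previous peer group (the previous
    distinct day_date), or None when that frame is empty, i.e. on the first day.
    """
    ordered = sorted(rows)  # order by day_date
    out = []
    prev_max = None
    for day, group in groupby(ordered, key=lambda row: row[0]):
        pages = [page for _, page in group]
        out.extend((day, page, prev_max) for page in pages)
        prev_max = max(pages)
    return out


def page_count(rows):
    """Outer query: keep first-day rows or page >= prev max, count distinct pages per day."""
    kept = {}
    for day, page, prev_max in prev_group_max(rows):
        if prev_max is None or page >= prev_max:
            kept.setdefault(day, set()).add(page)
    return {day: len(pages) for day, pages in sorted(kept.items())}


print(page_count(PAGE_ROWS))  # {'2023-07-27': 2, '2023-07-28': 1}
```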
2. A less elegant approach:

```sql
with all_pages as (
    -- one row per (day, page) after expanding the page arrays
    select
        to_char(timestamp_updated, 'YYYY-MM-DD') as day_date,
        page
    from dictionary_word dtw, unnest(dict_pages) page
)
select day_date, count(distinct(page)) as pages_count
from all_pages
join (
    select
        day_date,
        -- previous day's maximum; the first day has no previous row,
        -- so it falls back to that day's own maximum
        coalesce(
              lag(daily_max_page) over (order by day_date),
              daily_max_page
        ) as prev_max_page
    from (
          -- highest page number processed on each day
          select
              to_char(timestamp_updated, 'YYYY-MM-DD') as day_date,
              max(page::integer) as daily_max_page
          from dictionary_word dtw, unnest(dict_pages) page
          group by day_date
    ) s
) prev_day_max_page_stat using (day_date)
where page::integer >= prev_max_page
group by day_date
order by day_date;
```

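Details:

1. `all_pages` collects every page number produced by `unnest(dict_pages)`.
2. `prev_day_max_page_stat` computes the maximum page number of the day before the current date.
3. The final query counts the unique pages (`pages_count`) processed each day. It groups the pages by `day_date` and keeps only those that pass the filter `where page::integer >= prev_max_page`.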
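The three steps listed above can be mirrored in plain Python. This is a hypothetical model on the question's sample data, not the query itself.

Note the effect of the `coalesce(..., daily_max_page)` fallback. The first day keeps only pages at or above its own maximum, so with this data it reports 1 page for 2023-07-27 rather than 2.

```python
from collections import defaultdict

# (day_date, page) rows as produced by unnest(dict_pages) on the question's sample data
PAGE_ROWS = [
    ("2023-07-28", 229),
    ("2023-07-28", 213),
    ("2023-07-27", 228),
    ("2023-07-27", 227),
    ("2023-07-27", 228),
]


def pages_count(rows):
    """Model of the second query: previous day's max via lag(), with coalesce() fallback."""
    # all_pages: group the (day_date, page) pairs by day
    by_day = defaultdict(list)
    for day, page in rows:
        by_day[day].append(page)
    days = sorted(by_day)

    # subquery s: daily_max_page per day
    daily_max = {day: max(by_day[day]) for day in days}

    # prev_day_max_page_stat: coalesce(lag(daily_max_page), daily_max_page)
    prev_max = {
        day: daily_max[days[i - 1]] if i > 0 else daily_max[day]
        for i, day in enumerate(days)
    }

    # final query: where page >= prev_max_page, count(distinct page) per day
    result = {}
    for day in days:
        kept = {page for page in by_day[day] if page >= prev_max[day]}
        if kept:  # days with no surviving rows produce no output row
            result[day] = len(kept)
    return result


print(pages_count(PAGE_ROWS))  # {'2023-07-27': 1, '2023-07-28': 1}
```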

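Improved example:

I extended the INSERT statements to check that the query is correct. The screenshot of the result shows that pages whose number is lower than the previous day's maximum are not included. Its three result rows have `pages_count` values of 1, 2 and 1.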

2023-08-04

nr7wwzry2#

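The query below first unnests the page-number arrays. During aggregation, it then filters out every headword whose dict_page was already covered on an earlier date. A page counts as covered when there is a dict_page greater than or equal to it with an earlier date.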

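```sql
with dt as (
   -- one row per (date, headword, page) after expanding the page arrays
   select date(d.timestamp_updated) date, d.headword, c::int
   from dictionary_word d cross join unnest(d.dict_pages) c
)
select d.date, count(distinct d.headword) c
from dt d
-- drop rows whose page is <= some page processed on an earlier date
where not exists (select 1 from dt d1 where d.c <= d1.c and d1.date < d.date)
group by d.date
order by d.date;
```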
See fiddle
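For comparison, the `not exists` filter can be modeled in Python. The helper name is hypothetical, and the rows are copied from the sample data after unnesting. This approach differs from the previous-day rule in two ways:

- a row is kept only if its page is greater than every page seen on any earlier date, not just the previous one;
- the query counts distinct headwords rather than distinct pages.

```python
from collections import defaultdict

# Rows of the CTE dt for the question's sample data: (date, headword, page)
DT = [
    ("2023-07-28", "abcdefg", 229),
    ("2023-07-28", "ccdsd", 213),
    ("2023-07-27", "zdx", 228),
    ("2023-07-27", "xcdferc", 227),
    ("2023-07-27", "xcdferc", 228),
]


def headwords_per_date(dt):
    """Model of the NOT EXISTS query: distinct headwords per date, keeping only rows
    whose page is greater than every page seen on any earlier date."""
    kept = defaultdict(set)
    for date, headword, page in dt:
        # not exists (select 1 from dt d1 where d.c <= d1.c and d1.date < d.date)
        covered = any(page <= page1 and date1 < date for date1, _, page1 in dt)
        if not covered:
            kept[date].add(headword)
    # ISO date strings sort chronologically, matching order by d.date
    return {date: len(words) for date, words in sorted(kept.items())}


print(headwords_per_date(DT))  # {'2023-07-27': 2, '2023-07-28': 1}
```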

2023-08-04
