PostgreSQL中的SQL查询帮助

szqfcxe2  于 2023-10-18  发布在  PostgreSQL
关注(0)|答案(2)|浏览(107)

概述

我正在尝试开发一个查询,它可以根据需要选择尽可能多的行,以便更新最多LIMIT unique thing.thing_id s,并按status_1_date列排序(这意味着旧的内容应该首先更新)。thing_id不是主键。

示例架构

  1. CREATE TABLE IF NOT EXISTS thing (
  2. name VARCHAR(255) PRIMARY KEY,
  3. thing_id VARCHAR(255),
  4. c_id VARCHAR(255),
  5. status VARCHAR(255),
  6. etag VARCHAR(255),
  7. last_modified VARCHAR(255),
  8. size VARCHAR(255),
  9. status_1_date DATE
  10. );

样本数据

  1. INSERT INTO thing (name,thing_id, c_id,status, status_1_date)
  2. values
  3. ('protocol://full/path/to/thing/thing_1.file1', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:01'),
  4. ('protocol://full/path/to/thing/thing_1.file2', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:02'),
  5. ('protocol://full/path/to/thing/thing_2.file5', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:02.5'),
  6. ('protocol://something else', 'thing_1','c_id', 'status_1', '2023-09-29 09:00:02.8'),
  7. ('protocol://full/path/to/thing/thing_2.file1', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:03'),
  8. ('protocol://full/path/to/thing/thing_2.file2', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:04'),
  9. ('protocol://full/path/to/thing/thing_2.file3', 'thing_2','c_id', 'status_1', '2023-09-29 09:00:05'),
  10. ('protocol://full/path/to/thing/thing_2.file4', 'thing_2','different', 'status_1', '2023-09-29 09:00:05'),
  11. ('protocol://full/path/to/thing/thing_3.file1', 'thing_3','c_id', 'status_1', '2023-09-29 09:00:06'),
  12. ('protocol://full/path/to/thing/thing_3.file2', 'thing_3','c_id', 'status_1', '2023-09-29 09:00:06.2'),
  13. ('protocol://full/path/to/thing/thing_4.file1', 'thing_4','c_id', 'status_1', '2023-09-29 09:00:06.1'),
  14. ('protocol://full/path/to/thing/thing_4.file2', 'thing_4','c_id', 'status_1', '2023-09-29 09:00:06.3'),
  15. ('protocol://full/path/to/thing/thing_5.file1', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.4'),
  16. ('protocol://full/path/to/thing/thing_5.file2', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.5'),
  17. ('protocol://full/path/to/thing/thing_5.file3', 'thing_5','c_id', 'status_1', '2023-09-29 09:00:06.6'),
  18. ('protocol://full/path/to/thing/thing_6.file1', 'thing_6','c_id', 'status_1', '2023-09-29 09:00:06.7');

预期结果

使用这些数据,我试图得到一个查询,当使用LIMIT 3时,它会更新第1-6行和第7-9行。应该更新与there WHERE子句匹配的thing_id s thing_5thing_6thing_1的行。

其他详细信息

我试图解决的问题是,我使用一个大型数据库,我需要限制从它返回的数据,但确保我得到一个完整的数据集的实体,我查询。thing.thing_id不是我们数据库中的主键,但是thing_id表示一个实体,它可以由来自N行的数据组成,如共享thing_id所示。我需要确保我获得了所有事物实体的完整行集,其中LIMIT表示我正在获取数据的事物实体的总数。
这个问题是从以前的线程产生的,但没有解决方案:Unable to get Postgresql LIMIT to work with UPDATE and subquery

Answer的问题

  • 它不会获取具有最旧文件的thing_idhttps://dbfiddle.uk/GStIbrYA
  • 如果两个针对相同thing.c_idthing.name前缀的查询同时运行,则会发生双重更新。我正在处理的数据库中的这个更新可能一次处理100,000行,所以这是不可取的。
    替代尝试1

看起来可能需要FOR UPDATE来防止双重更新发生,但它不能与窗口函数一起使用。

  1. with ranked_matching_things as (
  2. select
  3. dense_rank() over (order by thing_id) as thing_rank,
  4. name
  5. from thing
  6. where c_id = 'c_id'
  7. and name like 'protocol://full/path/to/thing/%'
  8. order by status_1_date
  9. for update
  10. )
  11. update thing
  12. set status = 'CHANGED'
  13. from ranked_matching_things
  14. where thing.name = ranked_matching_things.name
  15. and ranked_matching_things.thing_rank <= 3
  16. ERROR: FOR UPDATE is not allowed with window functions

替代尝试2

  1. UPDATE thing
  2. SET status = 'CHANGED'
  3. FROM (
  4. SELECT name
  5. FROM thing
  6. WHERE thing.thing_id in (
  7. SELECT DISTINCT thing.thing_id
  8. FROM thing
  9. WHERE c_id = 'c_id' AND name like 'protocol://full/path/to/thing/%' AND status = 'status_1'
  10. ORDER BY thing.thing_id
  11. LIMIT (3)
  12. ) AND thing.c_id = 'c_id' AND thing.name like 'protocol://full/path/to/thing/%' AND thing.status = 'status_1'
  13. ORDER BY thing.status_1_date
  14. FOR UPDATE
  15. ) AS rows
  16. WHERE thing.name = rows.name
  17. RETURNING *

测试设置

以下链接可能对测试有用:https://dbfiddle.uk/ZTcbkqsF

jgzswidk

jgzswidk1#

这不能有限制,因为这太...有限。相反,我们可以使用window function。让我们把它分解…
首先,我们识别匹配的行并将它们按顺序排列。

  1. select
  2. *
  3. from thing
  4. where c_id = 'c_id'
  5. and name like 'protocol://full/path/to/thing/%'
  6. order by status_1_date;

很简单。
从这些结果中,我们需要选择前3个不同的匹配thing_id。我们可以用dense_rank向结果中添加一列。

  1. select
  2. dense_rank() over (order by thing_id) as thing_rank,
  3. *
  4. from thing
  5. where c_id = 'c_id'
  6. and name like 'protocol://full/path/to/thing/%'
  7. order by status_1_date;
  1. thing_rank name thing_id c_id status status_1_date
  2. 1 protocol://full/path/to/thing/thing_1.file1 thing_1 c_id status_1 2023-09-29 09:00:01
  3. 1 protocol://full/path/to/thing/thing_1.file2 thing_1 c_id status_1 2023-09-29 09:00:02
  4. 2 protocol://full/path/to/thing/thing_2.file5 thing_2 c_id status_1 2023-09-29 09:00:02.5
  5. 2 protocol://full/path/to/thing/thing_2.file1 thing_2 c_id status_1 2023-09-29 09:00:03
  6. 2 protocol://full/path/to/thing/thing_2.file2 thing_2 c_id status_1 2023-09-29 09:00:04
  7. 2 protocol://full/path/to/thing/thing_2.file3 thing_2 c_id status_1 2023-09-29 09:00:05
  8. 3 protocol://full/path/to/thing/thing_3.file1 thing_3 c_id status_1 2023-09-29 09:00:06
  9. 4 protocol://full/path/to/thing/thing_4.file1 thing_4 c_id status_1 2023-09-29 09:00:06.1
  10. 3 protocol://full/path/to/thing/thing_3.file2 thing_3 c_id status_1 2023-09-29 09:00:06.2
  11. 4 protocol://full/path/to/thing/thing_4.file2 thing_4 c_id status_1 2023-09-29 09:00:06.3
  12. 5 protocol://full/path/to/thing/thing_5.file1 thing_5 c_id status_1 2023-09-29 09:00:06.4
  13. 5 protocol://full/path/to/thing/thing_5.file2 thing_5 c_id status_1 2023-09-29 09:00:06.5
  14. 5 protocol://full/path/to/thing/thing_5.file3 thing_5 c_id status_1 2023-09-29 09:00:06.6
  15. 6 protocol://full/path/to/thing/thing_6.file1 thing_6 c_id status_1 2023-09-29 09:00:06.7

现在我们有一种方法来识别与前三个thing_id对应的行:thing_1具有等级1,thing_2具有等级2,并且thing_3具有等级3。
我们使用dense_rank,而不是rank,以确保秩是1,2,3.否则我们会有差距。
在主键上与该子查询联接,并且仅更新rank <= 3的子查询。

  1. with ranked_matching_things as (
  2. select
  3. dense_rank() over (order by thing_id) as thing_rank,
  4. name
  5. from thing
  6. where c_id = 'c_id'
  7. and name like 'protocol://full/path/to/thing/%'
  8. order by status_1_date
  9. )
  10. update thing
  11. set status = 'CHANGED'
  12. from ranked_matching_things
  13. where thing.name = ranked_matching_things.name
  14. and ranked_matching_things.thing_rank <= 3

Demonstration

展开查看全部
bvjxkvbb

bvjxkvbb2#

我能够得出以下可能是次优的解决方案。

查询获取与最旧文件关联的有效thing_id列表:

  1. SELECT thing_id, MIN(status_1_date)
  2. FROM thing
  3. WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
  4. GROUP BY thing_id

查询获取LIMITthing_id s,首先获取文件最旧的thing_id s:

  1. SELECT thing_id
  2. FROM (
  3. SELECT thing_id, MIN(status_1_date) AS oldest_file
  4. FROM thing
  5. GROUP BY thing_id
  6. ) as thing_ids
  7. ORDER BY thing_ids.oldest_file
  8. LIMIT 3

查询获取所需行FOR UPDATE

  1. SELECT *
  2. FROM thing
  3. WHERE thing.thing_id in (
  4. SELECT thing_id
  5. FROM (
  6. SELECT thing_id, MIN(status_1_date) AS oldest_file
  7. FROM thing
  8. GROUP BY thing_id
  9. ) as thing_ids
  10. ORDER BY thing_ids.oldest_file
  11. )
  12. FOR UPDATE

查询UPDATE有效行:

  1. WITH rows AS (
  2. SELECT *
  3. FROM thing
  4. WHERE thing.thing_id in (
  5. SELECT thing_id
  6. FROM (
  7. SELECT thing_id, MIN(status_1_date) AS oldest_file
  8. FROM thing
  9. WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
  10. GROUP BY thing_id
  11. ) as thing_ids
  12. WHERE thing.status = ('status_1') AND thing.c_id = ('c_id') AND thing.name LIKE ('protocol://full/path/to/thing/%')
  13. ORDER BY thing_ids.oldest_file
  14. LIMIT 3
  15. )
  16. FOR UPDATE
  17. )
  18. UPDATE thing
  19. SET status = 'CHANGED'
  20. FROM rows
  21. WHERE thing.name = rows.name
  22. RETURNING *

解决方案:https://dbfiddle.uk/Z7trYUKa
版本1:

  1. WITH thing_ids as (
  2. SELECT thing_id, MIN(status_1_date) AS oldest
  3. FROM thing
  4. WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
  5. GROUP BY thing_id
  6. ORDER BY oldest
  7. LIMIT 3
  8. ),
  9. names AS (
  10. SELECT name
  11. FROM thing
  12. WHERE thing.thing_id IN (SELECT thing_id FROM thing_ids) AND thing.status = ('status_1') AND thing.c_id = ('c_id') AND thing.name LIKE ('protocol://full/path/to/thing/%')
  13. FOR UPDATE
  14. )
  15. UPDATE thing
  16. SET status = 'CHANGED'
  17. FROM names
  18. WHERE thing.name = names.name
  19. RETURNING *

版本1解决方案:https://dbfiddle.uk/dg8poZaA

我确信一定有某种方法可以防止多余的WHERE子句,但到目前为止我还无法做到这一点。

修订版2,无重复WHERE

我可以通过创建一个UNLOGGED表来解决冗余的WHERE子句。
创建所有匹配WHERE子句的潜在行的表:

  1. CREATE UNLOGGED TABLE rows AS (
  2. SELECT name, thing_id, status_1_date
  3. FROM thing
  4. WHERE status = ('status_1') AND c_id = ('c_id') AND name LIKE ('protocol://full/path/to/thing/%')
  5. )

更新查询:

  1. UPDATE thing
  2. SET status = 'CHANGED'
  3. WHERE name IN (
  4. SELECT name
  5. FROM thing
  6. WHERE name IN (SELECT name FROM rows) AND
  7. thing_id IN (
  8. SELECT thing_id
  9. FROM rows
  10. GROUP BY thing_id
  11. ORDER BY MIN(status_1_date)
  12. LIMIT 3
  13. )
  14. FOR UPDATE
  15. )
  16. RETURNING *
展开查看全部

相关问题