postgresql -在检查条件是否满足的同时,随机将行从一个表插入到另一个表

yrefmtwq  于 2023-08-04  发布在  PostgreSQL
关注(0)|答案(1)|浏览(181)

我有一个tableA,它包含一个id、一个weight(int)和其他类别字段,我必须过滤它们。权重越低,被复制到表B的概率越高。
我必须复制一定数量的行(125),并且必须确保表B中满足某些条件:

  • 至少5行fieldA = 'value1' and fieldB = 'value10'
  • 至少5行fieldA = 'value1' and fieldB = 'value11'
  • 至少1行fieldA = 'value2'
  • 还有其他大致相同的条件。

选择行并不是一个真正的问题:

  1. insert into tableB (id, weight, fieldA, fieldB)
  2. select id, weight, fieldA, fieldB
  3. from tableA
  4. where
  5. (fieldA = 'value1' and fieldB = 'value10')
  6. or (fieldA = 'value1' and fieldB = 'value11')
  7. or (fieldA = 'value2')
  8. order by random() * weight

字符串
但是,由于表B中填充的是“随机”值,我如何确保满足最低条件?我需要在一个循环中这样做吗?我猜是的:插入一行、检查条件,然后插入另一行、检查条件,依此类推。以前没有在SQL中做过循环,希望有一些指导。
谢谢你,谢谢

6l7fqoea

6l7fqoea1#

ROW_NUMBER加上一些额外的逻辑是你的朋友。
这里的前景

  • 为您的每一个条件定义(例如,fieldA = 'value1' and fieldB = 'value10')a uniquethread_id
  • 对于每个 * 线程 *,相对于降序weight定义row_number-即partition by thread_id order by weight desc
  • 只保留所需的最小数量的row_number(例如5为上面的线程)
  • 选择所有具有row_number的行,并按降序weight-order by rn nulls last, weight desc选择限制的其他行

当然,只有当表具有每个线程所需的最小行数,并且最小行数的总和小于您的总体限制时,它才能工作。

查询

  1. with ta as (
  2. select
  3. a.*,
  4. case /* your conditions here */
  5. when fieldA = 'value1' and fieldB = 'value10' then 1
  6. when fieldA = 'value1' and fieldB = 'value11' then 2
  7. when fieldA = 'value2' then 3
  8. end as thread_id
  9. from tableA a
  10. ), ta2 as (
  11. select
  12. ta.*, /* add index rn for each thread */
  13. row_number () over (partition by thread_id order by weight desc) rn
  14. from ta
  15. )
  16. select
  17. id, fielda, fieldb, weight, thread_id,
  18. case /* keep rn for the required counts */
  19. when thread_id in (1,2) and rn <= 5 or
  20. thread_id = 3 and rn <= 1
  21. then rn end as rn
  22. from ta2
  23. order by 6 nulls last, weight desc
  24. limit 125

字符串

样本数据

  1. create table tableA as
  2. select
  3. id,
  4. 'value'|| floor(random() * 12 + 1)::text as fieldA,
  5. 'value'|| floor(random() * 12 + 1)::text as fieldb,
  6. random() weight
  7. from generate_series(1,100000) t(id)

结果

  1. id |fielda |fieldb |weight |thread_id|rn|
  2. -----+-------+-------+------------------+---------+--+
  3. 75078|value2 |value5 | 0.999945197714716| 3| 1|
  4. 69603|value1 |value10|0.9981664255434559| 1| 1|
  5. 90845|value1 |value11|0.9961481124707383| 2| 1|
  6. 98661|value1 |value10| 0.995685359324316| 1| 2|
  7. 73730|value1 |value11|0.9943281590439526| 2| 2|
  8. 20895|value1 |value10|0.9948395132030576| 1| 3|
  9. 28411|value1 |value11|0.9926846343501374| 2| 3|
  10. 6689|value1 |value10|0.9926810486253324| 1| 4|
  11. 26683|value1 |value11|0.9896049495594461| 2| 4|
  12. 52026|value1 |value10|0.9915162874600227| 1| 5|
  13. 48511|value1 |value11|0.9881885052326673| 2| 5|
  14. 42838|value8 |value2 |0.9999986646619519| | |
  15. 93206|value3 |value12|0.9999955774933724| | |
  16. 39188|value8 |value8 |0.9999941396240928| | |
  17. 54205|value7 |value8 | 0.999971846101122| | |
  18. ...

展开查看全部

相关问题