PostgreSQL: joining multiple tables with a WHERE clause on each table

Posted by vlju58qv on 2023-06-05 in PostgreSQL

Example:

SELECT *
FROM A
JOIN B ON A.idx = B.idx
JOIN C ON A.idx = C.idx
WHERE A.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR A.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.last_dt BETWEEN '2023-05-01' AND '2023-05-31';

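The database is PostgreSQL.
When joining multiple tables, I only want to SELECT the data where each table's create_dt, last_dt is the latest data.
The problem is that the query is too slow. How can I fix this?
When I keep only the WHERE conditions on table A, performance is good.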

kg7wmglp  #1

You could try adding the following indexes to the three tables:

CREATE INDEX idx_a ON A (idx, create_dt, last_dt);
CREATE INDEX idx_b ON B (idx, create_dt, last_dt);
CREATE INDEX idx_c ON C (idx, create_dt, last_dt);

If the planner uses these indexes, they should speed up the joins in the query.
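Whether the planner actually picks these indexes depends on table sizes and statistics, so it is worth checking. A minimal sketch, using the query from the question unchanged and PostgreSQL's EXPLAIN with the ANALYZE and BUFFERS options:

```sql
-- Executes the query and prints the actual plan, with timings and buffer usage.
-- Index Scan / Index Only Scan / Bitmap Index Scan nodes naming idx_a, idx_b
-- or idx_c mean the new indexes are used; Seq Scan nodes mean they are not.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM A
JOIN B ON A.idx = B.idx
JOIN C ON A.idx = C.idx
WHERE A.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR A.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.last_dt BETWEEN '2023-05-01' AND '2023-05-31';
```

Note that EXPLAIN ANALYZE really runs the statement, so expect it to take as long as the slow query itself.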

xn1cxnb4  #2

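"I only want to SELECT the data where each table's create_dt, last_dt is the latest data."
Then you certainly cannot join on idx, because the latest row of each table will not share the same idx value.
Your query does not do what you describe. This one does: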

SELECT *
FROM  (
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_a, *
   FROM   a
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT 1
   )
   UNION ALL
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_a, *
   FROM   a
   WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY last_dt DESC
   LIMIT 1
   )
   ORDER BY latest_a DESC
   LIMIT 1
   ) a
CROSS JOIN  (
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_b, *
   FROM   b
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT 1
   )
   UNION ALL
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_b, *
   FROM   b
   WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY last_dt DESC
   LIMIT 1
   )
   ORDER BY latest_b DESC
   LIMIT 1
   ) b
CROSS JOIN  (
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_c, *
   FROM   c
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT 1
   )
   UNION ALL
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_c, *
   FROM   c
   WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY last_dt DESC
   LIMIT 1
   )
   ORDER BY latest_c DESC
   LIMIT 1
   ) c;

All parentheses are required.
It is a bit verbose, but it is the fastest option, provided you have these indexes:

CREATE INDEX a_create_dt_idx ON A (create_dt);
CREATE INDEX a_last_dt_idx ON A (last_dt);

CREATE INDEX b_create_dt_idx ON B (create_dt);
CREATE INDEX b_last_dt_idx ON B (last_dt);

CREATE INDEX c_create_dt_idx ON C (create_dt);
CREATE INDEX c_last_dt_idx ON C (last_dt);

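Each table gets two index lookups, and each lookup picks one qualifying row directly.
I join with an unconditional CROSS JOIN because each subquery returns exactly one row, provided at least one row qualifies.
If one of the subqueries finds no row, the result is empty. Maybe you really want a FULL OUTER JOIN to keep the results from the other tables when one comes up empty. Or just 3 result rows.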
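One way to keep the rows from the other tables when a subquery comes up empty is to LEFT JOIN each subquery to a one-row dummy table instead of using CROSS JOIN. This is only a sketch under assumptions: the dummy table d is a hypothetical helper, and each subquery is shortened to its create_dt branch to keep the example readable.

```sql
SELECT *
FROM  (SELECT 1 AS dummy) d   -- hypothetical helper: always exactly one row
LEFT  JOIN (
   SELECT *
   FROM   a
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT  1
   ) a ON true                -- a's columns become NULL if nothing qualifies
LEFT  JOIN (
   SELECT *
   FROM   b
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT  1
   ) b ON true
LEFT  JOIN (
   SELECT *
   FROM   c
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT  1
   ) c ON true;
```

The result is always a single row. Columns of a table without a qualifying row come back as NULL instead of emptying the whole result.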
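Then again, I would not be surprised if you have not stated clearly what you really need. My guess is that you want something like this (the latest row per idx from each table):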

(
SELECT DISTINCT ON (idx) 'a_latest_create_dt' AS what, *
FROM   a
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, create_dt DESC, last_dt DESC
)
UNION ALL
(
SELECT DISTINCT ON (idx) 'a_latest_last_dt' AS what, *
FROM   a
WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, last_dt DESC, create_dt DESC
)

UNION ALL
(
SELECT DISTINCT ON (idx) 'b_latest_create_dt' AS what, *
FROM   b
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, create_dt DESC, last_dt DESC
)
UNION ALL
(
SELECT DISTINCT ON (idx) 'b_latest_last_dt' AS what, *
FROM   b
WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, last_dt DESC, create_dt DESC
)

UNION ALL
(
SELECT DISTINCT ON (idx) 'c_latest_create_dt' AS what, *
FROM   c
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, create_dt DESC, last_dt DESC
)
UNION ALL
(
SELECT DISTINCT ON (idx) 'c_latest_last_dt' AS what, *
FROM   c
WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, last_dt DESC, create_dt DESC
)
ORDER BY idx, what;

This time, I list the latest row per idx for each table: one for create_dt and one for last_dt. That makes 6 rows per idx if every kind has a row in the time range.
About DISTINCT ON:

  • Select first row in each GROUP BY group?

This assumes all involved columns are NOT NULL. Otherwise you may have to do more...
Matching indexes:

CREATE INDEX a_create_dt_idx ON A (idx, create_dt);
CREATE INDEX a_last_dt_idx ON A (idx, last_dt);

CREATE INDEX b_create_dt_idx ON B (idx, create_dt);
CREATE INDEX b_last_dt_idx ON B (idx, last_dt);

CREATE INDEX c_create_dt_idx ON C (idx, create_dt);
CREATE INDEX c_last_dt_idx ON C (idx, last_dt);
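If the tie-breaker column can be NULL, the NOT NULL assumption above no longer holds: in PostgreSQL, NULL values sort first in descending order by default, so a row with a NULL last_dt could win the tie. A minimal sketch of one possible adjustment for the first branch, assuming only last_dt is nullable (create_dt cannot be NULL here because the WHERE clause already filters such rows out):

```sql
SELECT DISTINCT ON (idx) 'a_latest_create_dt' AS what, *
FROM   a
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
-- NULLS LAST pushes rows with a NULL last_dt behind rows with a real value
ORDER  BY idx DESC, create_dt DESC, last_dt DESC NULLS LAST;
```

The other branches would need the same change on their respective tie-breaker column.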
