oracle 为什么MIN()/MAX()函数在子查询中被忽略

hec6srdp  于 2023-10-16  发布在  Oracle
关注(0)|答案(3)|浏览(169)

我需要解释为什么MIN/MAX函数在子查询中被忽略。我在文件中没有发现任何关于大都会的行为。
例如
有两张table

CREATE TABLE A(COL1 NUMBER);
CREATE TABLE B(COL1 NUMBER);

由以下PL/SQL块填充数据

SET TIMING ON
DECLARE
INSERT_STATEMENT VARCHAR2(200);
BEGIN 
FOR I IN 1..100
LOOP
INSERT_STATEMENT := 'INSERT INTO A(COL1) VALUES(round(DBMS_RANDOM.VALUE(1,3)))';
EXECUTE IMMEDIATE INSERT_STATEMENT;
INSERT_STATEMENT := 'INSERT INTO B(COL1) VALUES(round(DBMS_RANDOM.VALUE(1,2)))';
EXECUTE IMMEDIATE INSERT_STATEMENT;
END LOOP;
commit;
END;
/

表值分布

SELECT COL1,COUNT(COL1) COUNT# FROM A GROUP BY COL1;
SELECT COL1,COUNT(COL1) COUNT# FROM B GROUP BY COL1;

      COL1     COUNT#
---------- ----------
         1         26
         2         53
         3         21

SQL>
      COL1     COUNT#
---------- ----------
         1         50
         2         50

“问题”查询是

SQL> SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MIN(B.COL1) FROM B WHERE A.COL1=B.COL1);

  COUNT(*)
----------
        79

SQL> SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MAX(B.COL1) FROM B WHERE A.COL1=B.COL1);

  COUNT(*)
----------
        79

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    67 |  2546 |     5  (20)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   HASH GROUP BY      |      |    67 |  2546 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN SEMI    |      |    67 |  2546 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| A    |   100 |  2500 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| B    |   100 |  1300 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"."COL1"=MIN("A"."COL1"))
   3 - access("A"."COL1"="B"."COL1")

两个查询都从表A返回79行,其中表A的值存在于表B中。
我有点困惑,因为我期望上面的查询与MIN()/MAX()函数应该分别返回26和53行。但是看起来MIN/MAX函数在这样的查询中被忽略了。
预期成果的工作示例

SQL> SELECT COUNT(*) FROM A A2 WHERE A2.COL1=(SELECT MIN(B.COL1) FROM A,B WHERE A.COL1=B.COL1);

  COUNT(*)
----------
        26

SQL> SELECT COUNT(*) FROM A A2 WHERE A2.COL1=(SELECT MAX(B.COL1) FROM A,B WHERE A.COL1=B.COL1);

  COUNT(*)
----------
        53

我想知道为什么MIN/MAX函数在子查询中被忽略

SQL> SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MIN(B.COL1) FROM B WHERE A.COL1=B.COL1);
    
      COUNT(*)
    ----------
            79
    
SQL> SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MAX(B.COL1) FROM B WHERE A.COL1=B.COL1);
    
      COUNT(*)
    ----------
            79
rnmwe5a2

rnmwe5a21#

您的子查询

(SELECT MAX(B.COL1) FROM B WHERE A.COL1=B.COL1)

与A相关,因此对于A.COL1的每个值,如果B中存在具有相同COL1值的任何行-对于1和2为真,但对于3不为真-则由于A.COL1=B.COL1MAX(B.COL1)也必须等于MAX(A.COL1)。你可以在 predicate 信息中看到它已经被翻译成了等价的:
由于相关性,A.COL1 = MIN(A.COL1)实际上就是A.COL1 = A.COL1
另一种看待它的方式是,当B中有一行时,你 * 有效地 * 将子查询结果替换为:

WHERE A.COL1 = A.COL1

对于COL1值1和2总是真,对于3总是假;如果B中没有行,那么对于那些行,它实际上被替换为:

WHERE A.COl1 = null

这从来都不是真的
删除该相关性,它将执行您预期的操作-如this db<fiddle所示,使用不同的随机数据:

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MIN(B.COL1) FROM B WHERE A.COL1=B.COL1);

| COUNT(*)|
| --|
| 76 |

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MAX(B.COL1) FROM B WHERE A.COL1=B.COL1);

| COUNT(*)|
| --|
| 76 |

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MIN(B.COL1) FROM B);

| COUNT(*)|
| --|
| 24 |

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MAX(B.COL1) FROM B);

| COUNT(*)|
| --|
| 52 |

aamkag61

aamkag612#

因为你的条件确保B.COL1等于A.COL1,所以取minmax是无关紧要的,它总是A.COL1的值。我想你可能希望这些查询得到你想要的结果:

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MIN(B.COL1) FROM B);

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MAX(B.COL1) FROM B);
bxpogfeg

bxpogfeg3#

SELECT COUNT(*) FROM A WHERE A.COL1=(SELECT MIN(B.COL1) FROM B WHERE A.COL1=B.COL1);

A中查找行,其中B也有一个匹配的col1行(外部查询与内部查询相关,因此当A.col1 = 1时,如果存在任何这样的行,则B.col1也必须等于1,然后MIN(B.col1)也等于1;对于23也是如此,尽管B中的行只存在于值12,而不是3)。
它实际上等同于:

SELECT COUNT(*) FROM A WHERE EXISTS (SELECT 1 FROM B WHERE A.COL1=B.COL1);

由于表B仅包含值12,因此表A中的值的计数将仅为12的值,而不是3的值,这给出了值79。
如果添加GROUP BY子句:

SELECT col1,
       COUNT(*)
FROM   A
WHERE  A.COL1=(SELECT MIN(B.COL1) FROM B WHERE A.COL1=B.COL1)
GROUP BY a.col1;

或:

SELECT col1,
       COUNT(*)
FROM   A
WHERE  EXISTS (SELECT 1 FROM B WHERE A.COL1=B.COL1)
GROUP BY col1;

然后两个查询都输出:
| COL1| COUNT(*)|
| --|--|
| 1 | 26 |
| 2 | 53 |
因为您将计算每个A.col1(而不是计算所有A.col1值的总和)。
您的查询成功:

SELECT COUNT(*)
FROM   A A2
WHERE  A2.COL1=(SELECT MIN(B.COL1) FROM A,B WHERE A.COL1=B.COL1);

内部连接AB,并在聚合后关联A2.COL1=MIN(B.COL1)(而上一个查询在聚合前关联)。
你可以将最终的查询等效地写为:

SELECT COUNT(*)
FROM   A A2
WHERE  EXISTS(
         SELECT 1
         FROM   B
         WHERE  EXISTS(SELECT 1 FROM A WHERE A.COL1=B.COL1)
         HAVING A2.COL1=MIN(B.COL1)
       );

fiddle

相关问题