Oracle Query与Hash Join的性能问题

ie3xauqp  于 2023-06-22  发布在  Oracle
关注(0)|答案(1)|浏览(149)

我正在使用oracle 19 c,在一个简单的查询中遇到了一个奇怪的性能问题。这是我的查询:

SELECT 
TU.ID AS RecordID
FROM TABLE_U TU
JOIN (SELECT to_date('11-04-2001','DD-MM-YYYY') AS PROCESS_DATE FROM DUAL
UNION ALL SELECT to_date('10-04-2001','DD-MM-YYYY') AS PROCESS_DATE FROM DUAL
)  PDATE ON (1=1)
JOIN TABLE_T t ON (t.PROCESSOR = TU.PROCESSOR AND TU.TTD = t.ID AND t.FROMDATE <=PDATE.PROCESS_DATE AND t.TODATE >= PDATE.PROCESS_DATE)
LEFT JOIN TABLE_TD TD ON (TD.PROCESSOR = t.PROCESSOR AND TD.ABC = t.ID)

WHERE t.CREATED <=PDATE.PROCESS_DATE
AND TU.FROMDATE <= PDATE.PROCESS_DATE

这是它的执行计划

Plan hash value: 1872580574
 
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                            | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                                 |  3745 |   369K| 35213   (1)| 00:00:02 |
|*  1 |  HASH JOIN OUTER                        |                                 |  3745 |   369K| 35213   (1)| 00:00:02 |
|   2 |   NESTED LOOPS                          |                                 |  3745 |   303K| 34957   (1)| 00:00:02 |
|   3 |    NESTED LOOPS                         |                                 | 77970 |   303K| 34957   (1)| 00:00:02 |
|   4 |     NESTED LOOPS                        |                                 |  7797 |   388K|  3853   (1)| 00:00:01 |
|   5 |      VIEW                               |                                 |     2 |    18 |     4   (0)| 00:00:01 |
|   6 |       UNION-ALL                         |                                 |       |       |            |          |
|   7 |        FAST DUAL                        |                                 |     1 |       |     2   (0)| 00:00:01 |
|   8 |        FAST DUAL                        |                                 |     1 |       |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS BY INDEX ROWID BATCHED| TABLE_T                         |  3899 |   159K|  1924   (1)| 00:00:01 |
|* 10 |       INDEX RANGE SCAN                  | TABLE_T_FROMDATE                |    23 |       |  1916   (1)| 00:00:01 |
|* 11 |     INDEX RANGE SCAN                    | PK_TABLE_U                      |    10 |       |     3   (0)| 00:00:01 |
|* 12 |    TABLE ACCESS BY INDEX ROWID          | TABLE_U                         |     1 |    32 |     4   (0)| 00:00:01 |
|  13 |   INDEX STORAGE FAST FULL SCAN          | TABLE_TD_IDX                    |   231K|  4066K|   256   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("TD"."PROCESSOR"(+)="T"."PROCESSOR" AND "TD"."ABC"(+)="T"."ID")
  10 - access("T"."TODATE">="PDATE"."PROCESS_DATE" AND "T"."FROMDATE"<="PDATE"."PROCESS_DATE" AND 
              "T"."CREATED"<="PDATE"."PROCESS_DATE")
       filter("T"."CREATED"<="PDATE"."PROCESS_DATE" AND "T"."TODATE">="PDATE"."PROCESS_DATE")
  11 - access("TU"."TTD"="T"."ID" AND "T"."PROCESSOR"="TU"."PROCESSOR")
       filter("T"."PROCESSOR"="TU"."PROCESSOR")
  12 - filter("TU"."FROMDATE"<="PDATE"."PROCESS_DATE")
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
 
   0 -  SEL$6
         E -  USEX_NL

(统计数据是最新的)我对此有两个问题:

  • 查询执行时间超过30分钟
  • 我不明白为什么oracle要考虑TABLE_TD上的join,因为它是一个左joi,并且在selected列和where条件中都没有被调用。

我已经强制了一个嵌套循环提示,如下所示:

SELECT /*+ USE_NL(t TD)*/
TU.ID AS RecordID
FROM TABLE_U TU
JOIN (SELECT to_date('11-04-2001','DD-MM-YYYY') AS PROCESS_DATE FROM DUAL
UNION ALL SELECT to_date('10-04-2001','DD-MM-YYYY') AS PROCESS_DATE FROM DUAL
)  PDATE ON (1=1)
JOIN TABLE_T t ON (t.PROCESSOR = TU.PROCESSOR AND TU.TTD = t.ID AND t.FROMDATE <=PDATE.PROCESS_DATE AND t.TODATE >= PDATE.PROCESS_DATE)
LEFT JOIN TABLE_TD TD ON (TD.PROCESSOR = t.PROCESSOR AND TD.ABC = t.ID)

WHERE t.CREATED <=PDATE.PROCESS_DATE
AND TU.FROMDATE <= PDATE.PROCESS_DATE

新的计划更正确,查询执行时间不到1秒!!:

Plan hash value: 3532208304
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                                 |  3745 |   369K| 42448   (1)| 00:00:02 |
|   1 |  NESTED LOOPS OUTER                    |                                 |  3745 |   369K| 42448   (1)| 00:00:02 |
|   2 |   NESTED LOOPS                         |                                 |  3745 |   303K| 34957   (1)| 00:00:02 |
|   3 |    NESTED LOOPS                        |                                 |  7797 |   388K|  3853   (1)| 00:00:01 |
|   4 |     VIEW                               |                                 |     2 |    18 |     4   (0)| 00:00:01 |
|   5 |      UNION-ALL                         |                                 |       |       |            |          |
|   6 |       FAST DUAL                        |                                 |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL                        |                                 |     1 |       |     2   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| TABLE_T                         |  3899 |   159K|  1924   (1)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | TABLE_T_FROMDATE                |    23 |       |  1916   (1)| 00:00:01 |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED | TABLE_U                         |     1 |    32 |     4   (0)| 00:00:01 |
|* 11 |     INDEX RANGE SCAN                   | PK_TABLE_U                      |    10 |       |     3   (0)| 00:00:01 |
|* 12 |   INDEX RANGE SCAN                     | TABLE_TD_IDX                    |     1 |    18 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   9 - access("T"."TODATE">="PDATE"."PROCESS_DATE" AND "T"."FROMDATE"<="PDATE"."PROCESS_DATE" AND 
              "T"."CREATED"<="PDATE"."PROCESS_DATE")
       filter("T"."CREATED"<="PDATE"."PROCESS_DATE" AND "T"."TODATE">="PDATE"."PROCESS_DATE")
  10 - filter("TU"."FROMDATE"<="PDATE"."PROCESS_DATE")
  11 - access("TU"."TTD"="T"."ID" AND "T"."PROCESSOR"="TU"."PROCESSOR")
       filter("T"."PROCESSOR"="TU"."PROCESSOR")
  12 - access("TD"."ABC"(+)="T"."ID" AND "TD"."PROCESSOR"(+)="T"."PROCESSOR")

有没有解释为什么oracle可以选择第一个执行计划,我如何在没有强制提示的情况下解决这个问题?

vdgimpew

vdgimpew1#

1.您正在使用TD而没有意识到这一点。只要加入它,你就有可能乘以其他表的行数,如果它应该找到每个键超过一个TD行(processor,abc)。在SELECT子句中不应用筛选器或从筛选器中请求列并不意味着join没有做任何事情。Oracle仍然必须执行连接。

  1. Oracle认为,在达到TD之前进行其他连接之后,它将有3,745行。这刚好超过了其成本数学的阈值,可以预测散列连接比嵌套循环更便宜。如果你注意到你的暗示版本,它计算的成本是42 K,而它的首选计划是35 K-数字非常接近对方,所以它接近拐点。小的变化可能会导致它在这两个计划之间来回翻转。这对Oracle来说是典型的,因为随着时间的推移,数据会发生变化,Oracle总是试图选择它认为是最好的计划。当然,它是不完美的,经常在计算中加入不正确的假设,导致不正确的成本计算,这意味着糟糕的计划。我们尽量给予它最好的统计数据,如果我们仍然有问题,提示在修复事情和保持稳定方面发挥了作用。话虽如此,但不要认为甲骨文在这里是错的。
    1.现在,我怀疑你的测试时间不对。1 s听起来像是计时到出现的第一行(响应时间),而不是计时到提取最后一行(吞吐量)。从散列连接到嵌套循环,这样每个连接都是嵌套循环,绝对可以给予你一个立即<1 s的响应(第一行),因为这就是嵌套循环的工作方式-你在下一行工作之前得到第一行,等等。如果它是嵌套循环,您将立即在客户端中看到第一行。如果在任何地方只有一个散列连接,则必须等到所有行的散列连接都完成后才能看到第一行。如果这就是正在发生的事情,那么您需要更改您的测试。将查询 Package 在SELECT COUNT(*) FROM ([query])中,以便测试整个SQL。我怀疑你会发现甲骨文最初的计划与你的新计划几乎相同,甚至略好一些。
    1.要真正调优查询,您需要找出它花费时间的地方。运行一段时间后,请咨询ASH(v$active_session_history)以获取SQL的sql_id,您可以从v$session中获取。查看sql_plan_line_idsql_plan_operationevent并计算行数。例如:
SELECT sql_plan_line_id,sql_plan_operation,event,COUNT(*)
   FROM v$active_session_history
  WHERE sql_id = 'gdvh64xqz9wcm'
    AND sample_time > SYSDATE - 1/24
  GROUP BY sql_plan_line_id,sql_plan_operation,event
  ORDER BY COUNT(*) DESC

这将显示在哪个计划行和什么事件上花费了多少秒(COUNT)。这将告诉您问题在查询中的哪里。解决它将取决于问题是什么。但这应该能让你开始。

相关问题