SQL Server Where condition dramatically reduces performance unexpectedly

j0pj023g  于 2023-05-05  发布在  其他
关注(0)|答案(1)|浏览(196)

I am attempting to optimize a very large query that has a particularly bad join condition. I have reduced the larger query down to two tables to illustrate the negative behavior I am seeing.

CREATE TABLE #test1 (Id1 INT, Id2 NVARCHAR(256))
CREATE CLUSTERED INDEX PK_test1 ON #test1 (Id1, Id2)
-- Sample (1, "a|b|c|d")

CREATE TABLE #test2 (Id1 INT, Id2 NVARCHAR(256))
CREATE CLUSTERED INDEX PK_test2 ON #test2 (Id1, Id2)
-- Sample (1, "a|b")

The join I am attempting to optimize is a postfix match (query hints added to emulate the join order as these are a part of a much larger query).

DECLARE @id INT = 1

SELECT * 
FROM  #test1 t1
INNER [HASH] JOIN #test2 t2 ON t1.Id1 = t2.Id1
                            AND t1.Id2 LIKE t2.Id2 + '%'
-- This condition is the one that is causing issues
-- WHERE t1.Id1 = @id 
OPTION (FORCE ORDER)

Without the where clause, the join is able to use the HASH join hint:

However with the where clause the optimizer is unable to build that query instead decides to do this:

Substantially reducing the performance of the query.

As the where condition is reducing the data sets from both tables, I would have expected it to be strictly better, however it is completely changing the seek. It also prevents the join from being a hash or merge join, and it is not clear at all why that would be the case as it should just be reducing the search space for both tables.

The main question here is how can that where condition be having such a massive negative effect to the overall query plan when it shouldn't be changing the index access patterns

k0pti3hp

k0pti3hp1#

The reason why you see the error
Msg 8622, Level 16, State 1, Line 12 Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

When adding the

WHERE t1.Id1 = @id

Is because

FROM  #test1 t1
INNER JOIN #test2 t2 ON t1.Id1 = t2.Id1
    AND t1.Id2 LIKE t2.Id2 + '%'
WHERE t1.Id1 = @id

is the same as

FROM  (#test1 t1 WHERE t1.Id1 = @id) t1
CROSS JOIN (#test2 t2 WHERE t2.Id1= @id) t2 
WHERE t1.Id2 LIKE t2.Id2 + '%'

So you lose the equi join that is necessary for a hash join as discussed in Implied Predicates and Query Hints (Craig Freedman).

On the face of it the hash join doesn't really give you anything as the join condition is on Id1 and all of the rows on the build side have the same value for this and fall in the same hash bucket so it has to replay all of the rows in the build side for all of the rows in the probe side anyway.

You could try using the NO_PERFORMANCE_SPOOL hint to get rid of the spool in the plan

相关问题