SQL Server Query takes 10 minutes when table variables are used versus 2 seconds when temporary table is used

gblwokeq · published 2023-10-15 in: Other

There is an old query, written by someone else, that uses table variables and used to run in 2-3 minutes. For the last 2 weeks it has been running eternally even though the dataset size is approximately the same, so I am trying to optimize the script. The script runs 99.99% faster when I replace the table variable with a temporary table. What is the exact reason for this? I agree that table variables do not suit large datasets; however, in this case the record count is only in the thousands. Both table variables and temporary tables store their data in tempdb, so why this difference? What could have happened in the backend so that the table variable version, which used to run in 2-3 minutes, now takes an insane amount of time?

    DECLARE @test_table1 TABLE (/*some columns*/);

    INSERT INTO @test_table1
    SELECT /*some columns*/
    FROM SomeTable1
        LEFT JOIN SomeTable2 ON /*join condition*/
        LEFT JOIN SomeTable3 ON /*join condition*/
        LEFT JOIN SomeTable4 ON /*join condition*/
        LEFT JOIN SomeTable5 ON /*join condition*/
    WHERE /*some conditions with string and date functions*/;

    UPDATE @test_table1
    SET col1 = 'xyz'
    WHERE SUBSTRING(col2, 1, 4) IN ('some values');

    DECLARE @test_table2 TABLE (/*some columns*/);

    WITH [cte1] AS (
        SELECT /*some columns*/
        FROM SomeTable1
            INNER JOIN @test_table1 AS t1 ON /*join condition*/
            LEFT JOIN SomeTable2 ON /*join condition*/
    ),
    [cte2] AS (
        SELECT /*some columns*/
        FROM SomeTable1
            INNER JOIN @test_table1 AS t1 ON /*join condition*/
            LEFT JOIN SomeTable2 ON /*join condition*/
            LEFT JOIN SomeTable3 ON /*join condition*/
        UNION
        SELECT /*some columns*/
        FROM SomeTable1
            INNER JOIN @test_table1 AS t1 ON /*join condition*/
            LEFT JOIN SomeTable2 ON /*join condition*/
            LEFT JOIN SomeTable3 ON /*join condition*/
    ),
    [cte3] AS (
        SELECT DISTINCT /*some columns*/
        FROM [cte2]
        WHERE /*some simple conditions*/
    )
    INSERT INTO @test_table2
    SELECT /*some columns with string concat using STUFF and FOR XML PATH*/
    FROM [cte3]
    GROUP BY /*some columns*/;
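For reference, the change that made the script fast again was simply swapping the table variable for a temporary table. A sketch (column lists elided as above; `#test_table1` is the hypothetical temp-table counterpart):

```sql
-- Sketch of the swap: same load, but into a temp table instead of a
-- table variable (column lists and join conditions elided as above).
CREATE TABLE #test_table1 (/*some columns*/);

INSERT INTO #test_table1
SELECT /*some columns*/
FROM SomeTable1
    LEFT JOIN SomeTable2 ON /*join condition*/
    /* ...same remaining joins and WHERE clause as before... */;

-- All later references change from @test_table1 to #test_table1.
DROP TABLE #test_table1;
```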

tyu7yeag1#

"Table variables are kept in the RAM, making all access to it very fast while temporary tables are physically created on the disk."

Every time someone asks me why temporary tables can actually show better performance than table variables, I hear some variation of that quote.

So I must remind people that the engine makes lots of estimates and assumptions even when fetching data for the simplest query. One of the most important estimates is the size of the returned dataset: a table variable carries no column statistics, so the optimizer typically assumes it holds just one row (before SQL Server 2019's deferred compilation) unless the statement is recompiled. Much of the time, a table variable's bad performance shows up when the dataset turns out to be large, more than the engine is willing to keep in RAM, so it starts using the disk, exactly the same tempdb used by temporary tables.
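A minimal sketch of that estimation behavior, using system catalog views so it is self-contained (the 1-row estimate applies before SQL Server 2019's deferred table-variable compilation):

```sql
-- The optimizer compiles this join with a 1-row estimate for @t,
-- no matter how many rows @t actually holds.
DECLARE @t TABLE (id INT PRIMARY KEY);

INSERT INTO @t (id)
SELECT object_id FROM sys.all_objects;  -- thousands of rows

SELECT o.name
FROM @t AS t
INNER JOIN sys.all_objects AS o ON o.object_id = t.id;

-- OPTION (RECOMPILE) lets the optimizer see the actual cardinality
-- of @t at execution time and pick a plan sized for it.
SELECT o.name
FROM @t AS t
INNER JOIN sys.all_objects AS o ON o.object_id = t.id
OPTION (RECOMPILE);
```

Comparing the estimated versus actual row counts for `@t` in the two plans shows the difference directly.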

So why worse, and not merely the same performance? Table variables are supposed to be small, very small indeed, and most importantly smaller than the "equivalent"/"default" temporary table. As the dataset grows, the engine must allocate more and more disk space and perform more and more page splits, and thanks to all those initial assumptions the overhead for table variables ends up far worse.

That's why my advice to anyone reaching for table variables or temporary tables is: have you tested it the other way around?
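Testing "the other way around" is cheap: a temporary table is nearly a drop-in replacement, and unlike a table variable it gets automatic column statistics and accepts indexes after loading. A self-contained sketch with hypothetical names:

```sql
-- Hypothetical example: temp tables get automatic column statistics
-- and accept indexes after loading; plain table variables do neither.
CREATE TABLE #work (id INT NOT NULL, name SYSNAME NOT NULL);

INSERT INTO #work (id, name)
SELECT object_id, name FROM sys.all_objects;

-- Index created after the load, so it covers all inserted rows.
CREATE NONCLUSTERED INDEX IX_work_name ON #work (name);

SELECT COUNT(*) FROM #work WHERE name LIKE 'sys%';

DROP TABLE #work;
```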

The TOP 100 can also be misleading: it only means you want to return the first 100 rows, but that doesn't imply the engine can avoid reading entire tables to sort a big mess of data before picking those 100. All at the expense of the initially reserved memory space.
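A small illustration of that point (using a system catalog view so it runs anywhere):

```sql
-- TOP limits only the rows returned; the ORDER BY on a computed
-- expression still forces the engine to read and sort the whole input
-- before the first 100 rows can be identified.
SELECT TOP (100) name
FROM sys.all_objects
ORDER BY REVERSE(name);  -- no index can serve this sort order
```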
