Do DELETE queries take the same amount of time as SELECT

oalqel3c  于 2023-02-21  发布在  其他
关注(0)|答案(1)|浏览(93)

I am hoping to delete all the duplicates in my large DB. I wrote a query which first filters about 200k records out of a million records and then SELECTs the records.

Now, this whole process takes about 10 MINS 30 SECS under a certain system load. Now, if I replaced the SELECT command with DELETE , will it take the same amount of time?

I cannot test it myself because I do not have DELETE privileges right at this moment.

My Queries are as follows:

  1. For Select
WITH CTE AS(
    SELECT
    ID,
    FIRSTNAME,
    LASTNAME,
    AGE,
    ROW_NUMBER() OVER(
        PARTITION BY
            ID,
            FIRSTNAME,
            LASTNAME,
            AGE
            ORDER BY
                ID,
                FIRSTNAME,
                LASTNAME,
                AGE
    ) AS row_num
    FROM 
        NEWTABLE
    
)
SELECT * FROM CTE WHERE row_num > 1;
  1. For Delete
WITH CTE AS(
    SELECT
    ID,
    FIRSTNAME,
    LASTNAME,
    AGE,
    ROW_NUMBER() OVER(
        PARTITION BY
            ID,
            FIRSTNAME,
            LASTNAME,
            AGE
            ORDER BY
                ID,
                FIRSTNAME,
                LASTNAME,
                AGE
    ) AS row_num
    FROM 
        NEWTABLE
    
)
DELETE FROM CTE WHERE row_num > 1;
mrfwxfqh

mrfwxfqh1#

Now, if I replaced the SELECT command with DELETE, will it take the same amount of time?

No, almost certainly not.

delete is a data modification and fully logged operation, neither of which select is.

The delete will need to acquire IX then X locks in order to delete rows, which may be blocked by other concurrent processes, depending on database usage.

The delete will need to update the base table and also any relevant indexes, plus write the modifications to the transaction log.

Depending on the performance of your hardware and IO it may be better to delete rows in batches of no more than approximately 5000, which is the threshold for the number of locks; this should prevent SQL Server escalating row locks to a table lock which would aid concurrency. You can add a top clause in your CTE to accomplish this.

Also, your CTE doesn't need to select all columns, the Id should suffice.

It may be worth experimenting with using a view in line with this advice .

相关问题