SQL Server Best way to update 40 million rows in batch

rjee0c15  于 11个月前  发布在  其他
关注(0)|答案(5)|浏览(100)

Basically I need to run this on a table with 40 million rows, updating every row at once will crash, so I want to batch the query so that if it crash, it can re-run the query and it would skip the finished batch and just continue with the ones left over.

UPDATE [table]
SET [New_ID] = [Old_ID]

What is the fastest way to do this? Here is how the table is created:

CREATE TABLE [table](
    [INSTANCE_ID] [int] NOT NULL,
    [table_ID] [bigint] IDENTITY(1,1) NOT NULL,
    [old_ID] [bigint] NOT NULL,
    [new_ID] [bigint] NOT NULL,
    [owner_ID] [int] NOT NULL,
    [created_time] [datetime] NULL
) ON [PRIMARY]

There are also indexes on created_time, owner_ID.

EDIT: My update statement is EXACTLY as shown, I literally just need to copy every entry in old_id into new_id for 40 million rows.

inkz8wg9

inkz8wg91#

Declare @Rowcount INT = 1;

WHILE (@Rowcount > 0)   
BEGIN
        UPDATE TOP (100000) [table]   --<-- define Batch Size in TOP Clause
           SET [New_ID] = [Old_ID]
        WHERE [New_ID] <> [Old_ID]

        SET @Rowcount = @@ROWCOUNT;

       CHECKPOINT;   --<-- to commit the changes with each batch
END
yftpprvb

yftpprvb2#

M.Ali's suggestion will work, but you will end up with degrading performance as you work through the 40M records. I would suggest a better filter to find the records to update in each pass. This would assume you have a primary key (or other index) on your identity column:

DECLARE @Rowcount INT = 1
    ,   @BatchSize INT = 100000
    ,   @StartingRecord BIGINT = 1;

WHILE (@Rowcount > 0)   
BEGIN
    UPDATE [table]
        SET [New_ID] = [Old_ID]
    WHERE [table_ID] BETWEEN @StartingRecord AND @StartingRecord + @BatchSize - 1;

    SET @Rowcount = @@ROWCOUNT;

    CHECKPOINT;

    SELECT @StartingRecord += @BatchSize
END

This approach will allow each iteration to be as fast as the first. And if you don't have a valid index you need to fix that first.

slmsl1lt

slmsl1lt3#

Select 1;  -- this will set a rowcount
WHILE (@@Rowcount > 0)   
BEGIN
  UPDATE TOP (1000000) [table]   
    SET [New_ID] =  [Old_ID]
  WHERE [New_ID] <> [Old_ID] 
    or ([New_ID] is null and [Old_ID] is not null)
END

100000 may work better for the top.

Since NewID and OldID is not null then the is null check is not necessary.

hrysbysz

hrysbysz4#

Fastest way is to :

  1. Create a temp table and insert all the values from old to temp table using the create(select having condition) statement.

  2. Copy the constraints and refresh the indexes.

  3. Drop the old table.

  4. Rename temp table to original name.

Complete discussion is available on this link

crcmnpdw

crcmnpdw5#

I have implemented a logic that allows me to update millions of records in a short amount of time for a single column. I would like to share this logic and ask for feedback on its effectiveness. Here's an explanation of the logic:

    • FIRST CREATE CLUSTRED INDEX TRUE IDENTITY COLUMN FOR TARGETED TABLE IF NOT EXIST
    • I create a test table (Table1) and transfer all distinct data from the targeted table to Table1.
    • create another test table (Table2) and transfer the true identity column data from the targeted table and the IDs from Table1
    • NOW UPDATE DATA IN TEST TABLE1
    • Finally, I update the targeted table data based on the identity column from Table2 and the updated values from Table1. Here are the example queries corresponding to each step:

Step 1:

IF NOT EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'ClusteredIndexName' AND object_id = OBJECT_ID('TargetedTable'))
BEGIN
    CREATE CLUSTERED INDEX ClusteredIndexName ON TargetedTable (TrueIdentityColumn);
END

step 2:

CREATE TABLE Table1 (column1 datatype, column2 datatype, ...);

INSERT INTO Table1 (column1, column2, ...)
SELECT DISTINCT column1, column2, ...
FROM TargetedTable;

Step 3:

CREATE TABLE Table2 (TrueIdentityColumn datatype, ID datatype);

INSERT INTO Table2 (TrueIdentityColumn, ID)
SELECT TargetedTable.TrueIdentityColumn, Table1.ID
FROM TargetedTable
JOIN Table1 ON TargetedTable.CommonColumn = Table1.CommonColumn;

Step 4:

UPDATE Table1
SET column1 = 'UPDATEDDATA'

Step 5:

UPDATE TargetedTable
SET column1 = Table1.column1 -- Update the desired column(s) with the updated values from Table1
FROM TargetedTable
JOIN Table2 ON TargetedTable.TrueIdentityColumn = Table2.TrueIdentityColumn
JOIN Table1 ON Table2.ID = Table1.ID;

I would appreciate any feedback or suggestions for improvement on this logic. Is there a better approach or any potential issues with this implementation? Thank you in advance.

相关问题