SQL Server Solving database concurrency issues related to "insert-if-not-exists" pattern

oxcyiej7  于 12个月前  发布在  其他
关注(0)|答案(5)|浏览(112)

I am using Entity Framework Core 3.1.8 with SQL Server 2016.

Consider following example (simplified for clarity):

Database table is defined as follows:

CREATE TABLE [dbo].[Product]
(
    [Id] INT IDENTITY(1,1) NOT NULL , 
    [ProductName] NVARCHAR(500) NOT NULL,
    CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED (Id ASC) WITH (FILLFACTOR=80),
    CONSTRAINT [UQ_ProductName] UNIQUE NONCLUSTERED (ProductName ASC) WITH (FILLFACTOR=80) 
)

And following C# program:

using System;
using System.Linq;
using System.Reflection;

namespace CcTest
{
    class Program
    {
        static int Main(string[] args)
        {
            Product product =  null;

            string newProductName = "Basketball";

            using (CcTestContext context = new CcTestContext())
            using (var transaction = context.Database.BeginTransaction())
            {
                try
                {
                    product = context.Product.Where(p => p.ProductName == newProductName).SingleOrDefault();
                    if (product is null)
                    {
                        product = new Product { ProductName = newProductName };
                        context.Product.Add(product);
                        context.SaveChanges();
                        transaction.Commit();
                    }
                }
                catch(Exception ex)
                {
                    transaction.Rollback();
                }

            }

            if (product is null)
                return -1;
            else
                return product.Id;
        }
    }
}

Everything works as expected during testing - new product is inserted into the table if it didn't already exist. So I expect [UQ_ProductName] constraint to never be hit because everything is done as a single transaction.

However, in reality this code is a part of the business logic connected to Web API. What happened was that 10 instances of this code using the same new product name got executed almost simultaneously (execution time was the same within one hundredth of a second, we save it in the log table). One of them succeeded (new product name got inserted into the table) but the rest of them failed with following exception:
Violation of UNIQUE KEY constraint 'UQ_ProductName'. Cannot insert duplicate key in object 'dbo.Product'. The duplicate key value is (Basketball). The statement has been terminated.

Why did this happen? Isn't this exactly what my use of transaction was supposed to prevent? That is I think checking whether row with such value already exists and inserting if it doesn't should have been an atomic operation. After the first API call was executed and row was inserted the rest of API calls should have detected value already exists and not tried to insert a duplicate.

Can someone explain if there is an error in my implementation? Clearly it is not working the way I expect.

knpiaxh1

knpiaxh11#

TLDR: using transaction (at any isolation level) alone will not solve the issue.

Root cause of the issue is perfectly explained here: https://stackoverflow.com/a/6173482/412352
When using serializable transaction SQL Server issues shared locks on read records / tables. Shared locks doesn't allow other transactions modifying locked data (transactions will block) but it allows other transactions reading data before the transaction which issued locks start modifying data. That is the reason why the example doesn't work - concurrent reads are allowed with shared locks until the first transaction starts modifying data.

Below is the block of code that will reproduce issue all the time and fix is also there (commented out):

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

namespace CcTest
{
    class Program
    {
        static void Main(string[] args)
        {
            object Lock = new object();

            // Parallel loop below will reproduce the issue all the time
            // (if lock (Lock) is commented out)
            // when in function AddProduct() you assign
            // newProductName value that currently doesn't exist in the database
            Parallel.For(0, 10, index =>
            {
                //lock (Lock) // uncomment this to resolve the issue
                {
                    AddProduct(index);
                }
            });

            // Sequential loop below will aways work as expected
            //for (int index = 0; index < 10; index++)
            //    AddProduct(index);

            Console.ReadKey();
        }

        static void AddProduct( int index)
        {
            Product product = null;

            string newProductName = "Basketball"; // specify something that doesn't exist in database table
            using (CcTestContext context = new CcTestContext())
            using (var transaction = context.Database.BeginTransaction())
            {
                try
                {
                    product = context.Product.FirstOrDefault(p => p.ProductName == newProductName);

                    if (product is null)
                    {
                        product = new Product { ProductName = newProductName };
                        context.Product.Add(product);
                        context.SaveChanges();
                        transaction.Commit();
                        Console.WriteLine($"API call #{index}, Time {DateTime.Now:ss:fffffff}: Product inserted. Id={product.Id}\n");
                    }
                    else
                        Console.WriteLine($"API call #{index}, Time {DateTime.Now:ss:fffffff}: Product already exists. Id={product.Id}\n");
                }
                catch(DbUpdateException dbuex)
                {
                    transaction.Rollback();

                    if (dbuex.InnerException != null)
                        Console.WriteLine($"API call #{index}, Time {DateTime.Now:ss:fffffff}: Exception DbUpdateException caught, Inner Exception Message: {dbuex.InnerException.Message}\n");
                    else
                        Console.WriteLine($"API call #{index}, Time {DateTime.Now:ss:fffffff}: Exception DbUpdateException caught, Exception Message: {dbuex.Message}\n");
                }
                catch (Exception ex)
                {
                    transaction.Rollback();
                    Console.WriteLine($"API call #{index}, Time {DateTime.Now:ss:fffffff}: Exception caught: {ex.Message}\n");
                }
            }
        }
    }
}

As you can see one of the ways to fix the issue is to place the code in the critical section.

Other approach is NOT to place the code in the critical section but catch an exception and check whether it is a DbUpdateException . Then you can check whether inner error message contains something related to constraint violation and if so - try to re-read from the database.

Yet another approach is to use raw SQL and specify SELECT locking hints:

https://weblogs.sqlteam.com/dang/2007/10/28/conditional-insertupdate-race-condition/

PLEASE NOTE: any of the approaches may have negative implications (like a performance decrease).

Other useful pages to take a look at:

https://learn.microsoft.com/en-us/aspnet/mvc/overview/getting-started/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application#modify-the-department-controller

https://weblogs.sqlteam.com/dang/2009/01/31/upsert-race-condition-with-merge/

r8xiu3jd

r8xiu3jd2#

I suspect that behavior is related to transaction IsolationLevel ( ReadCommitted by default for SQL Server in EF Core provider).

I'd rather wait for someone with more expertise to give you a proper explanation, but you can read about Isolation levels and read phenomena here.

8fsztsew

8fsztsew3#

Imagine what would happen if two threads run this code, query the table, then are paused, then resume and try to insert a value. Your current code structure cannot prevent these two threads from attempting an insert at the same time. A Database transaction won't help you here.

You could write raw sql like this to reduce the risk of a duplicate;

insert into Product (ProductName)
select @newName
where not exists(
   select 1 from Product where ProductName = @newName
)

Though the database may encounter the exact same threading issue.

Or you could just wrap your code in a try catch and do it again. (see also Only inserting a row if it's not already there )

Or, if you are planning to update the record anyway;

update Product set ...
where ProductName = @newName

if @@rowcount = 0
begin
    insert into Product (ProductName) values (@newName)
end
nfg76nw0

nfg76nw04#

The Isolation level, ReadCommited by default (MSSSQL), does not prevent a non repeatable read. This means your product name query does not create a lock in the Product table.

So you can do the same read further in the transaction and get a different value (because another thread modified or inserted it).

What you want to use is either Snapshot isolation (better for application performance, other threads don't have to wait for your transaction to complete before they can read the data) or Serializable isolation.

You should be able to specify the isolation level via BeginTransaction(IsolationLevel. ) if you reference the Microsoft.EntityFrameworkCore.Relational package.

p1tboqfb

p1tboqfb5#

using the Serializable isolation level will prevent duplicate inserts (even without the Unique index), though it will throw DbUpdateException and rollback caused by deadlock, which is expected behaviour.

相关问题