patternsqlMinor
Minimise duration of table lock
Viewed 0 times
minimisetabledurationlock
Problem
I've got a SQL table that needs to be updated daily.
There may or may not be queries against that table while the update is happening. It's around 500,000 rows.
We have an issue where there is locking conflict when the job to update the table is running at the same time as a query against it.
So I have rewritten the process to update the table as follows:
```
ALTER procedure [dbo].[Table_Generate] as
declare @d datetime = getdate(), @c as int
--Check temp tables
IF OBJECT_ID('tempdb..#final') IS NOT NULL
DROP TABLE #final
IF OBJECT_ID('tempdb..#base') IS NOT NULL
DROP TABLE #base
--Get source data from linked server
select
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode
into #base
from [LinkedServer].[Database].dbo.[A_View]
--Generate row_hash
select
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode,
hashbytes('SHA2_256',(
select
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode
from #base sub where sub.ID = main.ID for xml raw)) as row_hash
into #final
from #base main
select @c = count(*) from #final
if @c >0 begin
merge [The_Table_Staging] as target
using #final as source
on source.ID = target.ID
--New rows
when not matched by target then
insert ( RunDate,
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode,
Row_Hash
) values (
@d,
source.ID,
source.Reference,
source.StartDate,
source.EndDate,
source.Description,
source.SomeCode,
source.row_hash)
--Existing changed rows
when matched and source.row_hash != target.row_hash then update set
target.RunDate = @d
,target.Reference = source.Reference
,target.StartDate = source.StartDate
,target.EndDate
There may or may not be queries against that table while the update is happening. It's around 500,000 rows.
We have an issue where there is locking conflict when the job to update the table is running at the same time as a query against it.
So I have rewritten the process to update the table as follows:
```
ALTER procedure [dbo].[Table_Generate] as
declare @d datetime = getdate(), @c as int
--Check temp tables
IF OBJECT_ID('tempdb..#final') IS NOT NULL
DROP TABLE #final
IF OBJECT_ID('tempdb..#base') IS NOT NULL
DROP TABLE #base
--Get source data from linked server
select
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode
into #base
from [LinkedServer].[Database].dbo.[A_View]
--Generate row_hash
select
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode,
hashbytes('SHA2_256',(
select
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode
from #base sub where sub.ID = main.ID for xml raw)) as row_hash
into #final
from #base main
select @c = count(*) from #final
if @c >0 begin
merge [The_Table_Staging] as target
using #final as source
on source.ID = target.ID
--New rows
when not matched by target then
insert ( RunDate,
ID,
Reference,
StartDate,
EndDate,
Description,
SomeCode,
Row_Hash
) values (
@d,
source.ID,
source.Reference,
source.StartDate,
source.EndDate,
source.Description,
source.SomeCode,
source.row_hash)
--Existing changed rows
when matched and source.row_hash != target.row_hash then update set
target.RunDate = @d
,target.Reference = source.Reference
,target.StartDate = source.StartDate
,target.EndDate
Solution
Unfortunately I see a lot of sins in your code at a quick glance:
-
There's a multitude of known bugs with the
-
-
While tempting, using the
-
Linked Servers are also known to be performance bottlenecks at times too (for fixed cardinality estimates and bringing all the data across the network before processing it). If you were previously relying on it in the transaction of your code, it's possible you can just keep that part outside of the transaction to minimize the lock time of your local tables.
-
Also, I do like the use of the
-
That being said, as noted by Erik Darling,
-
There's a multitude of known bugs with the
MERGE statement that it really should be avoided altogether in production code.-
MERGE is also known to be less performant than writing the individual INSERT, UPDATE, and DELETE statements; which may be the reasoning for some of your blocking issues.-
While tempting, using the
sp_rename function to minimize blocking issues can actually result in worse blocking issues, as discussed in Kendra Little's Why You Should Switch in Staging Tables Instead of Renaming Them. (If I recall correctly, this discusses using partition switching as a better solution.)-
Linked Servers are also known to be performance bottlenecks at times too (for fixed cardinality estimates and bringing all the data across the network before processing it). If you were previously relying on it in the transaction of your code, it's possible you can just keep that part outside of the transaction to minimize the lock time of your local tables.
-
Also, I do like the use of the
HASHBYTES() function to generate a row hash, that's something I've used in the past. But you may also find it more performant to call it on the row itself by using CONCAT() across every column with a safe column separator (e.g. double pipe ||) as the parameter to it as opposed to using it around a subquery that leverages XML.-
That being said, as noted by Erik Darling,
HASHBYTES() itself is sometimes susceptible to performance issues at scale. If it is your bottleneck here, you can try materializing the results of the function in a computed column or indexed view (it is deterministic), or use an alternative method, such as CLR, as mentioned in the linked post.Context
StackExchange Database Administrators Q#312246, answer score: 8
Revisions (0)
No revisions yet.