debugsqlMinor
How to break out of error loop in transactional replication?
Viewed 0 times
errortransactionalloopreplicationhowbreakout
Problem
I was running some simple tests on a publisher and subscriber on the same instance:
correctly pushed to the subscriber.
didn't realize it at the time)
As I was watching the size of the subscriber table, I noticed it would drop down in row count almost to zero and then go back to 50,000. Deletes would start running again, the row count would drop, and then it back to 50,000. This kept happening over and over.
I ran a standard trace and saw all the deletes running normally. After it was almost complete, this statement appeared:
Then I modified the trace to incude error messages and saw this:
So it looks like when the distribution agent is executing all the deletes one-by-one, the process is still included in a transaction (the original DELETE was one line). If there is an error, it performs a ROLLBACK and then starts over.
My questions:
can complete?
really a question. But this seems like a simple error could have
some pretty serious complications unless replication is monitored
very carefully.
- I inserted 50,000 rows into a published article and the data was
correctly pushed to the subscriber.
- I inadvertently deleted the 49,985th record on the subscriber (but
didn't realize it at the time)
- I deleted 50,000 rows from the published article
As I was watching the size of the subscriber table, I noticed it would drop down in row count almost to zero and then go back to 50,000. Deletes would start running again, the row count would drop, and then it back to 50,000. This kept happening over and over.
I ran a standard trace and saw all the deletes running normally. After it was almost complete, this statement appeared:
IF @@TRANCOUNT > 0 ROLLBACKThen I modified the trace to incude error messages and saw this:
The row was not found at the Subscriber when applying the replicated command.So it looks like when the distribution agent is executing all the deletes one-by-one, the process is still included in a transaction (the original DELETE was one line). If there is an error, it performs a ROLLBACK and then starts over.
My questions:
- Is there a standard way to break out of the loop?
- Would it be acceptable to recreate the missing row so the process
can complete?
- What do you do if this happens on production? I guess that's not
really a question. But this seems like a simple error could have
some pretty serious complications unless replication is monitored
very carefully.
Solution
The error is caused by the code in the custom delete stored procedure for each article:
If no row is deleted, a special procedure is called and the transaction is rolled back.
In order to avoid this, I've created a post-Snapshot script that alters the all delete / update replication stored procedures and logs the error to a table instead.
create procedure [dbo].[sp_MSdel_dboChild]
@pkc1 int
as
begin
delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
if @@microsoftversion>0x07320000
exec sp_MSreplraiserror 20598
endIf no row is deleted, a special procedure is called and the transaction is rolled back.
In order to avoid this, I've created a post-Snapshot script that alters the all delete / update replication stored procedures and logs the error to a table instead.
CREATE procedure [dbo].[sp_MSdel_dboChild]
@pkc1 int
as
begin
delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
if @@microsoftversion>0x07320000
insert into repl_errors (database_name, table_name, pkey_col, pkey_val)
select dbo.fn_GetDB(APP_NAME()), 'Child', 'ChildID', @pkc1
endCode Snippets
create procedure [dbo].[sp_MSdel_dboChild]
@pkc1 int
as
begin
delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
if @@microsoftversion>0x07320000
exec sp_MSreplraiserror 20598
endCREATE procedure [dbo].[sp_MSdel_dboChild]
@pkc1 int
as
begin
delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
if @@microsoftversion>0x07320000
insert into repl_errors (database_name, table_name, pkey_col, pkey_val)
select dbo.fn_GetDB(APP_NAME()), 'Child', 'ChildID', @pkc1
endContext
StackExchange Database Administrators Q#21796, answer score: 2
Revisions (0)
No revisions yet.