snippetsqlMinor
How can I most closely monitor this table in production to find why rows are disappearing?
Viewed 0 times
thisrowscanfindwhyhowdisappearingproductiontableare
Problem
Scenario:
We write online banking software and for the time being (due to legacy design decisions) are using Quartz 2.2 right now as a queue system to perform a handful of operatoins long term. [Let's sidebar the discussion on if it's the right tool, it's worked well for many years until we updated to 2.2]
Part of that quartz.net has the following tables (related) for schema: https://gist.github.com/jcolebrand/8695603
So the process is we're inserting records in this table, to the tune of say 80k records. (I have three replications of quartz.net serving three different configurations, one has 80k records, one 50k, one 280k, so it varies). We insert them from some other table that is the table of record, so rebuilding this table isn't a terrible loss. The issue is described below.
I have a tool which can bulk schedule these tasks (most of which are one-time and will happen anywhere from two weeks to twenty years in the future, again, this is about the db and not the architecture choices) and I can debug step through, see the rows get inserted into the table with no concern. I can monitor the table and see that they get inserted. I then come back 30 minutes later (well, this part varies. It's not deterministic on when they disappear) and check and the records are now missing.
Out of 80k on the one instance, I'm missing approximately 2700 records. And when I run my tool, I see them all synced up, and after some matter of time, they have evaporated again.
Here's things I've tried:
Things I haven't tried:
We write online banking software and for the time being (due to legacy design decisions) are using Quartz 2.2 right now as a queue system to perform a handful of operatoins long term. [Let's sidebar the discussion on if it's the right tool, it's worked well for many years until we updated to 2.2]
Part of that quartz.net has the following tables (related) for schema: https://gist.github.com/jcolebrand/8695603
So the process is we're inserting records in this table, to the tune of say 80k records. (I have three replications of quartz.net serving three different configurations, one has 80k records, one 50k, one 280k, so it varies). We insert them from some other table that is the table of record, so rebuilding this table isn't a terrible loss. The issue is described below.
I have a tool which can bulk schedule these tasks (most of which are one-time and will happen anywhere from two weeks to twenty years in the future, again, this is about the db and not the architecture choices) and I can debug step through, see the rows get inserted into the table with no concern. I can monitor the table and see that they get inserted. I then come back 30 minutes later (well, this part varies. It's not deterministic on when they disappear) and check and the records are now missing.
Out of 80k on the one instance, I'm missing approximately 2700 records. And when I run my tool, I see them all synced up, and after some matter of time, they have evaporated again.
Here's things I've tried:
- checking the all transactions report in SSMS
- checking the all blocking transactions report in SSMS
- leaving my scheduling application open for long duration (in case, by some freak magic, there was an open, uncommitted transaction)
- restarting the Quartz.net windows service application maintaining the database
- inserting records with the quartz.net windows service disabled
Things I haven't tried:
- restarting SQL Server
- my queries don't use "with(nolock)" [I've heard that
Solution
If you don't expect any rows to be deleted from this table, then the simplest, least impact way to catch anyone running deletes would be a delete trigger. You can use an INSTEAD OF trigger if you want to log and prevent, or an after trigger if you just want to log. First, a logging table:
Now, an INSTEAD OF INSERT trigger, could log the DELETEs but not carry through:
Change
The beauty is that in both cases this only has any impact whatsoever if and when a delete is actually issued against the table.
CREATE TABLE dbo.CatchTheDoNoGooders
(
[JOB_NAME] [nvarchar](200),
[JOB_GROUP] [nvarchar](200),
[SCHED_NAME] [varchar](120),
UserName SYSNAME,
HostName SYSNAME,
Program NVARCHAR(4000),
IP VARCHAR(15),
EventDate DATETIME
);Now, an INSTEAD OF INSERT trigger, could log the DELETEs but not carry through:
CREATE TRIGGER dbo.InsteadDelete_QRTZ_JOB_DETAILS
ON dbo.QRTZ_JOB_DETAILS
INSTEAD OF DELETE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @ip VARCHAR(15), @hn SYSNAME, @prog NVARCHAR(4000);
SELECT TOP (1) @ip = client_net_address
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
SELECT TOP (1) @hn = [host_name], @prog = LEFT([program_name],4000)
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;
INSERT dbo.CatchTheDoNoGooders
SELECT JOB_NAME, JOB_GROUP, SCHED_NAME,
SUSER_SNAME(), @hn, @prog, @ip, CURRENT_TIMESTAMP
FROM deleted;
END
GOChange
INSTEAD OF to FOR and now you have an AFTER trigger that will still allow the delete to happen, but will log them all anyway. This can be better if you do have some legal deletes going on but want to determine who is deleting the wrong things.The beauty is that in both cases this only has any impact whatsoever if and when a delete is actually issued against the table.
Code Snippets
CREATE TABLE dbo.CatchTheDoNoGooders
(
[JOB_NAME] [nvarchar](200),
[JOB_GROUP] [nvarchar](200),
[SCHED_NAME] [varchar](120),
UserName SYSNAME,
HostName SYSNAME,
Program NVARCHAR(4000),
IP VARCHAR(15),
EventDate DATETIME
);CREATE TRIGGER dbo.InsteadDelete_QRTZ_JOB_DETAILS
ON dbo.QRTZ_JOB_DETAILS
INSTEAD OF DELETE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @ip VARCHAR(15), @hn SYSNAME, @prog NVARCHAR(4000);
SELECT TOP (1) @ip = client_net_address
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
SELECT TOP (1) @hn = [host_name], @prog = LEFT([program_name],4000)
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;
INSERT dbo.CatchTheDoNoGooders
SELECT JOB_NAME, JOB_GROUP, SCHED_NAME,
SUSER_SNAME(), @hn, @prog, @ip, CURRENT_TIMESTAMP
FROM deleted;
END
GOContext
StackExchange Database Administrators Q#57817, answer score: 3
Revisions (0)
No revisions yet.