snippetsqlModerate
sql server: updating fields on huge table in small chunks: how to get progress/status?
Viewed 0 times
sqlfieldsupdatingchunksstatushugeprogressgetsmallhow
Problem
We have a very large (100million row) table, and we need to update a couple of fields on it.
For log shipping, etc, we also, obviously, want to keep it to bite-size transactions.
The code is:
For log shipping, etc, we also, obviously, want to keep it to bite-size transactions.
- Will the below do the trick?
- And how can we get it to print some output, so we can see progress? (we tried adding a PRINT statement in there, but nothing was output during while loop)
The code is:
DECLARE @CHUNK_SIZE int
SET @CHUNK_SIZE = 10000
UPDATE TOP(@CHUNK_SIZE) [huge-table] set deleted = 0, deletedDate = '2000-01-01'
where deleted is null or deletedDate is null
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE TOP(@CHUNK_SIZE) [huge-table] set deleted = 0, deletedDate = '2000-01-01'
where deleted is null or deletedDate is null
ENDSolution
I was not aware of this question when I answered the related question ( Are explicit transactions needed in this while loop? ), but for the sake of completeness, I will address this issue here as it was not part of my suggestion in that linked answer.
Since I am suggesting to schedule this via a SQL Agent job (it is 100 million rows, after all), I don't think that any form of sending status messages to the client (i.e. SSMS) will be ideal (though if that is ever a need for other projects, then I agree with Vladimir that using
In this particular case, I would create a status table that can be updated per each loop with the number of rows updated thus far. And it doesn't hurt to throw in the current time to have a heart-beat on the process.
Given that you want to be able to cancel and restart the process, I am weary of wrapping the the UPDATE of the main table with the UPDATE of the status table in an explicit transaction. However, if you feel that the status table is ever out of sync due to cancelling, it is easy to refresh with the current value by simply updating it manually with the
How can you stop the process without, um, well, stopping it? By asking it to stop :-). Yep. By sending the process a "signal" (similar to
How can you communicate with the running process in another session? By using the same mechanism that we created for it to communicate its current status back to you: the status table. We just need to add a column that the process will check at the beginning of each loop so that it knows whether to proceed or abort. And since the intent is to schedule this as a SQL Agent job (run every 10 or 20 minutes), we should also check at the very beginning since there is no point in filling a temp table with 1 million rows if the process is just going to exit a moment later and not use any of that data.
You can then check the status at any time using the foll
Since I am suggesting to schedule this via a SQL Agent job (it is 100 million rows, after all), I don't think that any form of sending status messages to the client (i.e. SSMS) will be ideal (though if that is ever a need for other projects, then I agree with Vladimir that using
RAISERROR('', 10, 1) WITH NOWAIT; is the way to go).In this particular case, I would create a status table that can be updated per each loop with the number of rows updated thus far. And it doesn't hurt to throw in the current time to have a heart-beat on the process.
Given that you want to be able to cancel and restart the process, I am weary of wrapping the the UPDATE of the main table with the UPDATE of the status table in an explicit transaction. However, if you feel that the status table is ever out of sync due to cancelling, it is easy to refresh with the current value by simply updating it manually with the
COUNT(*) FROM [huge-table] WHERE deleted IS NOT NULL AND deletedDate IS NOT NULL. and there are two tables to UPDATE (i.e. the main table and the status table), we should use an explicit transaction to keep those two tables in sync, yet we do not want to risk having an orphaned transaction if you cancel the process at a point after it has started the transaction but has not committed it. This should be safe to do as long as you don't stop the SQL Agent job.How can you stop the process without, um, well, stopping it? By asking it to stop :-). Yep. By sending the process a "signal" (similar to
kill -3 in Unix), you can request that it stop at the next convenient moment (i.e. when there is no active transaction!) and have it clean itself up all nice and tidy-like.How can you communicate with the running process in another session? By using the same mechanism that we created for it to communicate its current status back to you: the status table. We just need to add a column that the process will check at the beginning of each loop so that it knows whether to proceed or abort. And since the intent is to schedule this as a SQL Agent job (run every 10 or 20 minutes), we should also check at the very beginning since there is no point in filling a temp table with 1 million rows if the process is just going to exit a moment later and not use any of that data.
DECLARE @BatchRows INT = 1000000,
@UpdateRows INT = 4995;
IF (OBJECT_ID(N'dbo.HugeTable_TempStatus') IS NULL)
BEGIN
CREATE TABLE dbo.HugeTable_TempStatus
(
RowsUpdated INT NOT NULL, -- updated by the process
LastUpdatedOn DATETIME NOT NULL, -- updated by the process
PauseProcess BIT NOT NULL -- read by the process
);
INSERT INTO dbo.HugeTable_TempStatus (RowsUpdated, LastUpdatedOn, PauseProcess)
VALUES (0, GETDATE(), 0);
END;
-- First check to see if we should run. If no, don't waste time filling temp table
IF (EXISTS(SELECT * FROM dbo.HugeTable_TempStatus WHERE PauseProcess = 1))
BEGIN
PRINT 'Process is paused. No need to start.';
RETURN;
END;
CREATE TABLE #FullSet (KeyField1 DataType1, KeyField2 DataType2);
CREATE TABLE #CurrentSet (KeyField1 DataType1, KeyField2 DataType2);
INSERT INTO #FullSet (KeyField1, KeyField2)
SELECT TOP (@BatchRows) ht.KeyField1, ht.KeyField2
FROM dbo.HugeTable ht
WHERE ht.deleted IS NULL
OR ht.deletedDate IS NULL
WHILE (1 = 1)
BEGIN
-- Check if process is paused. If yes, just exit cleanly.
IF (EXISTS(SELECT * FROM dbo.HugeTable_TempStatus WHERE PauseProcess = 1))
BEGIN
PRINT 'Process is paused. Exiting.';
BREAK;
END;
-- grab a set of rows to update
DELETE TOP (@UpdateRows)
FROM #FullSet
OUTPUT Deleted.KeyField1, Deleted.KeyField2
INTO #CurrentSet (KeyField1, KeyField2);
IF (@@ROWCOUNT = 0)
BEGIN
RAISERROR(N'All rows have been updated!!', 16, 1);
BREAK;
END;
BEGIN TRY
BEGIN TRAN;
-- do the update of the main table
UPDATE ht
SET ht.deleted = 0,
ht.deletedDate = '2000-01-01'
FROM dbo.HugeTable ht
INNER JOIN #CurrentSet cs
ON cs.KeyField1 = ht.KeyField1
AND cs.KeyField2 = ht.KeyField2;
-- update the current status
UPDATE ts
SET ts.RowsUpdated += @@ROWCOUNT,
ts.LastUpdatedOn = GETDATE()
FROM dbo.HugeTable_TempStatus ts;
COMMIT TRAN;
END TRY
BEGIN CATCH
IF (@@TRANCOUNT > 0)
BEGIN
ROLLBACK TRAN;
END;
THROW; -- raise the error and terminate the process
END CATCH;
-- clear out rows to update for next iteration
TRUNCATE TABLE #CurrentSet;
WAITFOR DELAY '00:00:01'; -- 1 second delay for some breathing room
END;
-- clean up temp tables when testing
-- DROP TABLE #FullSet;
-- DROP TABLE #CurrentSet;
You can then check the status at any time using the foll
Context
StackExchange Database Administrators Q#115175, answer score: 12
Revisions (0)
No revisions yet.