patternsqlMinor
SQL Master Data Services database has huge leaf member staging table - can I purge old records?
Viewed 0 times
canoldsqlrecordsdatabasehasservicesmasterhugeleaf
Problem
One of our SQL instances has a large MDS database, and virtually 100% of the size is due to a 300+ million row table named
From what I can tell, this is a Leaf Member Staging Table for a "Project" entity, set up and automated (by a prior consultant) as a part of an ETL process that builds BI cubes/reports.
Looks like it has been processing about 10k rows per half hour for the past several years and has never been purged. When I check the
Can I just
This system is being used in an automated fashion, its not like people are logging into the MDS console and individually monitoring and rolling back changes. I found some references to stored procedures that purge these tables (
stg.Stg_Project_Leaf. It recently started blowing up our weekly index rebuilds, so I need to figure this out.From what I can tell, this is a Leaf Member Staging Table for a "Project" entity, set up and automated (by a prior consultant) as a part of an ETL process that builds BI cubes/reports.
Looks like it has been processing about 10k rows per half hour for the past several years and has never been purged. When I check the
ImportStatus_ID, though, I see 0 rows waiting to be processed. (Most are 1 - succeeded, less than 1% are 2 - failed.)Can I just
TRUNCATE this table? This system is being used in an automated fashion, its not like people are logging into the MDS console and individually monitoring and rolling back changes. I found some references to stored procedures that purge these tables (
mdm.udpStagingClear or maybe mdm.udpDeletedMembersPurge), but I'm not sure which of those is the one I'd need, nor do I know if they would blow up the log if they tried to do 300m rows at once.Solution
In short: The official answer is to use the new proc
I found reference to a SQL 2012 patch that adds 3 stored procedures to MDS, including the one I need:
See also this article for scheduled cleanup options utilizing the same procs.
For my purpose, though, I was still concerned about blowing up the tran log with the 300+ million rows that had to be deleted, so I looked in detail at the inner workings of
There is no attempt to limit the amount of rows deleted at once, and in my case, the 300+ million rows are all for a single
The good news, though, is that if I intend to remove all
as long as I am careful to not attempt it right in the middle of an active batch.
Finally, while I couldn't find any official MS sources that explicitly recommended truncating these tables, I did find several articles where DBAs do a routine
So my plan is to do an initial
mdm.udpEntityStagingBatchTableCleanup to clean up entity staging tables, but in practice, DELETE or TRUNCATE should work just fine (or might be necessary), as long as you understand your batch detail.I found reference to a SQL 2012 patch that adds 3 stored procedures to MDS, including the one I need:
--Cleanup entity-based staging table
EXEC mdm.udpEntityStagingBatchTableCleanup @ModelID, @CleanupOlderThanDate;See also this article for scheduled cleanup options utilizing the same procs.
For my purpose, though, I was still concerned about blowing up the tran log with the 300+ million rows that had to be deleted, so I looked in detail at the inner workings of
mdm.udpEntityStagingBatchTableCleanup. This stored proc does the following:- Populates a temp table with all completed batches that match your deletion criteria (for the given
@Model_ID, older than@CleanupOlderThanDate).
- Builds dynamic
DELETEstatements for the corresponding Leaf, Consolidated, and Relationship tables for eachDISTINCT Entity_IDin the temp table.
- It executes each of those
DELETEstatements for each entity table, filtered via a join to the temp table.
- It then deletes from
mdm.tblStgBatchitself, again filtered by join to the temp table.
There is no attempt to limit the amount of rows deleted at once, and in my case, the 300+ million rows are all for a single
Entity_ID, which would likely blow up the transaction log.The good news, though, is that if I intend to remove all
Entity_IDs and all dates anyway (which I do), then that's functionally equivalent to a TRUNCATE,as long as I am careful to not attempt it right in the middle of an active batch.
Finally, while I couldn't find any official MS sources that explicitly recommended truncating these tables, I did find several articles where DBAs do a routine
TRUNCATE at either the beginning or the end of their import process:- "Step 1: Truncates each of the 3 staging tables."
- "If you want to empty the staging table, you can either delete all records, or click Clear Batches in Integration Management."
- TRUNCATE TABLE stg.TableName_Leaf
So my plan is to do an initial
TRUNCATE, and then schedule mdm.udpEntityStagingBatchTableCleanup to run on a periodic basis.Code Snippets
--Cleanup entity-based staging table
EXEC mdm.udpEntityStagingBatchTableCleanup @ModelID, @CleanupOlderThanDate;Context
StackExchange Database Administrators Q#173638, answer score: 3
Revisions (0)
No revisions yet.