HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

Why does my procedure run slowly until I update statistics?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyupdateuntilprocedurestatisticsdoesslowlyrun

Problem

I am using a SQL 2008 database (not R2 sadly but have also tested this on an R2 database server) and have an ELT process running. Data is extracted from source and loaded to staging tables in the destination DB, and then the transform processes are run. On Tuesday (26 Nov) for no known reason, the procedure which 'shreds' the staging data for providers into the relational tables went from running approx 9 per second to one every 5½ minutes. No parts of the database, ELT process (not a typo!), database server etc were changed.

With around 800 records every night (and up to several thousand) this single step of the process has gone from taking just over a minute to potentially taking just over 3 days (this doesn't happen in practise of course because we notice and stop the long running job). This happens even if I manually run the code which is within the procedure rather than letting it run as part of SSIS, and the same happens if I run through the code for just one record rather than letting the loop go through.

By sheer chance I happened to discover that if I ran the following code against the provider staging table and the provider destination relational table, suddenly my ELT process runs as normal again and zips through everything:

update statistics tblProviderStaging with fullscan
update statistics tblProvider with fullscan


My question (sorry for the journey to get here) is why would I have to run this manually every morning? I can't even run it before the ELT as it makes no difference - only affects the process if I run it while the process is slowly churning through. Presently I am running this at 7am every morning (SSIS kicks off at 6:30). Auto Update Statistics and Auto Create Statistics are both set to 'True'.

I have never, in my limited experience, come across a scenario where I have to manually run this every morning? And the fact that I can't just run it either but have to run it while the ELT is still running? We have tried running the SSI

Solution

SQL Server uses a cost based formula for determining how to satisfy queries.

"It'll cost 10 random reads in a billion row table vs 1000 sequential reads. I think I can satisfy the needs faster by doing the 1000 reads since they'll all in order" and decision made off it goes. Unfortunately, when the engine opens the data file to find those 1000 sequential rows, the data has been updated and instead of living in there, there's simply a pointer saying "sorry we had to move, try this address in the next state" And you now have 1000 forwarding rows to contend with. The database engine can't go back and say let's try re-evaluating this costing metric because those 10 reads sound a lot better.

That's probably what is happening (simplified) with your data. The statistics on your data distribution are all out of whack after your ELT processing which is typical-you've just moved a pantsload of data so what was true before may not be as true now. Depending on usage, I will have an explicit call to a reindexing job (thank you Ola) after ETL is complete so that now that all my data is up to date, I need to get all the metadata the server uses up to date as well.

Context

StackExchange Database Administrators Q#54325, answer score: 3

Revisions (0)

No revisions yet.