patternMinor
Landed as BI, but databases are a big WTF, what to do?
Viewed 0 times
databaseswhatwtfarebutlandedbig
Problem
Maybe a duplicate, but I believe my case is a bit different. From one of the answers I got to this post on SQL Server Central that also comes handy too but is not quite the same scenario: 9 Things to Do When You Inherit a Database
Couple of weeks ago started on a new job. I'm supposed to be working as BI analyst and SQL Developer. But right on the firsts assignments noticed in general everything was taking long to execute. Asked the person that is guiding me on the first days, my supervisor you could say, and he told me that they know the databases are a mess. Asked if I could take a look and see what could be done, got a yes as answer.
So I began to digg in, using several really handy scripts, like for example:
What I've found is a big mess as they told me. As an example, blitzindex procedure returns almost 2000 rows with lot of duplicate indexes, NC indexes including all the columns from a table, lot of heap tables, really wide indexes and more. As for backups, none is done since several weeks, asked about it and IT guys just copy the databases each night to a different server. Couple of databases are over 100Gb and several others are close to that size too. Statistics are updated everyday for every table. There are reports that take more than hour to finish, on not so big tables (just couple of millions of rows). And so on.
As a test I spent couple of days tuning couple of big tables and different procedures and queries that use them. Prepared a baseline, using the profiler. Then made few changes and ran again the test queries. As expected, a report that was taking about 8min now is running in around a minute and a couple of other queries also now take less than half the time. All these changes are done on a test s
Couple of weeks ago started on a new job. I'm supposed to be working as BI analyst and SQL Developer. But right on the firsts assignments noticed in general everything was taking long to execute. Asked the person that is guiding me on the first days, my supervisor you could say, and he told me that they know the databases are a mess. Asked if I could take a look and see what could be done, got a yes as answer.
So I began to digg in, using several really handy scripts, like for example:
- sp_whoisactive from Adam Machanic
- sp_blitz from @BrentOzar
- sp_BlitzIndex also from @Brent Ozar
- sp_IndexAnalysis from Jason Strate
- 2n edit Ola Hallengren maintenance script
- ...and some others to get a picture of what is going on...
What I've found is a big mess as they told me. As an example, blitzindex procedure returns almost 2000 rows with lot of duplicate indexes, NC indexes including all the columns from a table, lot of heap tables, really wide indexes and more. As for backups, none is done since several weeks, asked about it and IT guys just copy the databases each night to a different server. Couple of databases are over 100Gb and several others are close to that size too. Statistics are updated everyday for every table. There are reports that take more than hour to finish, on not so big tables (just couple of millions of rows). And so on.
As a test I spent couple of days tuning couple of big tables and different procedures and queries that use them. Prepared a baseline, using the profiler. Then made few changes and ran again the test queries. As expected, a report that was taking about 8min now is running in around a minute and a couple of other queries also now take less than half the time. All these changes are done on a test s
Solution
Asked if I could take a look and see what could be done, got a yes as answer.
That's very encouraging: you're in an excellent position to advance your career and learn something!
Some of the things you mentioned are problems, some not. I'll answer them briefly and follow with some broader advice.
-
procedure returns almost 2000 rows with lot of duplicate indexes. If they're really duplicates -- type, order, etc. -- extras can be dropped without loss and without fear. The DBMS can use only one, and it doesn't care which.
-
NC indexes including all the columns from a table. Normal. These are known as "covering indexes". They're similar to materialized views.
-
lots of heap tables. Bad. Every table needs a key. Sometimes you'll be told, "it has no key". If true, the whole row can be defined as the primary key, except that an extra column is added (not in the key) as a count. "Adding" a row becomes a matter of adding 1 to the count.
-
really wide indexes might or might not help, cf. covering indexes, above.
-
databases are over 100Gb. Normal, databases grow with data.
-
Statistics are updated everyday for every table. Good.
-
There are reports that take more than hour to finish. Every problem is an opportunity.
My advice is to do your homework by reading four books, and to start small with something dramatic, by fixing the longest-running procedure (or the one you boss thinks is the biggest bugaboo).
When you suggest a change, you want to be able to speak confidently and authoritatively in the domain you're addressing yourself to: relational theory and database design, and SQL Server.
For theory, I recommend CJ Date's An Introduction to Database Systems, SQL and Relational Theory, and Database Design and Relational Theory, in that order. That will give you the information you need, and intellectual reference points when you're asked about why something should be done a certain way. For SQL Server, I can recommend Inside Microsoft SQL Server 2008: T-SQL Querying by Itzik Ben-Gan.
Reading the books is unfortunately the easy part; the hard part is applying them to the database. You may find it difficult to get a list of all the queries used with a table. Not only will make it hard to know which indexes are needed, it means splitting one table into two will require arduous testing to make sure nothing breaks (and application changes, too, potentially). The guy who wants to normalize the database has few friends because any change, even one that's correct and necessary, may create work and disruption for others.
I'd look at your bad boy, the long-running report or the one that seems to be wrong a lot of the time. Examine the query plan and think about the tables in terms of BCNF. If you're lucky, the answer might be adding an index, or possibly removing some. If not, you've got a design problem on your hands, something that's bound to involve others.
Good luck. You have your work cut out for you. The good news is you've got a place to learn some things, and permission to work smarter instead of harder. If you keep at it, one of these days you'll reduce query time not by 30% or 50%, but 99% or more. When that happens, you'll be looking at the new DBA in the mirror, so be careful what you wish for. :-)
That's very encouraging: you're in an excellent position to advance your career and learn something!
Some of the things you mentioned are problems, some not. I'll answer them briefly and follow with some broader advice.
-
procedure returns almost 2000 rows with lot of duplicate indexes. If they're really duplicates -- type, order, etc. -- extras can be dropped without loss and without fear. The DBMS can use only one, and it doesn't care which.
Insert performance improves whenever an index goes bye-bye. -
NC indexes including all the columns from a table. Normal. These are known as "covering indexes". They're similar to materialized views.
-
lots of heap tables. Bad. Every table needs a key. Sometimes you'll be told, "it has no key". If true, the whole row can be defined as the primary key, except that an extra column is added (not in the key) as a count. "Adding" a row becomes a matter of adding 1 to the count.
-
really wide indexes might or might not help, cf. covering indexes, above.
-
databases are over 100Gb. Normal, databases grow with data.
-
Statistics are updated everyday for every table. Good.
-
There are reports that take more than hour to finish. Every problem is an opportunity.
My advice is to do your homework by reading four books, and to start small with something dramatic, by fixing the longest-running procedure (or the one you boss thinks is the biggest bugaboo).
When you suggest a change, you want to be able to speak confidently and authoritatively in the domain you're addressing yourself to: relational theory and database design, and SQL Server.
For theory, I recommend CJ Date's An Introduction to Database Systems, SQL and Relational Theory, and Database Design and Relational Theory, in that order. That will give you the information you need, and intellectual reference points when you're asked about why something should be done a certain way. For SQL Server, I can recommend Inside Microsoft SQL Server 2008: T-SQL Querying by Itzik Ben-Gan.
Reading the books is unfortunately the easy part; the hard part is applying them to the database. You may find it difficult to get a list of all the queries used with a table. Not only will make it hard to know which indexes are needed, it means splitting one table into two will require arduous testing to make sure nothing breaks (and application changes, too, potentially). The guy who wants to normalize the database has few friends because any change, even one that's correct and necessary, may create work and disruption for others.
I'd look at your bad boy, the long-running report or the one that seems to be wrong a lot of the time. Examine the query plan and think about the tables in terms of BCNF. If you're lucky, the answer might be adding an index, or possibly removing some. If not, you've got a design problem on your hands, something that's bound to involve others.
Good luck. You have your work cut out for you. The good news is you've got a place to learn some things, and permission to work smarter instead of harder. If you keep at it, one of these days you'll reduce query time not by 30% or 50%, but 99% or more. When that happens, you'll be looking at the new DBA in the mirror, so be careful what you wish for. :-)
Context
StackExchange Database Administrators Q#37544, answer score: 3
Revisions (0)
No revisions yet.