snippetsqlMinor
How to structure Master/Detail tables for performance without denormalizing?
Viewed 0 times
tablesdetailwithoutdenormalizingmasterstructureforhowperformance
Problem
Some Background Info
We have a set of tables that holds all the transactions for our system,
The Issue
Querying has become cumbersome. It takes a very long time to access the sales of a given store or a given item for any period.
What I've Tried
I have tried to bring
Given that adding more information from the
I've had the idea to partition
We have a set of tables that holds all the transactions for our system,
TransactionHeaders and TransactionDetails. There are roughly 80k Transactions a day which translates to 80k rows in TransactionHeaders and 900k rows in TransactionDetails being burned daily. At current, the TransactionHeaders has about 10 million rows and TransactionDetails has about 110 million rows. We use the data for general reporting. dbo.TransactionHeaders (
TransactionHeaderId int identity,
TransactionHeaderTypeId int not null,
StoreId int not null,
TransactionDate date not null,
TransactionTime time not null,
...)
dbo.TransactionDetails (
TransactionDetailId int identity,
TransactionHeaderId int not null,
ItemUPC NCHAR(14) not null,
Price NUMERIC(16,2) not null,
ReplicationDate datetime not null
...)The Issue
Querying has become cumbersome. It takes a very long time to access the sales of a given store or a given item for any period.
What I've Tried
I have tried to bring
TransactionDate down to the TransactionDetails table in order to partition it on the date with one partition per day. This worked great for finding the sales of an item. The problem is that many of the reports require the StoreID in addition to being over a specific date range.Given that adding more information from the
TransactionHeader table to the TransactionDetail table breaks the pattern, I'm hesitant to denormalize the tables into one table due to storage concerns. I've had the idea to partition
TransactionHeader on TransactionDate and partitioSolution
You've got a few different questions in here:
Q: It takes a very long time to access the sales of a given store or a given item for any period.
A: To troubleshoot that, we would need to see the execution plans of the queries involved, plus know a little about the query runtime and the hardware involved. 10mm rows in a header table and 110mm rows in a detail table isn't much at all for SQL Server, so this should be a solvable problem.
Q: (Partitioning) worked great for finding the sales of an item. The problem is that many of the reports require the StoreID in addition to being over a specific date range.
A: Correct, partitioning rarely makes SELECT queries faster. It's more about improving performance of bulk loads, specifically partition switching. I wouldn't think of partitioning as a solution to this problem, and indeed, it will actually make most queries worse.
Q: Is there a preferred, correct, standard, etc. method for dealing with tables in the Master/Detail pattern in order to increase performance?
A: Absolutely - archive older data. Figure out what you're going to let users query online at high speed, and then beyond that, move the data into a separate set of archive tables. You can use a partitioned view over the old and new tables in order to give them a single seamless view into the data for easier reporting too.
There's a lot of advantages to this approach. For example, when you want to add additional fields to the current table, you can do that quickly without having to deal with a large amount of archive data. If you want to add lots of indexes to the old archive data, you can - because it's not getting tons of inserts/updates/deletes anymore. If you split the old and new data into different databases, you can even use different backup/recovery strategies with them - even while the view is in place, and users don't know the data is split.
Q: It takes a very long time to access the sales of a given store or a given item for any period.
A: To troubleshoot that, we would need to see the execution plans of the queries involved, plus know a little about the query runtime and the hardware involved. 10mm rows in a header table and 110mm rows in a detail table isn't much at all for SQL Server, so this should be a solvable problem.
Q: (Partitioning) worked great for finding the sales of an item. The problem is that many of the reports require the StoreID in addition to being over a specific date range.
A: Correct, partitioning rarely makes SELECT queries faster. It's more about improving performance of bulk loads, specifically partition switching. I wouldn't think of partitioning as a solution to this problem, and indeed, it will actually make most queries worse.
Q: Is there a preferred, correct, standard, etc. method for dealing with tables in the Master/Detail pattern in order to increase performance?
A: Absolutely - archive older data. Figure out what you're going to let users query online at high speed, and then beyond that, move the data into a separate set of archive tables. You can use a partitioned view over the old and new tables in order to give them a single seamless view into the data for easier reporting too.
There's a lot of advantages to this approach. For example, when you want to add additional fields to the current table, you can do that quickly without having to deal with a large amount of archive data. If you want to add lots of indexes to the old archive data, you can - because it's not getting tons of inserts/updates/deletes anymore. If you split the old and new data into different databases, you can even use different backup/recovery strategies with them - even while the view is in place, and users don't know the data is split.
Context
StackExchange Database Administrators Q#21893, answer score: 4
Revisions (0)
No revisions yet.