patternCritical
Duplicate column for faster queries?
Viewed 0 times
columnduplicatefasterforqueries
Problem
The title doesn't make too much sense, but I couldn't think a better title for this problem.
I have the following tables
Projects
Customers
Payments
When a users enters the system, he will have access to a certain project. Now, I want to list all the payments for that project, and it should be pretty easy:
My question is : if it isn't better to add a column id_project to payments table this way the queries will be easier and faster.
I have the following tables
Projects
- id
- name
Customers
- id
- id_project
- name
Payments
- id
- id_customer
- date
- sum
When a users enters the system, he will have access to a certain project. Now, I want to list all the payments for that project, and it should be pretty easy:
SELECT FROM payments where id_customer in (SELECT id from customers where id_project = 5)My question is : if it isn't better to add a column id_project to payments table this way the queries will be easier and faster.
Solution
It seems you are asking if denormalization makes sense.
Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software. A relational normalized database imposes a heavy access load over physical storage of data even if it is well tuned for high performance.
The answer is always "it depends", so here's my rule of thumb:
If ...
then stay normalized. Yes, denormalization is faster, but it also means you have redundant data in the system -- data that has to be maintained and kept in sync. There is no longer "one source" for that data, but multiple sources that can deviate. This is risky over time, so you shouldn't do it unless you have very good reasons to do it, backed by some benchmarks.
I would only denormalize when ...
Joins are very fast on modern hardware, but they are never free.
Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software. A relational normalized database imposes a heavy access load over physical storage of data even if it is well tuned for high performance.
The answer is always "it depends", so here's my rule of thumb:
If ...
- the amount of data is not large
- you aren't doing a ton of joins already
- and/or database performance is not currently a bottleneck
then stay normalized. Yes, denormalization is faster, but it also means you have redundant data in the system -- data that has to be maintained and kept in sync. There is no longer "one source" for that data, but multiple sources that can deviate. This is risky over time, so you shouldn't do it unless you have very good reasons to do it, backed by some benchmarks.
I would only denormalize when ...
- the amount of data is very large
- joins are expensive and you have to do a lot of them to get even trivial queries returned
- database performance is a bottleneck and/or you want to go as fast as possible
Joins are very fast on modern hardware, but they are never free.
Context
StackExchange Database Administrators Q#461, answer score: 65
Revisions (0)
No revisions yet.