HiveBrain v1.2.0
Get Started
← Back to all entries
patternMajor

Bad Practice to Store Calculated Data in Database?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
practicebadstoredatabasedatacalculated

Problem

Is it bad practice to store calculated data in each row, or is it better to calculate at the application layer with every read from the database.

Storing in the database avoids the need to calculate multiple times, but if an error is made then data needs to be updated rather than just changing the application level calculations.

I think the latter is better, but is there a general rule of thumb?

I need to, for example, calculate total daily nutritional intake of foods. So various portions of energy of foods. I can either calculate the portion energy based on the corresponding food and store the energy of each portion in the portions table OR I can calculate from the join with the corresponding food every time.

You can imagine if you had to calculate yearly averages, monthly averages, daily averages, etc. for a long period of time it could get quite unwieldy.

What about using materialized views that would get recomputed every time old data, say a week or older, gets updated based on a trigger or something along those lines?

Solution

If computers were infinitely fast, then, No, you would never store a value that could be calculated from other columns in the database. Storing calculated values is a violation of database normalization.

Examples would include:

  • The extended cost on an invoice line that carries price and quantity fields.



  • The total cost of an invoice’s lines.



  • The elapsed time between two moments.



In the real world, we do sometimes choose to violate normalization. The motivation is usually for performance reasons.

Never violate normal forms without much thought, and hopefully a consultation with another DBA or database developer. And always document thoroughly your decision and its specific motivation.

is there a general rule of thumb?

Yes: Normalize, and test/profile.

Always start with a normalized design. Load with fake data, test performance. Verify any discovered bottleneck is indeed due to your on-the-fly calculations.

Context

StackExchange Database Administrators Q#202758, answer score: 25

Revisions (0)

No revisions yet.