patternMinor
Optimisation Newbie: How much of a 'sin' is redundancy?
Viewed 0 times
muchoptimisationredundancynewbiehowsin
Problem
I've been developing for a while, but never really had to deal with DB/scaling issues before. That's suddenly changed and I've found myself in the deep end.
I have 2 SQL tables, as such:
The 'questions' table looks like this:
The problem I have is that I very often have to retrieve the sum of all option_1 votes (or option_2 votes) for a question. This is currently done by selecting count where question_id = [@question.id] and option_id = [@question.option_1.id]"
I'm guessing it'd be a heckuva lot quicker to just add option_1_votes and option_2_votes columns to the "questions" table and increment them each time a vote is added. But ultimately, that's redundant data.
So, as someone who's pretty clueless re: database design princpiples, what's the rule of thumb here? Would a top DBA just add in the columns, or try some other solution?
Cheers...
I have 2 SQL tables, as such:
VOTES
vote_id (PK)
question_id (FK)
user_id (FK)
option_id (FK) The 'questions' table looks like this:
QUESTIONS
question_id (PK)
option_1 (FK --> Options)
option_2 (FK --> Options)The problem I have is that I very often have to retrieve the sum of all option_1 votes (or option_2 votes) for a question. This is currently done by selecting count where question_id = [@question.id] and option_id = [@question.option_1.id]"
I'm guessing it'd be a heckuva lot quicker to just add option_1_votes and option_2_votes columns to the "questions" table and increment them each time a vote is added. But ultimately, that's redundant data.
So, as someone who's pretty clueless re: database design princpiples, what's the rule of thumb here? Would a top DBA just add in the columns, or try some other solution?
Cheers...
Solution
Generally, I don't add redundant columns unless I really need too.
Running a COUNT over a set of data is quite efficient in any RDBMS.
Consider this is a read over indexed (hopefully) cached data to get the count will beat the the 2nd write in to maintain the denormlaised column. This write requires more resources/locking/longer transaction etc which impact reads more
If performance becomes an issue over time, then you can pre-calculate the COUNT more efficiently using an indexed (aka materialised) view
Running a COUNT over a set of data is quite efficient in any RDBMS.
Consider this is a read over indexed (hopefully) cached data to get the count will beat the the 2nd write in to maintain the denormlaised column. This write requires more resources/locking/longer transaction etc which impact reads more
If performance becomes an issue over time, then you can pre-calculate the COUNT more efficiently using an indexed (aka materialised) view
Context
StackExchange Database Administrators Q#16803, answer score: 6
Revisions (0)
No revisions yet.