HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Good reason to denormalize?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
denormalizereasongood

Problem

I have a database with about 40-50 tables. All but 5 are part of a giant hierarchy of 1:M relationships that all point back to one solitary parent (call it "Project"). Each table is joined to its direct parent using a foreign key to the parent's primary key, which is an identity field. Some branches go as many as 6 or 7 levels deep. Many of the tables have millions of rows. Nearly all queries require returning all records of a particular table that pertain to a single project. Queries normally only return fields from the single entity in question. Thus a typical query (practically all queries other than random ad hoc) looks like this:

SELECT a.* FROM a
JOIN b ON a.id=b.id
JOIN c ON b.id=c.id
JOIN d ON c.id=d.id
JOIN Project p ON d.ProjectID=p.ProjectID
WHERE p.ProjectID = 12345


As you can imagine, the farther you go down the hierarchy the worse the queries perform. I considered denormalizing on the one field only by redundantly persisting the ProjectID field throughout all levels of the hierarchy, and creating the clustered index on ProjectID in each table. This would allow me to satisfy all data requests with a single index seek. Doing so dramatically improves query performance. However, I have no idea if doing this negatively impacts the fidelity of the schema design. I'm hoping someone can provide me some insight in case I am missing something.

Solution

We do something similar with client id. Yes it can improve performance if you don't need to go through all those intervening tables in every query. However, and it's an important however, this is best done only if you are using a surrogate key that never changes. Otherwise a change of the project id could require a cascade of updates that affect every table and lockup the system. I suspect project_id is as unlikely to change as client Id (client name, now that's another story) and so you might be fine. But please do consider if you will have updates to the field you denormalize, Also it is critical to set up a way to make sure the tables with the denormalized fields cannot get out of synch with the main table (PK/FK relationships are good for this and you might need cascading updates set (although I personally prefer not to use them if I can help it).)

Context

StackExchange Database Administrators Q#18033, answer score: 6

Revisions (0)

No revisions yet.