patternModerate
Normalization: Is it considered compliant to split static, numeric values like a year into their own table?
Viewed 0 times
valuesyearnumericcompliantintoconsideredlikesplitowntheir
Problem
I am having an interesting discussion with another database designer about normalization. In this example, we have a GameTitles table and each record must contain the year in which the game was released. He says 2NF mandates that everything must be normalized, so, to be compliant, the year field should be split off into a ReleaseYears table with its own primary key that is referenced by the GameTitles table. I say it should remain as a field on the GameTitles table itself.
My argument for this is that a year is just a non-primitive numeric value that is static by its very nature (ie, 2011 will always be 2011). Due to this, it serves as its own identifier and needs nothing to reference it since it is what it is. This also introduces additional maintenance since you now have to add a new year to the table just to reference it. If you prepopulate the table with a large range of years then you have extra records that potentially won't have references to them at all. This also increases database size since you now have an extra table, record overhead, and the additional primary key for the year itself. If you keep the year as a field on the GameTitles table, you eliminate all this additional maintenance and overhead.
Thoughts on this?
edit: Meant to post this on StackOverflow. Can someone vote to delete this or flag it for attention?
My argument for this is that a year is just a non-primitive numeric value that is static by its very nature (ie, 2011 will always be 2011). Due to this, it serves as its own identifier and needs nothing to reference it since it is what it is. This also introduces additional maintenance since you now have to add a new year to the table just to reference it. If you prepopulate the table with a large range of years then you have extra records that potentially won't have references to them at all. This also increases database size since you now have an extra table, record overhead, and the additional primary key for the year itself. If you keep the year as a field on the GameTitles table, you eliminate all this additional maintenance and overhead.
Thoughts on this?
edit: Meant to post this on StackOverflow. Can someone vote to delete this or flag it for attention?
Solution
The other database designer is simply wrong, but your reasoning is wrong as well. Assume you start with this table, which has a single candidate key, "game_title".
You evaluate whether it's in 2NF by asking yourself these questions.
Q: First of all, is it in 1NF?
A: Yes, it is.
Q: What are the prime attributes (attributes that are part of a candidate key)?
A: "game_title" is the only prime attribute.
Q: What are the non-prime attributes?
A: "year_first_released" is the only one.
Q: Is "year_first_released" functionally dependent on the whole of "game_title", or on just a part of it?
A: The sole candidate key, "game_title", is a single column; it doesn't even have parts. So "year_first_released" is functionally dependent on the whole of "game_title".
Voilà. You've found 2NF.
You can cut through some of the formal terms by asking first whether it's in 1NF, and then answering this question.
Q: Are there any composite candidate keys?
A: No.
Voilà. You've found 2NF again.
By definition, for a table to violate 2NF, it has to have at least one candidate key that has more than one column.
Here are your reasons for rejecting your friend's opinion.
None of these reasons have anything at all to do with whether a table is in 2NF.
In designing a database, it's not wrong to consider maintenance issues, database size, unreferenced rows, range constraints, and so on. It's just wrong to call those things normalization.
Oh, and that two-column table that I provided above--it's in 5NF.
Table: game_titles
game_title year_first_released
--
The first game 1998
The second game 1999
Best game: the third one 2001
The fourth game 2003
Forty-two, the end of games 2011You evaluate whether it's in 2NF by asking yourself these questions.
Q: First of all, is it in 1NF?
A: Yes, it is.
Q: What are the prime attributes (attributes that are part of a candidate key)?
A: "game_title" is the only prime attribute.
Q: What are the non-prime attributes?
A: "year_first_released" is the only one.
Q: Is "year_first_released" functionally dependent on the whole of "game_title", or on just a part of it?
A: The sole candidate key, "game_title", is a single column; it doesn't even have parts. So "year_first_released" is functionally dependent on the whole of "game_title".
Voilà. You've found 2NF.
You can cut through some of the formal terms by asking first whether it's in 1NF, and then answering this question.
Q: Are there any composite candidate keys?
A: No.
Voilà. You've found 2NF again.
By definition, for a table to violate 2NF, it has to have at least one candidate key that has more than one column.
Here are your reasons for rejecting your friend's opinion.
- A year is just a non-primitive numeric value.
- A year is static by its very nature.
- A year serves as its own identifier.
- A table of years introduces additional maintenance.
- A table of years might have extra rows that aren't referenced.
- A table of years increases database size.
None of these reasons have anything at all to do with whether a table is in 2NF.
In designing a database, it's not wrong to consider maintenance issues, database size, unreferenced rows, range constraints, and so on. It's just wrong to call those things normalization.
Oh, and that two-column table that I provided above--it's in 5NF.
Code Snippets
Table: game_titles
game_title year_first_released
--
The first game 1998
The second game 1999
Best game: the third one 2001
The fourth game 2003
Forty-two, the end of games 2011Context
StackExchange Database Administrators Q#4889, answer score: 14
Revisions (0)
No revisions yet.