patternMinor
Do I have to (sometimes) sacrifice normalization for data integrity?
Viewed 0 times
sometimessacrificeintegrityfornormalizationdatahave
Problem
I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity. I would be interested to hear if there is a different way of modelling the data that is both normalized and enforces integrity.
Here is a typical (simplified) example:
```
create table [ProductType]
(
[ProductTypeId] INT identity(1,1) not null,
[ProductTypeName] nvarchar(100) not null,
constraint [PK_ProductType] primary key ([ProductTypeId])
)
create table [Product]
(
[ProductId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[ProductName] nvarchar(100) not null,
constraint [PK_Product] primary key ([ProductId]),
constraint [AK_Product] unique ([ProductId], [ProductTypeId]),
constraint [FK_Product_ProductType] foreign key ([ProductTypeId]) references ProductType
)
create table [ProductTypeProperty]
(
[PropertyId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[PropertyName] nvarchar(100) not null,
constraint [PK_ProductTypeProperty] primary key ([PropertyId]),
constraint [AK_ProductTypeProperty] unique ([PropertyId], [ProductTypeId]),
constraint [FK_ProductTypeProperty_ProductType] foreign key ([ProductTypeId]) references ProductType
)
create table [ProductPropertyValue]
(
[ProductId] INT not null,
[PropertyId] INT not null,
[ProductTypeId] int not null,
[PropertyValue] nvarchar(100) not null,
constraint [PK_ProductPropertyValue] primary key ([ProductId], [PropertyId]),
constraint [FK_ProductPropertyValue_ProductTypeProperty] foreign key ([PropertyId], [ProductTypeId]) references ProductTypeProperty,
constraint [FK_ProductPropertyValue_Product] foreign key ([ProductId], [ProductTypeId]) references Product
)
SET IDENTITY_INSERT [dbo].[ProductType] ON
INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (1
Here is a typical (simplified) example:
```
create table [ProductType]
(
[ProductTypeId] INT identity(1,1) not null,
[ProductTypeName] nvarchar(100) not null,
constraint [PK_ProductType] primary key ([ProductTypeId])
)
create table [Product]
(
[ProductId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[ProductName] nvarchar(100) not null,
constraint [PK_Product] primary key ([ProductId]),
constraint [AK_Product] unique ([ProductId], [ProductTypeId]),
constraint [FK_Product_ProductType] foreign key ([ProductTypeId]) references ProductType
)
create table [ProductTypeProperty]
(
[PropertyId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[PropertyName] nvarchar(100) not null,
constraint [PK_ProductTypeProperty] primary key ([PropertyId]),
constraint [AK_ProductTypeProperty] unique ([PropertyId], [ProductTypeId]),
constraint [FK_ProductTypeProperty_ProductType] foreign key ([ProductTypeId]) references ProductType
)
create table [ProductPropertyValue]
(
[ProductId] INT not null,
[PropertyId] INT not null,
[ProductTypeId] int not null,
[PropertyValue] nvarchar(100) not null,
constraint [PK_ProductPropertyValue] primary key ([ProductId], [PropertyId]),
constraint [FK_ProductPropertyValue_ProductTypeProperty] foreign key ([PropertyId], [ProductTypeId]) references ProductTypeProperty,
constraint [FK_ProductPropertyValue_Product] foreign key ([ProductId], [ProductTypeId]) references Product
)
SET IDENTITY_INSERT [dbo].[ProductType] ON
INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (1
Solution
I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity.
This sounds like a contradiction. It's normalizing that usually enforces integrity. And in my opinion, you have a misunderstanding. You did not de-normalize your design. It is normalized just fine.
I would be interested to hear if there is a different way of modeling the data that is both normalized and enforces integrity.
The specific issue - I call it diamond shape - appears quite often, although many do not realize or see it as a problem and do not enforce the constraint that you have enforced with your design.
And I don't think there is any other way, ignoring some details. A unique key of the "grandparent" table (
Point #4 is the one that causes the challenge. In order to create foreign keys to achieve this ([FK_ProductProperty_ProductTypeProperty] and [FK_ProductPropertyValue_Product]) I have de-normalized and added "unnecessary" unique constraints:
I disagree about the "unnecessary". They are needed, first for the
Your unique constraints are just fine and needed for the integrity reasons you have described:
to ensure that a
The redundant in the design are the primary key constraints on the identity columns of
I used the phrase "de-normalizing my table design" to mean "adding columns to tables that are already in other related tables" (i.e. the value for a given row can be derived from a related row). Isn't one aspect of normalisation the removal of redundant columns?
Yes, it is. But you have to start from entities, attributes and functional dependencies. When you add - before doing that - surrogate keys, you are tempted to use them everywhere as foreign key columns, considering that DBMS provides with the extra feature of Uniqueness. So, you may miss using some other unique column or combination. For example,
Whether that would be more efficient is another matter, as long columns like
And I say "technically" and stress again that the only reason that
This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:
That's why I call it "diamond shape"!
And please note that there is no loop there!
- Just start anywhere you like the follow the arrows. Can you return to where you started?
- No. So there are no loops.
Note: this is a simple example with a 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table etc).
Well, that is a problem. When the rules are complicated
This sounds like a contradiction. It's normalizing that usually enforces integrity. And in my opinion, you have a misunderstanding. You did not de-normalize your design. It is normalized just fine.
I would be interested to hear if there is a different way of modeling the data that is both normalized and enforces integrity.
The specific issue - I call it diamond shape - appears quite often, although many do not realize or see it as a problem and do not enforce the constraint that you have enforced with your design.
And I don't think there is any other way, ignoring some details. A unique key of the "grandparent" table (
PropertyType in this case), has to be part of unique constraints of the "parent" tables (Product, Property here) and to appear in the "child" table (ProductPropertyValue), exactly as you did.Point #4 is the one that causes the challenge. In order to create foreign keys to achieve this ([FK_ProductProperty_ProductTypeProperty] and [FK_ProductPropertyValue_Product]) I have de-normalized and added "unnecessary" unique constraints:
I disagree about the "unnecessary". They are needed, first for the
FOREIGN KEY constraints to be defined and work. Your unique constraints are just fine and needed for the integrity reasons you have described:
to ensure that a
ProductPropertyValue is related to a Product and a Property that both (Product and Property) belong to the same PropertyType. The redundant in the design are the primary key constraints on the identity columns of
Product and Property. They can be removed without any loss of integrity. I know it's more than common that an automatically incrementing (identity in SQL Server) column has also a UNIQUE or PRIMARY KEY constraint. But it doesn't really need it. It may be good for performance reasons to keep that constraint, though. You may also use the column as foreign key in some other table, where the there is no connection with property types (and thus the PropertyTypeId is unnecessary). But from a logical design and normalization point of view, there is no normalization issue. It's just an implementation, physical design detail. I used the phrase "de-normalizing my table design" to mean "adding columns to tables that are already in other related tables" (i.e. the value for a given row can be derived from a related row). Isn't one aspect of normalisation the removal of redundant columns?
Yes, it is. But you have to start from entities, attributes and functional dependencies. When you add - before doing that - surrogate keys, you are tempted to use them everywhere as foreign key columns, considering that DBMS provides with the extra feature of Uniqueness. So, you may miss using some other unique column or combination. For example,
(ProductTypeId, ProductName) might be able to identify products, too, so it could have a unique constraint, I don't know the details of your business rules. If it was, it could be used instead as a foreign key in the ProductPropertyValue. And similarly for Property.Whether that would be more efficient is another matter, as long columns like
nvarachar(100) are not the best fit for indexes. So, even if you had that unique constraint, you might end up using what you chose in the first place, for efficiency reasons. (Consider 4+4+4=12 vs 4+200+200=404 if we had the two nvarchars from Product and Property.) Choosing between a 12-bytes and a 404-bytes index is not hard choice. I'd think that all would go with the 12 size, even if the ProductPropertyValue is technically not in 3NF.And I say "technically" and stress again that the only reason that
ProductpropertyValue looks as if it is not in 3NF, is because the DBMS provided identity columns (ProductId and PropertyId for Product and Property) have the nice extra feature of Uniqueness (which actually isn't guaranteed in SQL Server. You could set IDENTITY_INSERT ON/OFF or some similar property in other DBMS and push values that are not unique in identity columns. Uniqueness is only guaranteed by the unique constraints you have in place.)This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:
P ---------> PT
^ ^
| |
| |
PPV ------> PTPThat's why I call it "diamond shape"!
And please note that there is no loop there!
- Just start anywhere you like the follow the arrows. Can you return to where you started?
- No. So there are no loops.
Note: this is a simple example with a 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table etc).
Well, that is a problem. When the rules are complicated
Code Snippets
P ---------> PT
^ ^
| |
| |
PPV ------> PTPPropertyType Survey
/ \ / \
/ \ / \
/ \ / \
Product Property PersonSurvey QuestionSurevy
\ / \ /
\ / \ /
\ / \ /
ProductPropertyValue ResponseContext
StackExchange Database Administrators Q#150804, answer score: 4
Revisions (0)
No revisions yet.