patternMajor
Entity relationship problem
Viewed 0 times
problementityrelationship
Problem
I have 4 tables related like this (it's an example):
Suppose that I have a
And a
This will allow the creation of a worker that is from two companies because the classification and department are linked to the company separately. I don't want that to happen, so I think I have a problem with my relationships and I don't know how to solve it.
Company:
ID
Name
CNPJ
Department:
ID
Name
Code
ID_Company
Classification:
ID
Name
Code
ID_Company
Workers:
Id
Name
Code
ID_Classification
ID_DepartmentSuppose that I have a
classification with id = 20, id_company = 1. And a
department that has id_company = 2 (that represents another company). This will allow the creation of a worker that is from two companies because the classification and department are linked to the company separately. I don't want that to happen, so I think I have a problem with my relationships and I don't know how to solve it.
Solution
I don't think you have a problem with the relationships. I think instead the problem is that by using surrogate keys (ie Ids) for each table the resulting database is unable to prevent Workers from being inserted whose Department is of one Company while the Classification is of another and vice versa. A good way to understand this is to visualize the schema using an ER Diagramming tool. I will use the Oracle Data Modeler tool which is a free download.
ER Diagram
As it stands, you could have 2 companies - say
The 1 Ids represent
Options to Resolve
So what are the options to prevent his from happening? There are two immediate options. The first is to realize that by using a surrogate key for every table this issue exists and introduce additional programming to verify it doesn't occur. This could be done in the application, but if inserts and updates can occur outside of the application then incorrect associations can still occur. A better approach would be to create a trigger that fires on insert and update of an employee to make sure that the assigned department is of the same company as the assigned classification, and if not fail the insert or update.
The second option is to not use surrogate keys for every table. Instead, use the surrogate keys only for the
Why is this impossible? It is impossible because the schema implements referential integrity between
Here is an updated diagram of an implementation of the second option:
Preferred Option
Of the two options, I absolutely prefer the second - using the identifying relationships and the cascading keys - for two reasons. First, this option achieves the desired rule with no additional programming. Developing a trigger is not trivial. It must be coded, tested, and maintained. Ensuring the trigger logic is optimal so as not to impact performance is also not trivial. The book Applied Mathematics for Database Professionals provides a lot of detail on the complexity of such a solution. Second, the rules imply that a Department and a Classification can not exist outside the context of the
This is a great question because it shows exactly why simply assuming every table requires a surrogate key is a bad idea. Fabian Pascal has an excellent blog post on just this topic showing that not only can a surrogate key be a bad idea from a data integrity standpoint it can also result in making some retrievals slower at the physical level precisely because joins are required that, had keys been properly cascaded, would be unnecessary. Another interesting topic this question reveals is that a database cannot ensure that all data inserted into it is accurate with respect to the real world. Instead, it can only ensure that the data inserte
ER Diagram
As it stands, you could have 2 companies - say
IBM and Microsoft. IBM can have a Software Development department, and Microsoft can have a Desktop Software department. IBM can have a Software Engineer classification, and Microsoft can have a Software Developer classification. Now, because you have a surrogate key for Department and Classification, the fact that Software Development is an IBM department and Desktop Software is a Microsoft department is lost for future child relationships. This is also the case with Classification. Therefore it is easy to accidentally assign Harlan Mills, who is an IBM employee in the Software Development department, a classification of Software Developer which is a Microsoft classification! Likewise, the worker could be given the right classification and wrong department! Here is a diagram showing the first example:The 1 Ids represent
IBM, and the 2 Ids represent Microsoft. I've highlighted in red the scenario where Harlan Mills and Bill Gates are assigned to the wrong departments, which is visualized by the 10 department Id associated to the 200 classification Id and vice versa.Options to Resolve
So what are the options to prevent his from happening? There are two immediate options. The first is to realize that by using a surrogate key for every table this issue exists and introduce additional programming to verify it doesn't occur. This could be done in the application, but if inserts and updates can occur outside of the application then incorrect associations can still occur. A better approach would be to create a trigger that fires on insert and update of an employee to make sure that the assigned department is of the same company as the assigned classification, and if not fail the insert or update.
The second option is to not use surrogate keys for every table. Instead, use the surrogate keys only for the
Company table, which is fundamental and has no parents, and then create identifying relationships to the Department and Classification child tables. The Department and Classification tables now have a PK of the Company Id plus a Sequence Number or Name to distinguish them. Then, the relationships from Department and Classification to Worker also become identifying and thus the PK of Worker becomes the Company Id, plus the Department Number (I'm using a sequence number in this example), plus the Classification Number. The result is there is only one Company Id in the Worker table. It is now impossible to assign a Worker to a Department in one Company and to a Classification in another Company.Why is this impossible? It is impossible because the schema implements referential integrity between
Worker and Department and Classification. If an attempt is made to insert a Worker for a Department in one Company and and a Classification of another, the combination that does not exist in the corresponding parent table will trigger a referential integrity violation and the insert will not work.Here is an updated diagram of an implementation of the second option:
Preferred Option
Of the two options, I absolutely prefer the second - using the identifying relationships and the cascading keys - for two reasons. First, this option achieves the desired rule with no additional programming. Developing a trigger is not trivial. It must be coded, tested, and maintained. Ensuring the trigger logic is optimal so as not to impact performance is also not trivial. The book Applied Mathematics for Database Professionals provides a lot of detail on the complexity of such a solution. Second, the rules imply that a Department and a Classification can not exist outside the context of the
Company, and so the schema now more accurately reflects the real world.This is a great question because it shows exactly why simply assuming every table requires a surrogate key is a bad idea. Fabian Pascal has an excellent blog post on just this topic showing that not only can a surrogate key be a bad idea from a data integrity standpoint it can also result in making some retrievals slower at the physical level precisely because joins are required that, had keys been properly cascaded, would be unnecessary. Another interesting topic this question reveals is that a database cannot ensure that all data inserted into it is accurate with respect to the real world. Instead, it can only ensure that the data inserte
Context
StackExchange Database Administrators Q#112576, answer score: 25
Revisions (0)
No revisions yet.