Database design: Normalizing a "(many-to-many)-to-many" relationship
Problem
Short version
I have to add a fixed number of additional properties to each pair in an existing many-to-many join. Skipping to the diagrams below, which of Options 1-4 is the best way, in terms of advantages and disadvantages, to accomplish this by extending the Base Case? Or, is there a better alternative I haven't considered here?
Longer version
I currently have two tables in a many-to-many relationship, via an intermediate join table. I now need to add additional links to properties that belong to the pair of existing objects. I have a fixed number of these properties for each pair, though one entry in the property table may apply to multiple pairs (or even be used multiple times for one pair). I'm trying to determine the best way to do this, and am having trouble sorting out how to think of the situation. Semantically it seems as if I can describe it as any of the following equally well:
- One pair linked to one set of a fixed number of additional properties
- One pair linked to many additional properties
- Many (two) objects linked to one set of properties
- Many objects linked to many properties
Example
I have two object types, X and Y, each with unique IDs, and a linking table `objx_objy` with columns `x_id` and `y_id`, which together form the primary key for the link. Each X can be related to many Ys, and vice versa. This is the setup for my existing many-to-many relationship.
Base Case
Now additionally I have a set of properties defined in another table, and a set of conditions under which a given (X,Y) pair should have property P. The number of conditions is fixed, and the same for all pairs. They basically say "In situation C1, pair (X1,Y1) has property P1", "In situation C2, pair (X1,Y1) has property P2", and so on, for three situations/conditions for each pair in the join table.
Option 1
In my current situation there are exactly three such conditions, and I have no reason to expect that to increase, so one possibility is to add columns `c1_
Solution
- Option 1
"This doesn't seem like a great idea to me, because it complicates the SQL to select all properties applied to a feature…"
It does not necessarily complicate the query SQL (see the conclusion below).
"…and doesn't readily scale to more conditions…"
It scales readily to more conditions, as long as the number of conditions is still fixed and there aren't dozens or hundreds of them.
"However, it does enforce the requirement of a certain number of conditions per (X,Y) pair. In fact, it is the only option here that does so."
It does, and although you say in a comment that this is "the least important of my requirements", you haven't said it doesn't matter at all.
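As a rough sketch of what "scales readily" could look like in practice, adding a fourth condition to the option 1 layout is one extra column. The column name `c4_p_id` and the backfill value 90 are assumptions made for illustration, and PostgreSQL syntax is assumed; `objx_objy` and `prop` are the tables defined in the conclusion.

```sql
-- Hypothetical fourth condition; c4_p_id is an illustrative name.
-- Step 1: add the column as nullable so that existing rows are accepted.
alter table objx_objy add column c4_p_id integer references prop;

-- Step 2: backfill the existing pairs (90 is a placeholder property id).
update objx_objy set c4_p_id = 90;

-- Step 3: enforce that every pair has a property for condition 4.
alter table objx_objy alter column c4_p_id set not null;
```

Any view built over the `cN_p_id` columns would need the new column added to its definition as well.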
- Option 2
"One downside to this is that it doesn't specify the number of conditions for each pair. Another is that when I am only considering the initial relationship…I then have to add a DISTINCT clause to avoid duplicate entries…"
I think you can dismiss this option because of the complications you mention. The `objx_objy` table is likely to be the driving table for some of your queries (e.g. "select all properties applied to a feature", which I am taking to mean all properties applied to an `objx` or `objy`). You can use a view to pre-apply the DISTINCT, so it is not a matter of complicating queries, but that is going to scale very badly performance-wise for very little gain.
- Option 3
"Does it make sense though to create a new ID that identifies nothing other than existing IDs?"
No, it doesn't. Option 4 is better in every regard.
- Option 4
"…it basically duplicates an entire table multiple times (or feels that way, anyway) so also doesn't seem ideal."
This option is just fine. It is the obvious way of setting up the relations if the number of properties is variable or subject to change.
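For reference, one way the option 4 layout might be written is sketched below. The question's diagram is not reproduced here, so the table name `objx_objy_cond_prop`, the `c_id` column, and the check constraint are all assumptions, chosen to match the column names used by the view in the conclusion. Run it in a fresh schema, since it redefines the base tables without the `cN_p_id` columns.

```sql
-- Base case: the existing many-to-many relationship.
create table prop(id integer primary key);
create table objx(id integer primary key);
create table objy(id integer primary key);
create table objx_objy(
  x_id integer references objx
, y_id integer references objy
, primary key (x_id, y_id)
);

-- Assumed option 4 table: one row per (pair, condition).
create table objx_objy_cond_prop(
  x_id integer
, y_id integer
, c_id integer check (c_id between 1 and 3)   -- condition number, assumed 1..3
, p_id integer not null references prop
, primary key (x_id, y_id, c_id)              -- at most one property per condition
, foreign key (x_id, y_id) references objx_objy (x_id, y_id)
);
```

The primary key and check constraint cap each pair at three rows, but nothing here requires that all three rows exist, which is the gap option 1 closes with its `not null` columns.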
Conclusion
My preference would be option 1 if the number of properties per `objx_objy` row is likely to be stable, and if you can't imagine ever adding more than a handful extra. It is also the only option that enforces the 'number of properties = 3' constraint; enforcing a similar constraint on option 4 would likely involve adding `c1_p_id…` columns to the xy table anyway.
If you really don't care much about that condition, and you also have reason to doubt that the number of properties is going to be stable, then choose option 4.
If you aren't sure which, choose option 1: it is simpler, and that is definitely better if you have the option, as others have said. If you are put off option 1 "…because it complicates the SQL to select all properties applied to a feature…", I suggest creating a view to provide the same data as the extra table in option 4:
option 1 tables:
create table prop(id integer primary key);
create table objx(id integer primary key);
create table objy(id integer primary key);
create table objx_objy(
x_id integer references objx
, y_id integer references objy
, c1_p_id integer not null references prop
, c2_p_id integer not null references prop
, c3_p_id integer not null references prop
, primary key (x_id, y_id)
);
insert into prop(id) select generate_series(90,99);
insert into objx(id) select generate_series(10,12);
insert into objy(id) select generate_series(20,22);
insert into objx_objy(x_id,y_id,c1_p_id,c2_p_id,c3_p_id)
select objx.id, objy.id, 90, 91, 90+floor(random()*10)
from objx cross join objy;
view to 'emulate' option 4:
create view objx_objy_prop as
select x_id
, y_id
, unnest(array[1,2,3]) c_id
, unnest(array[c1_p_id,c2_p_id,c3_p_id]) p_id
from objx_objy;
"select all properties applied to a feature":
select distinct p_id from objx_objy_prop where x_id=10 order by p_id;
/*
|p_id|
|---:|
| 90|
| 91|
| 97|
| 98|
*/
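The view above relies on two `unnest` calls of equal length in the select list. A possible alternative, sketched here under the assumption of PostgreSQL 9.3 or later, unpivots the three columns with `cross join lateral (values …)`, which keeps each condition number next to its column. The view name `objx_objy_prop_v2` is hypothetical.

```sql
-- Alternative view definition; objx_objy_prop_v2 is an illustrative name.
create view objx_objy_prop_v2 as
select xy.x_id
     , xy.y_id
     , c.c_id
     , c.p_id
from objx_objy xy
cross join lateral (
  values (1, xy.c1_p_id)   -- condition 1
       , (2, xy.c2_p_id)   -- condition 2
       , (3, xy.c3_p_id)   -- condition 3
) as c(c_id, p_id);

-- Which property does pair (10, 20) have under condition 2?
-- With the sample data above, every pair has c2_p_id = 91.
select p_id
from objx_objy_prop_v2
where x_id = 10 and y_id = 20 and c_id = 2;
```

Either form returns one row per pair and condition, so queries written against the view do not need to know which layout sits underneath.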
Context
StackExchange Database Administrators Q#63389, answer score: 7