Best way to model the relationship of unique pairs
Problem

I have two tables: one for storing thing and one for storing the relationship between two thing objects.

dbfiddle example

Assume:
- AB == BA. Storing both would be redundant.
- A != B. The relationship of one thing to itself is not useful.
- Calculating the relationship between A and B is expensive but idempotent.

CREATE TABLE thing (
    id INT PRIMARY KEY
);

CREATE TABLE relationships (
    thing_one INT REFERENCES thing(id),
    thing_two INT REFERENCES thing(id),
    relationship INT NOT NULL,
    PRIMARY KEY (thing_one, thing_two),
    CHECK (thing_one != thing_two)
);

In order to ensure we don't INSERT both AB and BA:

CREATE UNIQUE INDEX unique_pair_ix
    ON relationships (
        least(thing_one, thing_two),
        greatest(thing_one, thing_two)
    );

Is there a better or more efficient way to store/model this data than the example?

EDIT: There are multiple DBMS being considered for the greater application. They include PostgreSQL, MariaDB, and MySQL. PostgreSQL is the current preference.
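As a quick, self-contained illustration of the expression-based unique index above, here is a sketch using Python's sqlite3 module; SQLite's two-argument min()/max() behave like PostgreSQL's least()/greatest() and are allowed in expression indexes. Table and index names follow the example; the script itself is illustrative, not from the question.

```python
import sqlite3

# Sketch of the question's schema plus the expression-based unique index.
# min()/max() here stand in for PostgreSQL's least()/greatest().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY);
CREATE TABLE relationships (
    thing_one INTEGER REFERENCES thing(id),
    thing_two INTEGER REFERENCES thing(id),
    relationship INTEGER NOT NULL,
    PRIMARY KEY (thing_one, thing_two),
    CHECK (thing_one != thing_two)
);
-- One index entry per unordered pair, so AB and BA collide.
CREATE UNIQUE INDEX unique_pair_ix
    ON relationships (min(thing_one, thing_two), max(thing_one, thing_two));
""")

conn.executemany("INSERT INTO thing (id) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO relationships VALUES (1, 2, 42)")  # AB: accepted

try:
    conn.execute("INSERT INTO relationships VALUES (2, 1, 42)")  # BA: rejected
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

print(duplicate_blocked)  # prints: True
```

The primary key alone would not catch (2, 1) after (1, 2); only the normalized expression index makes the two orderings collide.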
Solution
In SQL Server 2022 you can now use the GREATEST() and LEAST() functions. Brent Ozar talks about it in this Blog Post of his.

If you are using a SQL Server instance older than 2022 there may be some other alternatives, though I am not totally sure I have the best answer. I was not able to come up with anything that utilized some kind of hash or other type of comparison mechanism. I am also running on SQL Server, so I don't have the ability to use the least() and greatest() functions. See this DBA Stack Exchange Question.

I have, however, done some performance testing with a few different methods. The data I captured is below:
+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+
| | | Instead of Trigger | Instead of Trigger | | After Trigger | Sum and Absolute Difference |
| Event | Baseline | Case Statements | Not Exists | Indexed View | Only Completed | Computed Persisted Columns |
+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+
| Execution Time | 07:06.510 | 13:46.490 | 08:47.594 | 22:18.911 | 30:38.267 | 11:24:38 |
| Query Profile Statistics | | | | | | |
| Number of INSERT, DELETE and UPDATE statements | 125250 | 249500 | 499000 | 250000 | | 249999 |
| Rows affected by INSERT, DELETE, or UPDATE statements | 124750 | 249500 | 374250 | 124750 | | 124750 |
| Number of SELECT statements | 0 | 0 | 0 | 0 | | 0 |
| Rows returned by SELECT statements | 0 | 0 | 0 | 0 | | 0 |
| Number of transactions | 125250 | 249500 | 499000 | 250000 | | 249999 |
| Network Statistics | | | | | | |
| Number of server roundtrips | 1 | 250000 | 250000 | 250000 | | 250000 |
| TDS packets sent from client | 6075 | 250000 | 250000 | 250000 | | 250000 |
| TDS packets received from server | 462 | 250000 | 250000 | 250000 | | 250000 |
| Bytes sent from client | 24882190 | 62068000 | 62568000 | 59568000 | | 61567990 |
| Bytes received from server | 1888946 | 76910970 | 8782500 | 67527710 | | 69783720 |
| Time Statistics | | | | | | |
| Client processing time | 420901 | 269564 | 18202 | 240341 | | 238190 |
| Total execution time | 424682 | 811028 | 512726 | 1325281 | | 665491 |
| Wait time on server replies | 3781 | 541464 | 494524 | 1084940 | | 427301 |
+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+

This data would suggest one of 3 options:
- Assuming you are ok with the use of an IDENTITY column as the Primary Key, then the best performing method appears to be the INSTEAD OF Trigger - NOT EXISTS.
- Assuming you are not ok with the use of an IDENTITY column as the Primary Key, and you are ok with the existence of 2 other Persistent Computed Columns, then the best performing method appears to be the Persistent Computed Columns.
- Assuming you are not ok wi
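The "Sum and Absolute Difference Computed Persisted Columns" method from the table stores a + b and |a - b| alongside the pair and puts a unique index on those two columns. Those values identify the unordered pair, since max(a, b) = (sum + diff) / 2 and min(a, b) = (sum - diff) / 2. A minimal Python sketch of that invariant (the helper names are illustrative, not from the answer):

```python
# Why (a + b, |a - b|) works as a unique key for the unordered pair {a, b}:
# the pair can be recovered exactly, because sum and diff have the same
# parity, so the divisions below are exact integer arithmetic.
def pair_key(a: int, b: int) -> tuple[int, int]:
    """Canonical key for the unordered pair {a, b}."""
    return (a + b, abs(a - b))

def recover_pair(key: tuple[int, int]) -> tuple[int, int]:
    """Invert pair_key: rebuild (min, max) from (sum, diff)."""
    s, d = key
    return ((s - d) // 2, (s + d) // 2)

assert pair_key(3, 7) == pair_key(7, 3) == (10, 4)
assert recover_pair((10, 4)) == (3, 7)

# Simulating the unique index: the reversed pair collides with the original.
seen = set()
for a, b in [(1, 2), (2, 1), (3, 7)]:
    k = pair_key(a, b)
    if k in seen:
        print(f"rejected duplicate pair ({a}, {b})")  # prints: rejected duplicate pair (2, 1)
    else:
        seen.add(k)
```

In the database, the two persisted computed columns plus a unique index over them enforce the same collision without any trigger logic on the insert path.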
Code Snippets
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(1, 1, 1 * 1)
GO
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(1, 2, 1 * 2)
GO
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(1, 3, 1 * 3)
GO
...
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(500, 498, 500 * 498)
GO
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(500, 499, 500 * 499)
GO
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(500, 500, 500 * 500)
GO

CREATE TRIGGER InsteadOfInsertTrigger on [dbo].[relationship]
INSTEAD OF INSERT
AS
INSERT INTO [dbo].[relationship]
(
thing_one,
thing_two,
relationship
)
SELECT
CASE
WHEN I.thing_one <= I.thing_two
THEN I.thing_one
ELSE
I.thing_two
END
,CASE
WHEN I.thing_one <= I.thing_two
THEN I.thing_two
ELSE
I.thing_one
END
,I.relationship
FROM inserted I
GO

CREATE TRIGGER InsteadOfInsertTrigger on [dbo].[relationship]
INSTEAD OF INSERT
AS
INSERT INTO [dbo].[relationship]
(
thing_one,
thing_two,
relationship
)
SELECT
I.thing_one
,I.thing_two
,I.relationship
FROM inserted I
WHERE NOT EXISTS
(
SELECT 1
FROM [dbo].[relationship] t
WHERE (t.thing_one = i.thing_two AND t.thing_two = i.thing_one)
--This one shouldn't be needed because of the Primary Key
--AND (t.thing_one = i.thing_one AND t.thing_two = i.thing_two)
)
GO

CREATE VIEW dbo.relationship_indexedview_view
WITH SCHEMABINDING
AS
SELECT
CASE
WHEN thing_one <= thing_two
THEN thing_one
ELSE
thing_two
END as thing_one_sorted,
CASE
WHEN thing_one <= thing_two
THEN thing_two
ELSE
thing_one
END as thing_two_sorted
FROM [dbo].[relationship_indexedview]
GO
CREATE UNIQUE CLUSTERED INDEX relationship_indexedview_view_unique
ON dbo.relationship_indexedview_view (thing_one_sorted, thing_two_sorted)
GO

Context
StackExchange Database Administrators Q#261309, answer score: 2