HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Best way to model the relationship of unique pairs

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
pairsuniquethebestwaymodelrelationship

Problem

I have two tables; one for storing thing and one for storing the relationship between two thing objects.

dbfiddle example

Assume:

  • AB == BA. Storing both would be redundant



  • A != B. The relationship of one thing to itself is not useful



  • Calculating the relationship between AB is expensive but idempotent



CREATE TABLE thing (
    id INT PRIMARY KEY
);
CREATE TABLE relationships (
    thing_one INT REFERENCES thing(id),
    thing_two INT REFERENCES thing(id),
    relationship INT NOT NULL,
    PRIMARY KEY (thing_one, thing_two),
    CHECK (thing_one != thing_two)
);


In order to ensure we don't INSERT AB and BA:

CREATE UNIQUE INDEX unique_pair_ix
    ON relationships (
        least(thing_one, thing_two),
        greatest(thing_one, thing_two)
    );


Is there a better or more efficient way to store/model this data than the example?

EDIT: There are multiple DBMS being considered for the greater application. They include PostgreSQL, MariaDB, and MySQL. PostgreSQL is the current preference.

Solution

In SQL Server 2022 you can now use a GREATEST() and LEAST() function. Brent Ozar talks about it in this Blog Post of his.

If you are not using a SQL Server instance older than 2022 there may be some other alternatives, I am not totally sure I have the best answer. I was not able to come up with anything that utilized some kind of hash or other type of comparison mechanism. I am also running in SQL Server so I don't have the ability to use a least() and greatest() function. See this DBA Stack Exchange Question.

I have however done some performance testing with a few different methods. What data I captured is below:

+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+
|                                                         |           | Instead of Trigger | Instead of Trigger |              | After Trigger  | Sum and Absolute Difference |
|                          Event                          | Baseline  |  Case Statements   |     Not Exists     | Indexed View | Only Completed | Computed Persisted Columns  |
+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+
| Execution Time                                          | 07:06.510 | 13:46.490          | 08:47.594          | 22:18.911    | 30:38.267      | 11:24:38                    |
| Query Profile Statistics                                |           |                    |                    |              |                |                             |
|   Number of INSERT, DELETE and UPDATE statements        | 125250    | 249500             | 499000             | 250000       |                | 249999                      |
|   Rows affected by INSERT, DELETE, or UPDATE statements | 124750    | 249500             | 374250             | 124750       |                | 124750                      |
|   Number of SELECT statements                           | 0         | 0                  | 0                  | 0            |                | 0                           |
|   Rows returned by SELECT statements                    | 0         | 0                  | 0                  | 0            |                | 0                           |
|   Number of transactions                                | 125250    | 249500             | 499000             | 250000       |                | 249999                      |
| Network Statistics                                      |           |                    |                    |              |                |                             |
|   Number of server roundtrips                           | 1         | 250000             | 250000             | 250000       |                | 250000                      |
|   TDS packets sent from client                          | 6075      | 250000             | 250000             | 250000       |                | 250000                      |
|   TDS packets received from server                      | 462       | 250000             | 250000             | 250000       |                | 250000                      |
|   Bytes sent from client                                | 24882190  | 62068000           | 62568000           | 59568000     |                | 61567990                    |
|   Bytes received from server                            | 1888946   | 76910970           | 8782500            | 67527710     |                | 69783720                    |
| Time Statistics                                         |           |                    |                    |              |                |                             |
|   Client processing time                                | 420901    | 269564             | 18202              | 240341       |                | 238190                      |
|   Total execution time                                  | 424682    | 811028             | 512726             | 1325281      |                | 665491                      |
|   Wait time on server replies                           | 3781      | 541464             | 494524             | 1084940      |                | 427301                      |
+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+


This data would suggest one of 3 options:

  • Assuming you are ok with the use of an IDENTITY column as the Primary Key, then the best performing method appears to be the INSTEAD OF Trigger - NOT EXISTS.



  • Assuming you are not ok with the use of an IDENTITY column as the Primary Key, and you are ok with the existence of 2 other Persistent Computed Columns then the best performing method appears to be the Persistent Computed Columns.



  • Assuming you are not ok wi

Code Snippets

+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+
|                                                         |           | Instead of Trigger | Instead of Trigger |              | After Trigger  | Sum and Absolute Difference |
|                          Event                          | Baseline  |  Case Statements   |     Not Exists     | Indexed View | Only Completed | Computed Persisted Columns  |
+---------------------------------------------------------+-----------+--------------------+--------------------+--------------+----------------+-----------------------------+
| Execution Time                                          | 07:06.510 | 13:46.490          | 08:47.594          | 22:18.911    | 30:38.267      | 11:24:38                    |
| Query Profile Statistics                                |           |                    |                    |              |                |                             |
|   Number of INSERT, DELETE and UPDATE statements        | 125250    | 249500             | 499000             | 250000       |                | 249999                      |
|   Rows affected by INSERT, DELETE, or UPDATE statements | 124750    | 249500             | 374250             | 124750       |                | 124750                      |
|   Number of SELECT statements                           | 0         | 0                  | 0                  | 0            |                | 0                           |
|   Rows returned by SELECT statements                    | 0         | 0                  | 0                  | 0            |                | 0                           |
|   Number of transactions                                | 125250    | 249500             | 499000             | 250000       |                | 249999                      |
| Network Statistics                                      |           |                    |                    |              |                |                             |
|   Number of server roundtrips                           | 1         | 250000             | 250000             | 250000       |                | 250000                      |
|   TDS packets sent from client                          | 6075      | 250000             | 250000             | 250000       |                | 250000                      |
|   TDS packets received from server                      | 462       | 250000             | 250000             | 250000       |                | 250000                      |
|   Bytes sent from client                                | 24882190  | 62068000           | 62568000           | 59568000     |                | 61567990                    |
|   Bytes received from server                            | 1888946   | 76910970           | 8782500            | 67527710     |                | 69783720                    |
| Time S
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(1, 1, 1 * 1)
GO

INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(1, 2, 1 * 2)
GO

INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(1, 3, 1 * 3)
GO
...
INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(500, 498, 500 * 498)
GO

INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(500, 499, 500 * 499)
GO

INSERT INTO relationship (thing_one, thing_two, relationship) VALUES(500, 500, 500 * 500)
GO
CREATE TRIGGER InsteadOfInsertTrigger on [dbo].[relationship]
INSTEAD OF INSERT
AS
INSERT INTO [dbo].[relationship]
(
    thing_one,
    thing_two,
    relationship
)
SELECT
CASE
    WHEN I.thing_one <= I.thing_two
        THEN I.thing_one
    ELSE
        I.thing_two
    END
,CASE
    WHEN I.thing_one <= I.thing_two
        THEN I.thing_two
    ELSE
        I.thing_one
    END
,I.relationship
FROM inserted I
GO
CREATE TRIGGER InsteadOfInsertTrigger on [dbo].[relationship]
INSTEAD OF INSERT
AS
INSERT INTO [dbo].[relationship]
(
    thing_one,
    thing_two,
    relationship
)
SELECT
I.thing_one
,I.thing_two
,I.relationship
FROM inserted I
WHERE NOT EXISTS
(
    SELECT 1
    FROM [dbo].[relationship] t
    WHERE (t.thing_one = i.thing_two AND t.thing_two = i.thing_one)
    --This one shouldn't be needed because of the Primary Key
    --AND (t.thing_one = i.thing_one AND t.thing_two = i.thing_two)
)
GO
CREATE VIEW dbo.relationship_indexedview_view
WITH SCHEMABINDING
AS
    SELECT 
    CASE
        WHEN thing_one <= thing_two
            THEN thing_one
        ELSE
            thing_two
        END as thing_one_sorted,
    CASE
        WHEN thing_one <= thing_two
            THEN thing_two
        ELSE
            thing_one
        END as thing_two_sorted
    FROM [dbo].[relationship_indexedview]
GO

CREATE UNIQUE CLUSTERED INDEX relationship_indexedview_view_unique
    ON dbo.relationship_indexedview_view (thing_one_sorted, thing_two_sorted)
GO

Context

StackExchange Database Administrators Q#261309, answer score: 2

Revisions (0)

No revisions yet.