patternsqlMinor
Only Select Supersets
Viewed 0 times
selectsupersetsonly
Problem
I have two tables (along with a nonclustered index) that can be created with the below commands:
While technically my tables are slightly different and I am joining on a few other tables, this is a suitable proxy for my situation.
The following is an example to illustrate what I am asking:
` GroupTable RecordTable
GroupKey RecordCount GroupScore RecordKey GroupKey
------------------------------------ ---------------------
1 3 6.2 A 1
29 2 9.8 A 29
95 3 6.2 A 95
192 4 7.1 A 192
B 1
B 29
B 95
B 192
C 1
C
CREATE TABLE GroupTable
(
GroupKey int NOT NULL PRIMARY KEY,
RecordCount int NOT NULL,
GroupScore float NOT NULL
);
CREATE TABLE RecordTable
(
RecordKey varchar(10) NOT NULL,
GroupKey int NOT NULL,
PRIMARY KEY(RecordKey, GroupKey)
);
CREATE UNIQUE INDEX ixGroupRecord ON RecordTable(GroupKey, RecordKey);While technically my tables are slightly different and I am joining on a few other tables, this is a suitable proxy for my situation.
- I would like to select all
GroupKeysthat are not subsets of anotherGroupKey.
- For a given superset, I would like to grab the maximum
GroupScoreof all its subsets (including itself).
- In the instance when a
GroupKeycontains the same exactRecordKeysas anotherGroupKey(s), then only one of thoseGroupKeysis grabbed (it doesn't matter which one).
- Any
GroupKeythat has the same exactRecordKeysas anotherGroupKey(s)will also have the sameGroupScore.
- Non-related
GroupKeyscan have the same score as well.
The following is an example to illustrate what I am asking:
` GroupTable RecordTable
GroupKey RecordCount GroupScore RecordKey GroupKey
------------------------------------ ---------------------
1 3 6.2 A 1
29 2 9.8 A 29
95 3 6.2 A 95
192 4 7.1 A 192
B 1
B 29
B 95
B 192
C 1
C
Solution
I would like the output to be the following:
Using correlated subqueries is one way to get the output you want.
I'm returning the Group with the lowest GroupKey when there is a match, but that is arbitrary as you say it doesn't matter.
test data:
query:
The subquery in the SELECT gets the highest
In the WHERE clause, there are two cases to consider for filtering. In either case, the 'g1' set is only filtered if all the 'g1'
The two cases are: ① the 'g1' set has fewer
output:
dbfiddle here
GroupKey RecordCount GroupScore
-------------------------------------
1 3 9.8
192 4 7.1Using correlated subqueries is one way to get the output you want.
- In the instance when a GroupKey contains the same exact RecordKeys as another GroupKey(s), then only one of those GroupKeys is grabbed (it doesn't matter which one).
I'm returning the Group with the lowest GroupKey when there is a match, but that is arbitrary as you say it doesn't matter.
test data:
INSERT INTO RecordTable(RecordKey,GroupKey)
VALUES ('A',1)
, ('A',29)
, ('A',95)
, ('A',192)
, ('B',1)
, ('B',29)
, ('B',95)
, ('B',192)
, ('C',1)
, ('C',95)
, ('D',192)
, ('E',192);
INSERT INTO GroupTable(GroupKey,RecordCount,GroupScore)
VALUES (1,3,6.2) -- ABC
, (29,2,9.8) -- AB
, (95,3,6.2) -- ABC
, (192,4,7.1); -- ABDE
GOquery:
SELECT GroupKey
, RecordCount
, GroupScore = ( SELECT max(GroupScore)
FROM GroupTable g2
WHERE ( SELECT count(*)
FROM ( SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g1.GroupKey
UNION
SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g2.GroupKey ) z
)=g1.RecordCount )
FROM GroupTable g1
WHERE NOT EXISTS ( SELECT *
FROM GroupTable g3
WHERE ( SELECT count(*)
FROM ( SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g1.GroupKey
UNION
SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g3.GroupKey ) z )=g3.RecordCount
AND ( g3.RecordCount>g1.RecordCount
OR ( g3.RecordCount=g1.RecordCount
AND g3.GroupKey<g1.GroupKey ) ) );
GOThe subquery in the SELECT gets the highest
GroupScore from only those groups that are subsets of this ('g1') group. It achieves this by counting the UNION of the RecordKey's for the 'g1' set and each 'g2' set. If the UNION is larger than the 'g1' set, there must be at least one RecordKey in the 'g2' set without a corresponding RecordKey for the 'g1' set, so the 'g2' set is not a subset and should not be considered for this row.In the WHERE clause, there are two cases to consider for filtering. In either case, the 'g1' set is only filtered if all the 'g1'
RecordKeys are also present in the 'g3' set; and this check is achieved by counting the union again (as per the SELECT clause).The two cases are: ① the 'g1' set has fewer
RecordKeys (g3.RecordCount>g1.RecordCount; in which case we filter), and ② the 'g1' set is identical to the 'g3' set (g3.RecordCount=g1.RecordCount; in which case we arbitrarily choose the set with the lower GroupKey)output:
/*
|GroupKey|RecordCount|GroupScore|
|-------:|----------:|---------:|
| 1| 3| 9.8|
| 192| 4| 9.8|
*/dbfiddle here
Code Snippets
GroupKey RecordCount GroupScore
-------------------------------------
1 3 9.8
192 4 7.1INSERT INTO RecordTable(RecordKey,GroupKey)
VALUES ('A',1)
, ('A',29)
, ('A',95)
, ('A',192)
, ('B',1)
, ('B',29)
, ('B',95)
, ('B',192)
, ('C',1)
, ('C',95)
, ('D',192)
, ('E',192);
INSERT INTO GroupTable(GroupKey,RecordCount,GroupScore)
VALUES (1,3,6.2) -- ABC
, (29,2,9.8) -- AB
, (95,3,6.2) -- ABC
, (192,4,7.1); -- ABDE
GOSELECT GroupKey
, RecordCount
, GroupScore = ( SELECT max(GroupScore)
FROM GroupTable g2
WHERE ( SELECT count(*)
FROM ( SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g1.GroupKey
UNION
SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g2.GroupKey ) z
)=g1.RecordCount )
FROM GroupTable g1
WHERE NOT EXISTS ( SELECT *
FROM GroupTable g3
WHERE ( SELECT count(*)
FROM ( SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g1.GroupKey
UNION
SELECT RecordKey
FROM RecordTable
WHERE GroupKey=g3.GroupKey ) z )=g3.RecordCount
AND ( g3.RecordCount>g1.RecordCount
OR ( g3.RecordCount=g1.RecordCount
AND g3.GroupKey<g1.GroupKey ) ) );
GO/*
|GroupKey|RecordCount|GroupScore|
|-------:|----------:|---------:|
| 1| 3| 9.8|
| 192| 4| 9.8|
*/Context
StackExchange Database Administrators Q#166062, answer score: 7
Revisions (0)
No revisions yet.