snippetsqlMinor
SQL: How to look within each "cluster" value in one column and return those that ONLY have one specific value in another column
Viewed 0 times
thoseeachcolumnsqlreturnspecificlookvaluewithinone
Problem
I have a scenario where I am querying "match clusters" for master records from an MDM database. Each "cluster" has a specific number and within that cluster, you'll have a row/record for the "master" and all associated child records.
We operate in different regions of the world and have a column called 'domain' with reference values ranging from '1001' to 1010' that represent a different region/domain. What I've been able to successfully do to this point is say, "I want the master record from all match clusters where the domain '1008' is in the cluster. That looks like as follows (the "business partner type" is strictly calling out 1001 = customer and 1002 = supplier) - this query just grabs any clusters that has '1008', but doesn't exclude other domain values from being in there:
The above grabs me any match cluster where 1008 exists (just an example - same concept for 1001, 1002, etc., etc.). What I'm now looking for is, grab me all clusters where only one domain value exists. Examples follow:
Then you have the following, which is what I don't want to appear - any match clusters that have more than 1 domain value from those "child records":
What I've tried to this point hasn't worked - the closest I got was the following, which still returns me those "masters" with child records containing more than one domain value:
Any ideas on the best way to write this query? Hopefully the pics help in stepping back to the big picture of what I'm trying to accomplish. No
We operate in different regions of the world and have a column called 'domain' with reference values ranging from '1001' to 1010' that represent a different region/domain. What I've been able to successfully do to this point is say, "I want the master record from all match clusters where the domain '1008' is in the cluster. That looks like as follows (the "business partner type" is strictly calling out 1001 = customer and 1002 = supplier) - this query just grabs any clusters that has '1008', but doesn't exclude other domain values from being in there:
select * from data.vTest
where [Match Cluster] in (select distinct [Match Cluster] from data.vTest
where [Domain] = 1008 and [Match Cluster] is not null and [Business Partner Type] = 1001)
and [Business Partner Type] is null;The above grabs me any match cluster where 1008 exists (just an example - same concept for 1001, 1002, etc., etc.). What I'm now looking for is, grab me all clusters where only one domain value exists. Examples follow:
Then you have the following, which is what I don't want to appear - any match clusters that have more than 1 domain value from those "child records":
What I've tried to this point hasn't worked - the closest I got was the following, which still returns me those "masters" with child records containing more than one domain value:
select * from data.vTest
where [Match Cluster] in (select [Match Cluster] from data.vTest
where [Domain] not in (1001,1002,1003,1004,1005,1006,1007,1009,1010) and [Match Cluster] is not null and [Business Partner Type] = 1001)
and [Business Partner Type] is null;Any ideas on the best way to write this query? Hopefully the pics help in stepping back to the big picture of what I'm trying to accomplish. No
Solution
What I'm now looking for is, grab me all clusters where only one domain value exists.
Another way to express what you want is:
Translating that to SQL:
That will give you the match clusters you are interested in.
You can return rows with those match cluster values with a self-join or
There are several other ways to express the same query specification using
You could also exploit the fact that one distinct item per group means the minimum and maximum must be the same:
I am trying to go that last layer down to say "grab me all the match clusters where there's only 1 domain value AND that domain value is a specific value such as '1008'.
To do this, add another predicate to the
or
You can use any aggregate there, since there will only be one matching value. I chose
You could equally well write:
Another way to express what you want is:
- Select all match clusters
- From the table or view
- Where the match cluster attribute isn't null
- Group rows with the same match cluster
- Restrict to groups having exactly one distinct domain value (ignoring nulls)
Translating that to SQL:
SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
COUNT_BIG(DISTINCT T.Domain) = 1;That will give you the match clusters you are interested in.
You can return rows with those match cluster values with a self-join or
IN clause, as you have been doing already.There are several other ways to express the same query specification using
[NOT] EXISTS or windowing functions.You could also exploit the fact that one distinct item per group means the minimum and maximum must be the same:
SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
MIN(T.Domain) = MAX(T.Domain);I am trying to go that last layer down to say "grab me all the match clusters where there's only 1 domain value AND that domain value is a specific value such as '1008'.
To do this, add another predicate to the
HAVING clause:SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
COUNT_BIG(DISTINCT T.Domain) = 1
AND MIN(T.Domain) = 1008;or
SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
MIN(T.Domain) = MAX(T.Domain)
AND MIN(T.Domain) = 1008;You can use any aggregate there, since there will only be one matching value. I chose
MIN. An aggregate is needed because you're filtering after grouping.You could equally well write:
HAVING
MIN(T.Domain) = 1008
AND MAX(T.Domain) = 1008;Code Snippets
SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
COUNT_BIG(DISTINCT T.Domain) = 1;SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
MIN(T.Domain) = MAX(T.Domain);SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
COUNT_BIG(DISTINCT T.Domain) = 1
AND MIN(T.Domain) = 1008;SELECT
T.[Match Cluster]
FROM
data.vTest AS T
WHERE
T.[Match Cluster] IS NOT NULL
GROUP BY
T.[Match Cluster]
HAVING
MIN(T.Domain) = MAX(T.Domain)
AND MIN(T.Domain) = 1008;HAVING
MIN(T.Domain) = 1008
AND MAX(T.Domain) = 1008;Context
StackExchange Database Administrators Q#333665, answer score: 3
Revisions (0)
No revisions yet.