snippetsqlMinor

SQL: How to look within each "cluster" value in one column and return those that ONLY have one specific value in another column

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

thoseeachcolumnsqlreturnspecificlookvaluewithinone

Problem

I have a scenario where I am querying "match clusters" for master records from an MDM database. Each "cluster" has a specific number and within that cluster, you'll have a row/record for the "master" and all associated child records.

We operate in different regions of the world and have a column called 'domain' with reference values ranging from '1001' to 1010' that represent a different region/domain. What I've been able to successfully do to this point is say, "I want the master record from all match clusters where the domain '1008' is in the cluster. That looks like as follows (the "business partner type" is strictly calling out 1001 = customer and 1002 = supplier) - this query just grabs any clusters that has '1008', but doesn't exclude other domain values from being in there:

select * from data.vTest 
where [Match Cluster] in (select distinct [Match Cluster] from data.vTest 
where [Domain] = 1008 and [Match Cluster] is not null and [Business Partner Type] = 1001)
and [Business Partner Type] is null;

The above grabs me any match cluster where 1008 exists (just an example - same concept for 1001, 1002, etc., etc.). What I'm now looking for is, grab me all clusters where only one domain value exists. Examples follow:

Then you have the following, which is what I don't want to appear - any match clusters that have more than 1 domain value from those "child records":

What I've tried to this point hasn't worked - the closest I got was the following, which still returns me those "masters" with child records containing more than one domain value:

select * from data.vTest 
where [Match Cluster] in (select [Match Cluster] from data.vTest
where [Domain] not in (1001,1002,1003,1004,1005,1006,1007,1009,1010) and [Match Cluster] is not null and [Business Partner Type] = 1001)
and [Business Partner Type] is null;

Any ideas on the best way to write this query? Hopefully the pics help in stepping back to the big picture of what I'm trying to accomplish. No

Solution

What I'm now looking for is, grab me all clusters where only one domain value exists.

Another way to express what you want is:

Select all match clusters

From the table or view

Where the match cluster attribute isn't null

Group rows with the same match cluster

Restrict to groups having exactly one distinct domain value (ignoring nulls)

Translating that to SQL:

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    COUNT_BIG(DISTINCT T.Domain) = 1;

That will give you the match clusters you are interested in.

You can return rows with those match cluster values with a self-join or IN clause, as you have been doing already.

There are several other ways to express the same query specification using [NOT] EXISTS or windowing functions.

You could also exploit the fact that one distinct item per group means the minimum and maximum must be the same:

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    MIN(T.Domain) = MAX(T.Domain);

I am trying to go that last layer down to say "grab me all the match clusters where there's only 1 domain value AND that domain value is a specific value such as '1008'.

To do this, add another predicate to the HAVING clause:

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    COUNT_BIG(DISTINCT T.Domain) = 1
    AND MIN(T.Domain) = 1008;

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    MIN(T.Domain) = MAX(T.Domain)
    AND MIN(T.Domain) = 1008;

You can use any aggregate there, since there will only be one matching value. I chose MIN. An aggregate is needed because you're filtering after grouping.

You could equally well write:

HAVING 
    MIN(T.Domain) = 1008
    AND MAX(T.Domain) = 1008;

Code Snippets

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    COUNT_BIG(DISTINCT T.Domain) = 1;

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    MIN(T.Domain) = MAX(T.Domain);

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    COUNT_BIG(DISTINCT T.Domain) = 1
    AND MIN(T.Domain) = 1008;

SELECT 
    T.[Match Cluster]
FROM
    data.vTest AS T
WHERE 
    T.[Match Cluster] IS NOT NULL
GROUP BY 
    T.[Match Cluster]
HAVING 
    MIN(T.Domain) = MAX(T.Domain)
    AND MIN(T.Domain) = 1008;

HAVING 
    MIN(T.Domain) = 1008
    AND MAX(T.Domain) = 1008;

Context

StackExchange Database Administrators Q#333665, answer score: 3

Revisions (0)

No revisions yet.