HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Pairwise intersections

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
pairwiseintersectionsstackoverflow

Problem

I have a table with two columns, let's say FirstName and LastName.
I need to get another table, which for every pair of FirstName's from the first one contains a count of the common LastName's.

Is this even feasible to do in SQL?

There are much more unique LastName's than FirstName's, if this affects efficiency of the query.

A toy example, input:

FirstName, LastName
John, Smith
John, Doe
Jane, Doe


Output:

FirstName1, FirstName2, CommonLastNames
John, John, 2
John, Jane, 1
Jane, Jane, 1
Jane, John, 1


As this relation is reflexive and symmetrical, it's OK if the result is just one of the triangles (e.g, the one above the diagonal).

Solution

I'm going to use MS SQL Server to do this since I have a copy to hand. I believe most any of the majors would do it similarly.

First a sample table, with data. I use a table variable but it's the same for any flavour of table.

declare @t table (FirstName char(10), LastName char(10));

insert @t(FirstName,LastName)
values ('John','Smith'),('John','Doe'),('Jane','Doe');


You can get all pairs by doing a self-join:

select
    a.FirstName, a.LastName, b.FirstName, b.LastName
from @t as a
cross apply @t as b;


Using CROSS APPLY avoids having to jump through hoops to find a join condition for an ON clause.

Next you need something to count. This is where the CASE statement comes in. The case returns an integer value per pair of first names, which is what gets counted. (If I'm reading your question correctly you want where the LastNames match so that's the comparison I have. Hopefully it's obvious how to modify this if I'm wrong.)

select
    ...
    case
        when a.LastName = b.LastName then 1
        else 0
    end
...etc.


Add in a SUM() and GROUP BY and you get your answer:

select
    a.FirstName,
    b.FirstName,
    sum(
    case
        when a.LastName = b.LastName then 1
        else 0
    end
    ) as CommonLastNames
from @t as a
cross apply @t as b
group by a.FirstName, b.FirstName;

Code Snippets

declare @t table (FirstName char(10), LastName char(10));

insert @t(FirstName,LastName)
values ('John','Smith'),('John','Doe'),('Jane','Doe');
select
    a.FirstName, a.LastName, b.FirstName, b.LastName
from @t as a
cross apply @t as b;
select
    ...
    case
        when a.LastName = b.LastName then 1
        else 0
    end
...etc.
select
    a.FirstName,
    b.FirstName,
    sum(
    case
        when a.LastName = b.LastName then 1
        else 0
    end
    ) as CommonLastNames
from @t as a
cross apply @t as b
group by a.FirstName, b.FirstName;

Context

StackExchange Database Administrators Q#103961, answer score: 5

Revisions (0)

No revisions yet.