HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

T-SQL brain teaser: find non-overlapping sets

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
nonsqlbrainoverlappingfindsetsteaser

Problem

I would like to ask for help coming up with a query, which can identify groups on non-overlapping records. Here is a sample scenario (admittedly contrived). Let's say I have employees who are assigned to work on various projects. While an employee can be assigned to multiple projects, he/she can only work on one project at any given time (don't you wish we all had this luxury :). I need to find out which projects can be scheduled to be worked on in parallel because they do not share any employees. Here is some code to setup sample tables and data.

--1. Create #Projects table
IF OBJECT_ID('tempdb..#Projects') IS NOT NULL 
    DROP TABLE #Projects

CREATE TABLE #Projects (
    ProjectId INT,
    ProjectName VARCHAR(16)
    )

INSERT INTO #Projects ( ProjectId, ProjectName )
VALUES  ( 1, 'Project 1'), ( 2, 'Project 2'), ( 3, 'Project 3'), ( 4, 'Project 4'), ( 5, 'Project 5'),
        ( 6, 'Project 6'), ( 7, 'Project 7')

--2. Create #Employees table
IF OBJECT_ID('tempdb..#Employees') IS NOT NULL 
    DROP TABLE #Employees

CREATE TABLE #Employees (
    EmployeeId INT,
    EmployeeName VARCHAR(16)
    )

INSERT INTO #Employees (EmployeeId, EmployeeName)
VALUES (101, 'Employee 101'), (102, 'Employee 102'), (103, 'Employee 103'), (104, 'Employee 104'), 
        (105, 'Employee 105'), (106, 'Employee 106'), (107, 'Employee 107')

--3. Create #Employee_Projects table
IF OBJECT_ID('tempdb..#Employee_Projects') IS NOT NULL 
    DROP TABLE #Employee_Projects

CREATE TABLE #Employee_Projects (
    ProjectId INT,
    EmployeeId INT
    )

INSERT INTO #Employee_Projects (ProjectId, EmployeeId)
VALUES (1, 101), (1, 105), (1, 107), (2, 102), (2, 103), (2, 107), (3, 104), (3, 105),  (3, 106), (4, 100), (4, 101), (4, 102), (5, 103), (5, 104), (6, 105), (6, 106), (7, 106), (7, 107), (8, 102), (8, 104), (8, 106)


And here is the query, which will show you the employees and projects we have created:

```
SELECT p.ProjectId, p.ProjectName, e.EmployeeId, e.EmployeeName
FROM

Solution

This is a sort of bin-packing problem, so you'll most likely need to choose from one of the available approximate solutions, rather than attempting an exhaustive search.

One very straightforward idea is to pack projects into groups choosing the project with the largest number of employees each time. As soon as the current project no longer fits into the current group, we start a new group.

This is relatively easy to express iteratively, though it may not perform well, depending on the characteristics of the real data set. It might be quite challenging to express the same logic in a single T-SQL statement. Anyway, as an example, here is an iterative T-SQL solution using the simple greedy algorithm outlined above:

SET NOCOUNT ON;
CREATE TABLE #Groups 
(
    GroupID integer NOT NULL, 
    ProjectID integer NOT NULL, 

    CONSTRAINT PK_Groups
    PRIMARY KEY (GroupID, ProjectID)
);

-- First group
DECLARE @GroupID integer = 1;

-- Choose the largest project to start with
-- and assign to group 1
INSERT #Groups
SELECT TOP (1) 
    @GroupID AS GroupID,
    P.ProjectId
FROM #Projects AS P
GROUP BY 
    P.ProjectId
ORDER BY 
    COUNT_BIG(*) DESC,
    NEWID(); -- Random order tie-breaker

WHILE 1 = 1
BEGIN
    -- Add the next largest project to the current group
    INSERT #Groups
    SELECT TOP (1) 
        @GroupID AS GroupID,
        P.ProjectId
    FROM #Projects AS P
    WHERE NOT EXISTS
    (
        -- Project not already allocated to a group
        SELECT * 
        FROM #Groups AS G
        WHERE G.ProjectID = P.ProjectId
    )
    AND NOT EXISTS
    (
        -- Employees for this project
        SELECT EP2.EmployeeId
        FROM #Employee_Projects AS EP2
        WHERE EP2.ProjectId = P.ProjectId

        INTERSECT

        -- Employees already in the current group of projects
        SELECT EP.EmployeeId
        FROM #Groups AS G2
        JOIN #Employee_Projects AS EP
            ON EP.ProjectId = G2.ProjectID
        WHERE G2.GroupID = @GroupID
    )
    GROUP BY P.ProjectId
    ORDER BY 
        COUNT_BIG(*) DESC,
        NEWID(); -- Random order tie-breaker

    -- No project found for this group
    IF @@ROWCOUNT = 0
    BEGIN
        -- Finished when no more projects left to process
        IF NOT EXISTS
        (
            SELECT P.ProjectId FROM #Projects AS P
            EXCEPT
            SELECT G.ProjectID FROM #Groups AS G
        )
        BEGIN
            BREAK;
        END;

        -- Next group number
        SET @GroupID += 1;
    END;
END;

SELECT 
    G.GroupID,
    G.ProjectID
FROM #Groups AS G;


This code is non-deterministic because any ties encountered (when sorting by the number of employees per project descending) are broken randomly using NEWID. Nevertheless, a typical output is:

╔═════════╦═══════════╗
║ GroupID ║ ProjectID ║
╠═════════╬═══════════╣
║       1 ║         3 ║
║       1 ║         4 ║
║       2 ║         2 ║
║       2 ║         6 ║
║       3 ║         5 ║
║       3 ║         7 ║
║       4 ║         1 ║
╚═════════╩═══════════╝

Code Snippets

SET NOCOUNT ON;
CREATE TABLE #Groups 
(
    GroupID integer NOT NULL, 
    ProjectID integer NOT NULL, 

    CONSTRAINT PK_Groups
    PRIMARY KEY (GroupID, ProjectID)
);

-- First group
DECLARE @GroupID integer = 1;

-- Choose the largest project to start with
-- and assign to group 1
INSERT #Groups
SELECT TOP (1) 
    @GroupID AS GroupID,
    P.ProjectId
FROM #Projects AS P
GROUP BY 
    P.ProjectId
ORDER BY 
    COUNT_BIG(*) DESC,
    NEWID(); -- Random order tie-breaker

WHILE 1 = 1
BEGIN
    -- Add the next largest project to the current group
    INSERT #Groups
    SELECT TOP (1) 
        @GroupID AS GroupID,
        P.ProjectId
    FROM #Projects AS P
    WHERE NOT EXISTS
    (
        -- Project not already allocated to a group
        SELECT * 
        FROM #Groups AS G
        WHERE G.ProjectID = P.ProjectId
    )
    AND NOT EXISTS
    (
        -- Employees for this project
        SELECT EP2.EmployeeId
        FROM #Employee_Projects AS EP2
        WHERE EP2.ProjectId = P.ProjectId

        INTERSECT

        -- Employees already in the current group of projects
        SELECT EP.EmployeeId
        FROM #Groups AS G2
        JOIN #Employee_Projects AS EP
            ON EP.ProjectId = G2.ProjectID
        WHERE G2.GroupID = @GroupID
    )
    GROUP BY P.ProjectId
    ORDER BY 
        COUNT_BIG(*) DESC,
        NEWID(); -- Random order tie-breaker

    -- No project found for this group
    IF @@ROWCOUNT = 0
    BEGIN
        -- Finished when no more projects left to process
        IF NOT EXISTS
        (
            SELECT P.ProjectId FROM #Projects AS P
            EXCEPT
            SELECT G.ProjectID FROM #Groups AS G
        )
        BEGIN
            BREAK;
        END;

        -- Next group number
        SET @GroupID += 1;
    END;
END;

SELECT 
    G.GroupID,
    G.ProjectID
FROM #Groups AS G;
╔═════════╦═══════════╗
║ GroupID ║ ProjectID ║
╠═════════╬═══════════╣
║       1 ║         3 ║
║       1 ║         4 ║
║       2 ║         2 ║
║       2 ║         6 ║
║       3 ║         5 ║
║       3 ║         7 ║
║       4 ║         1 ║
╚═════════╩═══════════╝

Context

StackExchange Database Administrators Q#86740, answer score: 6

Revisions (0)

No revisions yet.