patternsqlMinor
T-SQL brain teaser: find non-overlapping sets
Viewed 0 times
nonsqlbrainoverlappingfindsetsteaser
Problem
I would like to ask for help coming up with a query, which can identify groups on non-overlapping records. Here is a sample scenario (admittedly contrived). Let's say I have employees who are assigned to work on various projects. While an employee can be assigned to multiple projects, he/she can only work on one project at any given time (don't you wish we all had this luxury :). I need to find out which projects can be scheduled to be worked on in parallel because they do not share any employees. Here is some code to setup sample tables and data.
And here is the query, which will show you the employees and projects we have created:
```
SELECT p.ProjectId, p.ProjectName, e.EmployeeId, e.EmployeeName
FROM
--1. Create #Projects table
IF OBJECT_ID('tempdb..#Projects') IS NOT NULL
DROP TABLE #Projects
CREATE TABLE #Projects (
ProjectId INT,
ProjectName VARCHAR(16)
)
INSERT INTO #Projects ( ProjectId, ProjectName )
VALUES ( 1, 'Project 1'), ( 2, 'Project 2'), ( 3, 'Project 3'), ( 4, 'Project 4'), ( 5, 'Project 5'),
( 6, 'Project 6'), ( 7, 'Project 7')
--2. Create #Employees table
IF OBJECT_ID('tempdb..#Employees') IS NOT NULL
DROP TABLE #Employees
CREATE TABLE #Employees (
EmployeeId INT,
EmployeeName VARCHAR(16)
)
INSERT INTO #Employees (EmployeeId, EmployeeName)
VALUES (101, 'Employee 101'), (102, 'Employee 102'), (103, 'Employee 103'), (104, 'Employee 104'),
(105, 'Employee 105'), (106, 'Employee 106'), (107, 'Employee 107')
--3. Create #Employee_Projects table
IF OBJECT_ID('tempdb..#Employee_Projects') IS NOT NULL
DROP TABLE #Employee_Projects
CREATE TABLE #Employee_Projects (
ProjectId INT,
EmployeeId INT
)
INSERT INTO #Employee_Projects (ProjectId, EmployeeId)
VALUES (1, 101), (1, 105), (1, 107), (2, 102), (2, 103), (2, 107), (3, 104), (3, 105), (3, 106), (4, 100), (4, 101), (4, 102), (5, 103), (5, 104), (6, 105), (6, 106), (7, 106), (7, 107), (8, 102), (8, 104), (8, 106)And here is the query, which will show you the employees and projects we have created:
```
SELECT p.ProjectId, p.ProjectName, e.EmployeeId, e.EmployeeName
FROM
Solution
This is a sort of bin-packing problem, so you'll most likely need to choose from one of the available approximate solutions, rather than attempting an exhaustive search.
One very straightforward idea is to pack projects into groups choosing the project with the largest number of employees each time. As soon as the current project no longer fits into the current group, we start a new group.
This is relatively easy to express iteratively, though it may not perform well, depending on the characteristics of the real data set. It might be quite challenging to express the same logic in a single T-SQL statement. Anyway, as an example, here is an iterative T-SQL solution using the simple greedy algorithm outlined above:
This code is non-deterministic because any ties encountered (when sorting by the number of employees per project descending) are broken randomly using
One very straightforward idea is to pack projects into groups choosing the project with the largest number of employees each time. As soon as the current project no longer fits into the current group, we start a new group.
This is relatively easy to express iteratively, though it may not perform well, depending on the characteristics of the real data set. It might be quite challenging to express the same logic in a single T-SQL statement. Anyway, as an example, here is an iterative T-SQL solution using the simple greedy algorithm outlined above:
SET NOCOUNT ON;
CREATE TABLE #Groups
(
GroupID integer NOT NULL,
ProjectID integer NOT NULL,
CONSTRAINT PK_Groups
PRIMARY KEY (GroupID, ProjectID)
);
-- First group
DECLARE @GroupID integer = 1;
-- Choose the largest project to start with
-- and assign to group 1
INSERT #Groups
SELECT TOP (1)
@GroupID AS GroupID,
P.ProjectId
FROM #Projects AS P
GROUP BY
P.ProjectId
ORDER BY
COUNT_BIG(*) DESC,
NEWID(); -- Random order tie-breaker
WHILE 1 = 1
BEGIN
-- Add the next largest project to the current group
INSERT #Groups
SELECT TOP (1)
@GroupID AS GroupID,
P.ProjectId
FROM #Projects AS P
WHERE NOT EXISTS
(
-- Project not already allocated to a group
SELECT *
FROM #Groups AS G
WHERE G.ProjectID = P.ProjectId
)
AND NOT EXISTS
(
-- Employees for this project
SELECT EP2.EmployeeId
FROM #Employee_Projects AS EP2
WHERE EP2.ProjectId = P.ProjectId
INTERSECT
-- Employees already in the current group of projects
SELECT EP.EmployeeId
FROM #Groups AS G2
JOIN #Employee_Projects AS EP
ON EP.ProjectId = G2.ProjectID
WHERE G2.GroupID = @GroupID
)
GROUP BY P.ProjectId
ORDER BY
COUNT_BIG(*) DESC,
NEWID(); -- Random order tie-breaker
-- No project found for this group
IF @@ROWCOUNT = 0
BEGIN
-- Finished when no more projects left to process
IF NOT EXISTS
(
SELECT P.ProjectId FROM #Projects AS P
EXCEPT
SELECT G.ProjectID FROM #Groups AS G
)
BEGIN
BREAK;
END;
-- Next group number
SET @GroupID += 1;
END;
END;
SELECT
G.GroupID,
G.ProjectID
FROM #Groups AS G;This code is non-deterministic because any ties encountered (when sorting by the number of employees per project descending) are broken randomly using
NEWID. Nevertheless, a typical output is:╔═════════╦═══════════╗
║ GroupID ║ ProjectID ║
╠═════════╬═══════════╣
║ 1 ║ 3 ║
║ 1 ║ 4 ║
║ 2 ║ 2 ║
║ 2 ║ 6 ║
║ 3 ║ 5 ║
║ 3 ║ 7 ║
║ 4 ║ 1 ║
╚═════════╩═══════════╝Code Snippets
SET NOCOUNT ON;
CREATE TABLE #Groups
(
GroupID integer NOT NULL,
ProjectID integer NOT NULL,
CONSTRAINT PK_Groups
PRIMARY KEY (GroupID, ProjectID)
);
-- First group
DECLARE @GroupID integer = 1;
-- Choose the largest project to start with
-- and assign to group 1
INSERT #Groups
SELECT TOP (1)
@GroupID AS GroupID,
P.ProjectId
FROM #Projects AS P
GROUP BY
P.ProjectId
ORDER BY
COUNT_BIG(*) DESC,
NEWID(); -- Random order tie-breaker
WHILE 1 = 1
BEGIN
-- Add the next largest project to the current group
INSERT #Groups
SELECT TOP (1)
@GroupID AS GroupID,
P.ProjectId
FROM #Projects AS P
WHERE NOT EXISTS
(
-- Project not already allocated to a group
SELECT *
FROM #Groups AS G
WHERE G.ProjectID = P.ProjectId
)
AND NOT EXISTS
(
-- Employees for this project
SELECT EP2.EmployeeId
FROM #Employee_Projects AS EP2
WHERE EP2.ProjectId = P.ProjectId
INTERSECT
-- Employees already in the current group of projects
SELECT EP.EmployeeId
FROM #Groups AS G2
JOIN #Employee_Projects AS EP
ON EP.ProjectId = G2.ProjectID
WHERE G2.GroupID = @GroupID
)
GROUP BY P.ProjectId
ORDER BY
COUNT_BIG(*) DESC,
NEWID(); -- Random order tie-breaker
-- No project found for this group
IF @@ROWCOUNT = 0
BEGIN
-- Finished when no more projects left to process
IF NOT EXISTS
(
SELECT P.ProjectId FROM #Projects AS P
EXCEPT
SELECT G.ProjectID FROM #Groups AS G
)
BEGIN
BREAK;
END;
-- Next group number
SET @GroupID += 1;
END;
END;
SELECT
G.GroupID,
G.ProjectID
FROM #Groups AS G;╔═════════╦═══════════╗
║ GroupID ║ ProjectID ║
╠═════════╬═══════════╣
║ 1 ║ 3 ║
║ 1 ║ 4 ║
║ 2 ║ 2 ║
║ 2 ║ 6 ║
║ 3 ║ 5 ║
║ 3 ║ 7 ║
║ 4 ║ 1 ║
╚═════════╩═══════════╝Context
StackExchange Database Administrators Q#86740, answer score: 6
Revisions (0)
No revisions yet.