patternsqlMinor
Implementing "ANY" aggregate operator in T-SQL
Viewed 0 times
operatorsqlanyimplementingaggregate
Problem
I have a database that contains department and employee tables, and I need to return the number of employees for each department. I write the following query:
The problem is, I'm worried that the
I could probably use the MIN/MAX aggregate functions, but they would likely make the query even slower. I know that, in MySQL, you can select a column that doesn't appear in the
SELECT d.ID, d.Name, COUNT(*) EmployeeCount
FROM dbo.Departments d
INNER JOIN dbo.Personnel p
ON p.DepartmentID = d.ID
GROUP BY d.ID, d.Name;The problem is, I'm worried that the
GROUP BY operator is going to be slow when working on a VARCHAR column. I'd like to be able to write SELECT d.ID, ANY(d.Name), COUNT(*) EmployeeCount
FROM dbo.Departments d
INNER JOIN dbo.Personnel p
ON p.DepartmentID = d.ID
GROUP BY d.ID;I could probably use the MIN/MAX aggregate functions, but they would likely make the query even slower. I know that, in MySQL, you can select a column that doesn't appear in the
GROUP BY clause, and the DB engine will return any row value at random for that column. Is something like that doable in T-SQL?Solution
I think you are trying to optimize a performance problem that doesn't exist. I don't know why you think a
So this query doesn't do the
GROUP BY is going to be problematic here. Conceptually there isn't really much difference between what you wrote and the following:SELECT d.Id, d.Name, s.c
FROM dbo.Departments AS d
INNER JOIN
(
SELECT ID = DepartmentID, c = COUNT(*)
FROM dbo.Personnel
GROUP BY DepartmentID
) AS s
ON d.ID = s.ID;So this query doesn't do the
GROUP BY on the Name column, but in my testing this is actually less efficient than your version unless you create a non-clustered index on Personnel.DepartmentID (in which case the plans are exactly the same). I suggest optimizing performance when you actually observe a performance problem, not based on hunches. In your case the grouping essentially comes along for free because it's part of the same clustered index scan that has to read all the data anyway. I'm not sure how forcing SQL Server to pick a random name (which is going to be in the same row as the ID you already read anyway) can possibly make anything faster...Code Snippets
SELECT d.Id, d.Name, s.c
FROM dbo.Departments AS d
INNER JOIN
(
SELECT ID = DepartmentID, c = COUNT(*)
FROM dbo.Personnel
GROUP BY DepartmentID
) AS s
ON d.ID = s.ID;Context
StackExchange Database Administrators Q#22120, answer score: 8
Revisions (0)
No revisions yet.